CN106338711A

CN106338711A - Voice directing method and system based on intelligent equipment

Info

Publication number: CN106338711A
Application number: CN201610760099.4A
Authority: CN
Inventors: 黄石磊; 刘轶; 王昕�; 程刚; 王序; 杨乐辉
Original assignee: Peking University Shenzhen Graduate School; Konka Group Co Ltd
Current assignee: Peking University Shenzhen Graduate School; Konka Group Co Ltd
Priority date: 2016-08-30
Filing date: 2016-08-30
Publication date: 2017-01-18

Abstract

The invention discloses a voice directing method and system based on intelligent equipment. In the method, the intelligent equipment acquires voice signals in real time after it is started; when the equipment detects a voice signal, it captures a current foreground image and derives candidate directions of the sound source from that image; the equipment then runs a positioning algorithm on the candidate directions and determines the sound source direction of the voice signal from the result. By using image capture to compute candidate directions in advance and checking only those candidates with the algorithm, the invention obtains the optimal sound source direction, reduces computational complexity, and improves the efficiency of locating the voice signal.

Description

A voice directing method and system based on a smart device

Technical field

The present invention relates to the field of voice processing technology, and in particular to a voice directing method and system based on a smart device.

Background

On smart devices, and on home smart devices in particular, voice interaction is an important mode of interaction. Many methods already exist for processing microphone array signals, such as linearly constrained minimum variance, the maximum likelihood criterion, the maximum signal-to-noise ratio criterion, multiple signal classification (MUSIC), and estimation of signal parameters via rotational invariance techniques (ESPRIT). These algorithms are generally complex.

Existing microphone array signal processing generally relies on DOA (direction of arrival) estimation, a technique that originates in radar, where the receiver estimates the bearing of an incoming wave emitted by a target transmitter. DOA estimation is based purely on acoustic processing: it is computationally expensive, particularly because it requires a direction search, and the search itself can return a wrong result. Existing voice localization methods are therefore error-prone and computationally complex.

Therefore, the prior art still needs to be improved and developed.

Summary of the invention

In view of the deficiencies of the prior art, the object of the present invention is to provide a voice directing method and system based on a smart device, intended to solve the technical problem that existing voice localization methods are error-prone and computationally complex.

The technical solution of the present invention is as follows:

A voice directing method based on a smart device, wherein the method includes:

A. after the smart device is turned on, voice signals are acquired in real time;

B. when the smart device detects a voice signal, a current foreground image of the smart device is acquired, and candidate directions of the sound source are obtained from the current foreground image;

C. the smart device performs a calculation with a positioning algorithm on the candidate directions of the sound source, and locates the sound source direction of the voice signal according to the calculation result.

In the voice directing method based on a smart device, step A specifically includes:

A1. after the smart device is turned on, voice signals are collected by several voice sensors placed at certain intervals and in a certain arrangement.

In the voice directing method based on a smart device, the following precedes step A:

S. the smart device acquires in advance a foreground image that contains no foreground object and stores it as the background image.

In the voice directing method based on a smart device, step B specifically includes:

B11. when the smart device detects a voice signal, the smart device acquires its current foreground image through an image or video capture device;

B12. a calculation is performed on the acquired current foreground image and the pre-stored background image, and candidate directions of the sound source are obtained from the calculation result.

B21. when the smart device detects a voice signal, the smart device acquires its current foreground image through an image or video capture device;

B22. the position of the foreground object in the current foreground image is obtained by an image recognition algorithm, and candidate directions of the sound source are obtained from the position of the foreground object.

In the voice directing method based on a smart device, step C specifically includes:

C1. after the smart device obtains the candidate directions of the sound source, it checks them with a direction-of-arrival algorithm;

C2. the sound source direction of the voice signal is located according to the check result.

A voice directing system based on a smart device, wherein the system includes:

a voice signal acquisition module, used to acquire voice signals in real time after the smart device is turned on;

a candidate direction computing module, used to acquire a current foreground image of the smart device when the smart device detects a voice signal, and to obtain candidate directions of the sound source from the current foreground image;

a locating module, used by the smart device to perform a calculation with a positioning algorithm on the candidate directions of the sound source and to locate the sound source direction of the voice signal according to the calculation result.

In the voice directing system based on a smart device,

the system also includes:

a background image acquisition module, used by the smart device to acquire in advance a foreground image that contains no foreground object and to store it as the background image;

the candidate direction computing module specifically includes:

a first foreground image acquiring unit, used by the smart device to acquire its current foreground image through an image or video capture device when the smart device detects a voice signal;

a first candidate direction computing unit, used to perform a calculation on the acquired current foreground image and the pre-stored background image, and to obtain candidate directions of the sound source from the calculation result.

In the voice directing system based on a smart device,

the voice signal acquisition module is also used, after the smart device is turned on, to collect voice signals through several voice sensors placed at certain intervals and in a certain arrangement;

the candidate direction computing module specifically includes:

a second foreground image acquiring unit, used by the smart device to acquire its current foreground image through an image or video capture device when the smart device detects a voice signal;

a second candidate direction computing unit, used to obtain the position of the foreground object in the current foreground image by an image recognition algorithm, and to obtain candidate directions of the sound source from the position of the foreground object.

In the voice directing system based on a smart device, the locating module specifically includes:

a computing unit, used by the smart device to check the candidate directions of the sound source with a direction-of-arrival algorithm after they are obtained;

a positioning unit, used to locate the sound source direction of the voice signal according to the check result.

The invention provides a voice directing method and system based on a smart device. The invention uses image capture to compute candidate directions of the voice signal in advance and checks those candidates with an algorithm to obtain the optimal sound source direction, which reduces computational complexity and improves the efficiency of locating the voice signal.

Brief description of the drawings

Fig. 1 is a flow chart of a preferred embodiment of the voice directing method based on a smart device according to the present invention.

Fig. 2a is a schematic diagram of the background picture in a specific application embodiment of the voice directing method based on a smart device according to the present invention.

Fig. 2b is a schematic diagram of the current foreground picture in a specific application embodiment of the voice directing method based on a smart device according to the present invention.

Fig. 2c is a schematic diagram of the picture after the difference calculation in a specific application embodiment of the voice directing method based on a smart device according to the present invention.

Fig. 3 is a schematic diagram of the sound source candidate angle ranges in a specific application embodiment of the voice directing method based on a smart device according to the present invention.

Fig. 4 is a schematic diagram of the smart device in a specific application embodiment of the voice directing method based on a smart device according to the present invention.

Fig. 5 is a functional block diagram of a preferred embodiment of the voice directing system based on a smart device according to the present invention.

Detailed description of the embodiments

To make the purpose, technical solution, and effects of the present invention clearer and more definite, the present invention is described in further detail below. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.

The invention provides a preferred embodiment of a voice directing method based on a smart device, whose flow chart is shown in Fig. 1. The method includes:

Step S100: after the smart device is turned on, voice signals are acquired in real time.

In a specific implementation, smart devices include but are not limited to smart terminals such as smart TVs and smart tablets. Fig. 4 shows an embodiment of the invention in which the smart device is a smart TV. The smart device includes a sound collection array, a camera, and similar devices. The sound collection array collects voice and typically has multiple sensors (microphones) for capturing multi-channel voice.

In a further embodiment, step S100 specifically includes:

Step S101: after the smart device is turned on, voice signals are collected by several voice sensors placed at certain intervals and in a certain arrangement.

In a specific implementation, the voice collection array in the smart device collects voice and typically has multiple sensors (microphones). Because the sensors are placed at certain intervals and in a certain arrangement, they receive slightly different versions of the sound emitted by the same source, for example with different delays. The voice collection array shown in Fig. 4 is located at the top of the smart TV and has 8 microphones in a uniform linear arrangement. (Other examples may use other arrangements.)
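The delay differences mentioned above can be illustrated with a minimal sketch for a uniform linear array under a far-field (plane wave) assumption. The microphone spacing of 0.04 m, the speed of sound, the angle convention, and the function name `ula_delays` are illustrative assumptions and do not come from the patent text.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def ula_delays(num_mics, spacing_m, angle_deg):
    """Arrival delay (seconds) at each microphone relative to microphone 0.

    angle_deg is measured from the array axis, so 90 degrees is broadside
    (directly in front of the array), where all delays are close to zero.
    A negative delay means the wavefront reaches that microphone earlier
    than microphone 0.
    """
    cos_theta = math.cos(math.radians(angle_deg))
    return [-m * spacing_m * cos_theta / SPEED_OF_SOUND for m in range(num_mics)]


# Hypothetical example: 8 microphones, 4 cm apart, source at 60 degrees.
delays = ula_delays(num_mics=8, spacing_m=0.04, angle_deg=60.0)
```

The delays grow linearly along the array, which is the property that direction-of-arrival algorithms exploit.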

Step S200: when the smart device detects a voice signal, a current foreground image of the smart device is acquired, and candidate directions of the sound source are obtained from the current foreground image.

In a specific implementation, when the smart device detects a voice signal, the current foreground image is captured by an image capture device or video capture device installed on the smart device, and a new foreground image is acquired at predetermined intervals. The scene around a smart device is usually static, so the image does not need to be updated quickly; 3 frames per second, for example, is enough. The candidate directions of the sound source are determined from the current foreground image. The foreground image is the scene that the smart device's display faces. Computing candidate directions in advance removes a large amount of search computation and, especially where the sound source direction must be computed in real time, yields the optimal direction more accurately and efficiently.

The image or video capture device collects image information from which the background and foreground are computed. As shown in Fig. 4, the camera is located at the top center of the smart TV.

In a further embodiment, the following precedes step S100:

Step S10: the smart device acquires in advance a foreground image that contains no foreground object and stores it as the background image.

In a specific implementation, for the user of a smart device, the part of the image that changes is usually a person, and that is therefore where a candidate sound source may be. The background image can accordingly be regarded as an image of the scene with no one in it. Taking a smart TV as an example, pictures can be taken at intervals while the TV is powered on but not playing a program, for example one picture every 10 seconds, and averaged for each time period of the day or each illumination condition. In a typical household, apart from a small amount of user activity, the scene is essentially static, as shown in Fig. 2a. This static scene is called the background.
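The averaging of periodically captured pictures can be sketched as a pixel-wise mean. This is a minimal illustration, assuming grayscale frames stored as nested lists of equal size; the function name `average_background` and the sample values are hypothetical, and grouping frames by time period or illumination condition is left to the caller.

```python
def average_background(frames):
    """Pixel-wise mean of equally sized grayscale frames (lists of rows)."""
    if not frames:
        raise ValueError("at least one frame is required")
    height, width = len(frames[0]), len(frames[0][0])
    count = len(frames)
    return [
        [sum(frame[y][x] for frame in frames) / count for x in range(width)]
        for y in range(height)
    ]


# Two hypothetical 2x2 frames taken 10 seconds apart.
frames = [
    [[10, 20], [30, 40]],
    [[20, 40], [30, 0]],
]
background_image = average_background(frames)
```

Averaging several frames suppresses brief disturbances, such as a person walking past, so the stored background stays close to the static scene.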

Candidate directions can be computed either from the difference between the foreground image and the background image or by an image recognition algorithm.

Further, when candidate directions are computed from the difference between the foreground image and the background image, step S200 specifically includes:

Step S211: when the smart device detects a voice signal, the smart device acquires its current foreground image through an image or video capture device;

Step S212: a calculation is performed on the acquired current foreground image and the pre-stored background image, and candidate directions of the sound source are obtained from the calculation result.

In a specific implementation, when a user is present in the scene, as shown in Fig. 2b, an existing detection algorithm detects the foreground, that is, the newly appeared user and hence a possible direction of the sound source. Specifically, in step S212 a difference calculation is performed between the background and the currently captured foreground image to obtain the parts in which the two images differ. The regions of the difference image that change markedly (for example, regions where the difference exceeds a certain threshold and the area exceeds a certain threshold) give the foreground shown in Fig. 2c. The foreground is the user and, at the same time, the candidate direction of the sound source.
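The difference calculation with a difference threshold and an area threshold can be sketched as follows. This is a simplified illustration under stated assumptions: frames are grayscale nested lists, all changed pixels are treated as a single region (a real system would separate connected regions so that several users give several candidates), and the threshold values and the name `foreground_columns` are hypothetical.

```python
def foreground_columns(current, background, diff_threshold=30, min_area=4):
    """Return (first_col, last_col) spanned by the changed pixels, or None.

    A pixel counts as changed when its absolute difference from the
    background exceeds diff_threshold; the result is discarded when fewer
    than min_area pixels changed.
    """
    changed_cols = [
        x
        for cur_row, bg_row in zip(current, background)
        for x, (c, b) in enumerate(zip(cur_row, bg_row))
        if abs(c - b) > diff_threshold
    ]
    if len(changed_cols) < min_area:
        return None
    return min(changed_cols), max(changed_cols)


# Hypothetical 4x6 scene: a bright object appears in columns 2 and 3.
background = [[0] * 6 for _ in range(4)]
current = [row[:] for row in background]
for y in range(1, 4):
    for x in (2, 3):
        current[y][x] = 200
region = foreground_columns(current, background)
```

Only the horizontal extent is returned here because, for a horizontally arranged microphone array, the column range is what maps to a candidate angle range.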

Further, when candidate directions are computed by an image recognition algorithm, step S200 also includes:

Step S221: when the smart device detects a voice signal, the smart device acquires its current foreground image through an image or video capture device;

Step S222: the position of the foreground object in the current foreground image is obtained by an image recognition algorithm, and candidate directions of the sound source are obtained from the position of the foreground object.

In another example, an existing face recognition module can be used to detect possible directions of the sound source. The position obtained for the foreground object (that is, the person) is its position in the plane of the camera's viewfinder (rectangular coordinates, expressed in pixels); the candidate direction computing module converts it into the angular ranges shown in Fig. 3.

Simply put, the conversion can be done by table lookup: the angular range corresponding to each pixel in the viewfinder plane is computed in advance and stored in a table, and the angular ranges of the pixels covered by each object are then combined.
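The table lookup can be sketched with a precomputed list that maps each pixel column to a horizontal angle. The patent does not say how the table is filled; this sketch assumes an ideal pinhole camera with a known horizontal field of view, and the image width, field of view, and function names are hypothetical.

```python
import math


def build_angle_table(width_px, horizontal_fov_deg):
    """Angle (degrees) of each pixel column relative to the optical axis.

    Negative angles lie to the left of the image center. Assumes an ideal
    pinhole camera without lens distortion.
    """
    half_width = width_px / 2.0
    focal_px = half_width / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return [
        math.degrees(math.atan((x + 0.5 - half_width) / focal_px))
        for x in range(width_px)
    ]


def columns_to_angle_range(table, first_col, last_col):
    """Angular range covered by an object spanning first_col..last_col."""
    return table[first_col], table[last_col]


# Hypothetical camera: 640 pixels wide, 90-degree horizontal field of view.
angle_table = build_angle_table(640, 90.0)
low_deg, high_deg = columns_to_angle_range(angle_table, 300, 340)
```

Because the table is built once, converting a detected object into a candidate angle range at run time costs only two list lookups.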

Step S300: the smart device performs a calculation with a positioning algorithm on the candidate directions of the sound source, and locates the sound source direction of the voice signal according to the calculation result.

In a specific implementation, the possible source directions of the voice are computed on the basis of existing methods, with the candidate directions used to further narrow the range of possible directions and obtain a more accurate beam direction. The beam direction is generally the sound source direction of the voice signal. Specifically, once the candidates have been checked with DOA, the optimal sound source direction is obtained and the localization task is complete; the corresponding voice signal can then be output for subsequent processing such as speech recognition.

According to the sound source direction, the voice signal is further amplified and denoised, and the beamformed voice is output.

In a further embodiment, step S300 specifically includes:

Step S301: after the smart device obtains the candidate directions of the sound source, it checks them with a direction-of-arrival algorithm;

Step S302: the sound source direction of the voice signal is located according to the check result.
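The two steps above, checking only the candidate directions and then picking the best one, can be sketched as follows. To stay dependency-free, this sketch scores each candidate with a conventional narrowband delay-and-sum steered power rather than the MUSIC spatial spectrum that the description uses as its example; the selection logic is the same. The array geometry, the 2000 Hz frequency, the candidate angles, and all function names are illustrative assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def steering_vector(num_mics, spacing_m, freq_hz, angle_deg):
    """Narrowband phase factors of a uniform linear array for one direction."""
    cos_theta = math.cos(math.radians(angle_deg))
    return [
        cmath.exp(2j * math.pi * freq_hz * m * spacing_m * cos_theta / SPEED_OF_SOUND)
        for m in range(num_mics)
    ]


def steered_power(snapshots, num_mics, spacing_m, freq_hz, angle_deg):
    """Mean output power when the array is steered toward angle_deg."""
    a = steering_vector(num_mics, spacing_m, freq_hz, angle_deg)
    total = 0.0
    for x in snapshots:
        y = sum(am.conjugate() * xm for am, xm in zip(a, x))
        total += abs(y) ** 2
    return total / len(snapshots)


def check_candidates(snapshots, candidate_angles, num_mics, spacing_m, freq_hz):
    """Evaluate only the candidate angles and return the strongest one."""
    return max(
        candidate_angles,
        key=lambda ang: steered_power(snapshots, num_mics, spacing_m, freq_hz, ang),
    )


# Synthetic noise-free test signal: one source at 62 degrees, three snapshots.
true_vector = steering_vector(8, 0.04, 2000.0, 62.0)
snapshots = [[s * a for a in true_vector] for s in (1, 1j, -1)]
candidates = [60, 62, 64, 110, 112, 114]  # angles derived from the image
best_angle = check_candidates(snapshots, candidates, 8, 0.04, 2000.0)
```

With real recordings the snapshots would be frequency-domain samples of the microphone signals, and the scoring function could be swapped for a MUSIC or other spatial spectrum without changing `check_candidates`.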

In a specific implementation, beamforming computes an optimal synthesized voice signal, for example the signal with the highest signal-to-noise ratio, from the multi-channel voice input by the microphones described above. Many beamforming algorithms exist today, such as adaptive algorithms based on direction estimation, for example linearly constrained minimum variance, the maximum likelihood criterion, the maximum signal-to-noise ratio criterion, multiple signal classification (MUSIC), and estimation of signal parameters via rotational invariance techniques (ESPRIT). These methods generally need to search over sound source directions, and depending on the required precision they all need a large amount of computation to test sources in the different possible directions.

The following takes checking the candidate directions with the MUSIC (multiple signal classification) algorithm as an example.

In the prior art, searching for the most likely direction of arrival (DOA) requires scanning all possible direction angles. Candidate angle range 0 in Fig. 3, denoted d0, typically spans 160 degrees. Searching d0 with one calculation every 2 degrees requires 81 calculations. (Note that each calculation here means checking one angle by estimating the corresponding spatial spectrum parameter in the MUSIC algorithm, which consumes considerable computing resources each time.)

In this technical solution, the precomputed candidate directions remove a large amount of search computation and, especially where the sound source direction must be computed in real time, yield the optimal direction more accurately and efficiently.

Briefly, given the candidate directions, only a few directions need to be checked: candidate angle range 1 (denoted d1) and candidate angle range 2 (denoted d2) in Fig. 3. For example, if d1 and d2 each span about 5 degrees, each direction needs 3 calculations (6 in total). Compared with the 81 calculations of the basic algorithm, this is a large reduction in computation.
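The counts of 81 and 6 follow from the grid spacing, as this short sketch shows. The 2-degree step and the range widths come from the description; where the ranges start (10, 60, and 110 degrees) is a hypothetical placement chosen only for illustration.

```python
def scan_angles(start_deg, stop_deg, step_deg):
    """Angles from start_deg up to at most stop_deg, spaced step_deg apart."""
    count = int((stop_deg - start_deg) // step_deg) + 1
    return [start_deg + i * step_deg for i in range(count)]


full_scan = scan_angles(10, 170, 2)   # 160-degree range d0 -> 81 angles
d1_scan = scan_angles(60, 65, 2)      # 5-degree range d1 -> 3 angles
d2_scan = scan_angles(110, 115, 2)    # 5-degree range d2 -> 3 angles

full_count = len(full_scan)
candidate_count = len(d1_scan) + len(d2_scan)
print(full_count, candidate_count)  # prints "81 6"
```

Since each grid point costs one spatial spectrum evaluation, restricting the scan to d1 and d2 cuts the number of evaluations from 81 to 6 in this example.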

In an exemplary embodiment, the device may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method.

In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example a memory including instructions that can be executed by the processor of the device to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

The invention also provides a preferred embodiment of a voice directing system based on a smart device, whose functional block diagram is shown in Fig. 5. The system includes:

a voice signal acquisition module 100, used to acquire voice signals in real time after the smart device is turned on; for details, see the method embodiment.

a candidate direction computing module 200, used to acquire a current foreground image of the smart device when the smart device detects a voice signal, and to obtain candidate directions of the sound source from the current foreground image; for details, see the method embodiment.

a locating module 300, used by the smart device to perform a calculation with a positioning algorithm on the candidate directions of the sound source and to locate the sound source direction of the voice signal according to the calculation result; for details, see the method embodiment.

In the voice directing system based on a smart device,

the system also includes:

a background image acquisition module, used by the smart device to acquire in advance a foreground image that contains no foreground object and to store it as the background image; for details, see the method embodiment.

the candidate direction computing module specifically includes:

a first foreground image acquiring unit, used by the smart device to acquire its current foreground image through an image or video capture device when the smart device detects a voice signal; for details, see the method embodiment.

a first candidate direction computing unit, used to perform a calculation on the acquired current foreground image and the pre-stored background image, and to obtain candidate directions of the sound source from the calculation result; for details, see the method embodiment.

In the voice directing system based on a smart device,

the voice signal acquisition module is also used, after the smart device is turned on, to collect voice signals through several voice sensors placed at certain intervals and in a certain arrangement; for details, see the method embodiment.

the candidate direction computing module specifically includes:

a second foreground image acquiring unit, used by the smart device to acquire its current foreground image through an image or video capture device when the smart device detects a voice signal; for details, see the method embodiment.

a second candidate direction computing unit, used to obtain the position of the foreground object in the current foreground image by an image recognition algorithm, and to obtain candidate directions of the sound source from the position of the foreground object; for details, see the method embodiment.

a computing unit, used by the smart device to check the candidate directions of the sound source with a direction-of-arrival algorithm after they are obtained; for details, see the method embodiment.

a positioning unit, used to locate the sound source direction of the voice signal according to the check result; for details, see the method embodiment.

In summary, the invention provides a voice directing method and system based on a smart device. The method includes: after the smart device is turned on, acquiring voice signals in real time; when the smart device detects a voice signal, acquiring a current foreground image of the smart device and obtaining candidate directions of the sound source from the current foreground image; and performing a calculation with a positioning algorithm on the candidate directions and locating the sound source direction of the voice signal according to the calculation result. The invention uses image capture to compute candidate directions of the voice signal in advance and checks those candidates with an algorithm to obtain the optimal sound source direction, which reduces computational complexity and improves the efficiency of locating the voice signal.

It should be understood that the application of the present invention is not limited to the above examples. Those of ordinary skill in the art can make improvements or changes based on the above description, and all such improvements and changes shall fall within the protection scope of the appended claims.

Claims

1. A voice directing method based on a smart device, characterized in that the method includes:

A. after the smart device is turned on, voice signals are acquired in real time;

2. The voice directing method based on a smart device according to claim 1, characterized in that step A specifically includes:

3. The voice directing method based on a smart device according to claim 1, characterized in that the following precedes step A:

4. The voice directing method based on a smart device according to claim 3, characterized in that step B specifically includes:

5. The voice directing method based on a smart device according to claim 1, characterized in that step B specifically includes:

6. The voice directing method based on a smart device according to claim 4 or 5, characterized in that step C specifically includes:

7. A voice directing system based on a smart device, characterized in that the system includes:

8. The voice directing system based on a smart device according to claim 7, characterized in that

the system also includes:

the candidate direction computing module specifically includes:

9. The voice directing system based on a smart device according to claim 7, characterized in that

the candidate direction computing module specifically includes:

10. The voice directing system based on a smart device according to claim 8 or 9, characterized in that the locating module specifically includes: