CN107437418A

CN107437418A - Vehicle-mounted voice recognition electronic entertainment control system

Info

Publication number: CN107437418A
Application number: CN201710632907.3A
Authority: CN
Inventors: 韦玥
Original assignee: Shenzhen Yixin Intelligent Technology Co Ltd
Current assignee: Shenzhen Yixin Intelligent Technology Co Ltd
Priority date: 2017-07-28
Filing date: 2017-07-28
Publication date: 2017-12-05

Abstract

The invention provides a vehicle-mounted voice recognition electronic entertainment control system comprising a natural speech input module, a speech processing module, a Bluetooth module, a media player, an air conditioner, an application control program and a vehicle body control module. The natural speech input module receives a person's voice signal. The speech processing module receives the voice signal from the natural speech input module and converts it into an executable control command. The Bluetooth module receives the executable control command and controls Bluetooth devices. The media player receives the executable control command and plays media accordingly. The air conditioner receives the executable control command and adjusts the temperature, the air volume, the internal/external air circulation mode and the blowing mode. The invention provides a natural-language speech recognition scheme for vehicle electronic entertainment systems, making it easier for users to interact with in-vehicle electronic devices.

Description

Vehicle-mounted voice recognition electronic entertainment control system

Technical field

The present invention relates to the field of intelligent vehicle control, and in particular to a vehicle-mounted voice recognition electronic entertainment system.

Background

In recent years China's automobile industry has developed rapidly, and many control buttons have been concentrated on the multifunction steering wheel. This is more convenient, but the numerous buttons are scattered, cumbersome to operate and limited in function. No major breakthrough has been made in voice systems, however. Traditional voice systems focus mainly on voice prompts, voice navigation and the like, and the functional modules that perform these functions are independent of one another, each with its own dedicated speech chip. This largely wastes speech chips and leaves no control module to coordinate the functional modules effectively.

Meanwhile, in the prior art, most vehicle electronic entertainment control systems do not support speech recognition, and those that do can only recognize fixed commands. For example, a command such as "turn on radio" is hard-coded in the control system, and the user must say it word for word to trigger the action, which greatly degrades the user experience.

Summary of the invention

In view of the above problems, the present invention aims to provide a vehicle-mounted voice recognition electronic entertainment system.

The object of the present invention is achieved by the following technical solution:

A vehicle-mounted voice recognition electronic entertainment control system comprises a natural speech input module, a speech processing module, a Bluetooth module, a media player, an air conditioner, an application control program and a vehicle body control module. The natural speech input module receives a person's voice signal. The speech processing module receives the voice signal from the natural speech input module and converts it into an executable control command. The Bluetooth module receives the executable control command and controls Bluetooth devices. The media player receives the executable control command and plays media accordingly. The air conditioner receives the executable control command and adjusts the temperature, the air volume, the internal/external air circulation mode and the blowing mode.

The vehicle-mounted voice recognition electronic entertainment control system further includes a navigator, which receives the executable control command and sets a destination, plans a navigation route, selects a route and changes the destination.

The application control program receives the executable control command and runs the corresponding application program.

The vehicle body control module receives the executable control command and controls in-car facilities.

The beneficial effects of the present invention are as follows. The vehicle-mounted voice recognition electronic entertainment control system remedies the unfriendliness and poor usability of command-style speech recognition in in-car electronic equipment by providing a natural-language speech recognition scheme for vehicle electronic entertainment systems, so that users can interact with in-vehicle electronic devices in everyday language. It also overcomes the defect of other existing schemes that require an Internet connection: natural-language speech recognition remains usable without a network connection.

Brief description of the drawings

The invention is further described with reference to the accompanying drawings. The embodiments shown in the drawings do not limit the invention in any way; a person of ordinary skill in the art can derive other drawings from the following drawings without creative effort.

Fig. 1 is a block diagram of the vehicle-mounted voice recognition electronic entertainment system of the present invention;

Fig. 2 is a block diagram of the speech processing module of the present invention.

Reference numerals:

Natural speech input module 1, speech processing module 2, Bluetooth module 3, media player 4, air conditioner 5, application control program 6, vehicle body control module 7, navigator 8, natural speech detection unit 20, natural speech enhancement unit 21, feature extraction unit 22 and natural speech recognition unit 23.

Embodiments

The invention is further described with reference to the following application scenario.

Referring to Fig. 1, a vehicle-mounted voice recognition electronic entertainment control system comprises a natural speech input module 1, a speech processing module 2, a Bluetooth module 3, a media player 4, an air conditioner 5, an application control program 6 and a vehicle body control module 7. The natural speech input module receives a person's voice signal. The speech processing module receives the voice signal from the natural speech input module and converts it into an executable control command. The Bluetooth module receives the executable control command and controls Bluetooth devices. The media player receives the executable control command and plays media accordingly. The air conditioner receives the executable control command and adjusts the temperature, the air volume, the internal/external air circulation mode and the blowing mode.

Further, the vehicle-mounted voice recognition electronic entertainment control system also includes a navigator 8, which receives the executable control command and sets a destination, plans a navigation route, selects a route and changes the destination.

Further, the application control program receives the executable control command and runs the corresponding application program.

Further, the vehicle body control module receives the executable control command and controls in-car facilities.

Preferably, the vehicle-mounted voice recognition electronic entertainment control system further includes a prompting module, which receives the executable control command and issues voice prompts to the occupants of the vehicle.
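The routing of executable control commands to the individual modules can be illustrated with a minimal sketch. The text does not specify a command format, so the dictionary layout, the `target` key and the class name `CommandDispatcher` below are hypothetical and serve only to show the idea of one speech processing module feeding several receivers.

```python
from typing import Callable, Dict


class CommandDispatcher:
    """Routes an executable control command to the module registered for it."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], str]] = {}

    def register(self, target: str, handler: Callable[[dict], str]) -> None:
        # Each module (air conditioner, media player, ...) registers one handler.
        self._handlers[target] = handler

    def dispatch(self, command: dict) -> str:
        target = command["target"]
        if target not in self._handlers:
            raise KeyError(f"no module registered for target {target!r}")
        return self._handlers[target](command)


def air_conditioner(command: dict) -> str:
    # Hypothetical handler: only the temperature setting is modelled here.
    return f"air conditioner: set temperature to {command['temperature']} C"


def media_player(command: dict) -> str:
    # Hypothetical handler: plays the requested track.
    return f"media player: play {command['track']}"


dispatcher = CommandDispatcher()
dispatcher.register("air_conditioner", air_conditioner)
dispatcher.register("media_player", media_player)

# A command as the speech processing module might emit it (assumed format).
result = dispatcher.dispatch({"target": "air_conditioner", "temperature": 22})
```

Keeping the modules behind a single dispatch point mirrors the description, in which every module consumes the same kind of executable control command.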

Preferably, referring to Fig. 2, the speech processing module includes a natural speech detection unit 20, a natural speech enhancement unit 21, a feature extraction unit 22 and a natural speech recognition unit 23. The natural speech detection unit detects and extracts the effective natural speech information part of the received voice signal. The natural speech enhancement unit enhances the natural speech information part to obtain the natural speech information part to be recognized. The feature extraction unit extracts instruction feature parameters from the natural speech information part to be recognized. The natural speech recognition unit performs recognition on the basis of the instruction feature parameters and obtains the corresponding control command.

The above embodiment of the present invention provides a natural-language speech recognition scheme for vehicle electronic entertainment systems, remedies the unfriendliness and poor usability of command-style speech recognition in in-car electronic equipment, and makes it easier for users to interact with in-vehicle electronic devices.

Preferably, the natural speech detection unit 20 detects and extracts the effective natural speech information in the received voice signal as follows:

(1) Divide the received voice signal into frames with 50% inter-frame overlap and apply a Hamming window to obtain each frame of the voice signal;

Preferably, a frame length of U = 30 ms is used for framing in this unit;

(2) Obtain the logarithmic energy feature of each frame of the voice signal using the following function:

$$D(m) = \log_{10}\left(\sum_{n=1}^{U} |r_m(n)|^2 + c\right) - \log_{10} c$$

In the formula, D(m) is the logarithmic energy feature of the m-th frame of the voice signal, $\sum_{n=1}^{U} |r_m(n)|^2$ is the short-time energy of the m-th frame, |r_m(n)|² is the energy value of the m-th frame at instant n, U is the length of the Hamming window, and c is a preset logarithmic energy factor;

Preferably, c=10⁵；

(3) Apply a short-time Fourier transform to each frame of the voice signal to obtain the energy spectrum K(f_n), where f_n denotes a frequency component;

(4) Obtain the spectral entropy feature of each frame of the voice signal using the following function:

$$T(m) = -\sum_{n=1}^{N} p_g(n, m)\,\log p_g(n, m)$$

where

$$p_g(n, m) = \frac{K_m(f_n)}{\sum_{\gamma=1}^{N} K_m(f_\gamma)}, \quad n = 1, 2, \ldots, N$$

In these formulas, T(m) is the spectral entropy feature of the m-th frame of the voice signal, p_g(n, m) is the probability density of frequency component f_n in the m-th frame, K_m(f_n) is the energy intensity of the energy spectrum of the m-th frame at frequency component f_n, and N is the window length of the short-time Fourier transform, which equals the Hamming window length, i.e. N = U;

(5) Obtain the dynamic feature of each frame of the voice signal using the following formula:

$$DT(m) = \left(\frac{1}{\left[(D(m) - \Lambda_D)\cdot(T(m) - \Lambda_T)\right]}\right)^{\omega}$$

In the formula, DT(m) is the dynamic feature of the m-th frame of the voice signal, D(m) is the logarithmic energy feature of the m-th frame, T(m) is the spectral entropy feature of the m-th frame, Λ_D and Λ_T are the average values of the logarithmic energy feature and the spectral entropy feature, respectively, over the first 10 frames of the voice signal, and ω is a preset adjustment factor, ω ∈ [1, 2];

(6) Compare the dynamic feature of each frame of the voice signal with a preset threshold. Frames whose dynamic feature exceeds the threshold are retained and marked as the natural speech information part for further processing; the remaining frames are marked as the silent part.

In this preferred embodiment, the vehicle-mounted voice recognition electronic entertainment control system uses the above method to perform speech detection on the received natural speech signal. Describing the natural speech signal with both the logarithmic energy feature and the spectral entropy feature allows the natural speech information part and the silent part to be distinguished more accurately, works particularly well in noisy road conditions, and helps ensure that the vehicle electronic entertainment control system recognizes control commands accurately.
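Steps (1) through (4) of the detection unit can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the patented implementation: the 16 kHz sample rate, the 16-bit-scale amplitudes, the synthetic test signal and the use of the natural logarithm in the entropy are all assumptions, and the combination and threshold of steps (5) and (6) are not part of the sketch.

```python
import numpy as np


def frame_signal(signal, frame_len):
    """Step (1): frames with 50% overlap, each multiplied by a Hamming window."""
    hop = frame_len // 2
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    return np.stack([signal[m * hop:m * hop + frame_len] * window
                     for m in range(n_frames)])


def log_energy(frames, c=1e5):
    """Step (2): D(m) = log10(sum_n |r_m(n)|^2 + c) - log10(c)."""
    return np.log10(np.sum(frames ** 2, axis=1) + c) - np.log10(c)


def spectral_entropy(frames):
    """Steps (3)-(4): entropy of the normalised energy spectrum of each frame."""
    energy = np.abs(np.fft.fft(frames, axis=1)) ** 2       # K_m(f_n), N = U bins
    p = energy / np.maximum(energy.sum(axis=1, keepdims=True), 1e-12)
    return -np.sum(p * np.log(p + 1e-12), axis=1)          # natural log assumed


# Hypothetical test signal: 0.5 s of noise, then 0.5 s of noise plus a tone.
SAMPLE_RATE = 16000                       # assumed, not stated in the text
FRAME_LEN = SAMPLE_RATE * 30 // 1000      # U = 30 ms -> 480 samples
rng = np.random.default_rng(0)
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
signal = 50.0 * rng.standard_normal(SAMPLE_RATE)
signal[SAMPLE_RATE // 2:] += 5000.0 * np.sin(2 * np.pi * 440.0 * t[SAMPLE_RATE // 2:])

frames = frame_signal(signal, FRAME_LEN)
D = log_energy(frames)        # logarithmic energy feature per frame
T = spectral_entropy(frames)  # spectral entropy feature per frame
```

On this synthetic signal the tone frames show a higher logarithmic energy and a lower spectral entropy than the noise-only frames, which is the contrast the two features are meant to capture. Note also that D(m) equals log10(1 + E/c) for short-time energy E, so frames with energy far below c map to values close to zero.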

Preferably, the natural speech enhancement unit 21 enhances the natural speech information part to obtain the natural speech information part to be recognized, as follows:

(1) Apply a fast Fourier transform to the natural speech information part to obtain its amplitude spectrum C(f);

(2) Apply speech enhancement to the natural speech information part using the following formula:

Wherein,

In the formula, C′(f) is the amplitude spectrum of the natural speech information part after speech enhancement, C(f) is the amplitude spectrum of the natural speech information part, |C(f)|² is the power spectrum of the natural speech information part, and δ and μ are adjustment factors that tune the gain. The estimate of the noise power spectrum of the current frame is obtained from the noise power spectrum of the silent part that precedes the natural speech information part; A′(f) is the estimate of the noise power spectrum of the previous frame, A(f) is the noise power spectrum obtained for the current frame, and ω_p is the weight of the current-frame noise power spectrum. Note that the noise power spectrum is updated only in the silent part and is not updated in the natural speech information part;

(3) Apply an inverse fast Fourier transform to the output of the custom filter to obtain the natural speech information part to be recognized.

In this preferred embodiment, the vehicle-mounted voice recognition electronic entertainment control system uses the above method to derive the required noise power spectrum estimate from the silent part of the acquired natural speech signal itself and then enhances the natural speech information part. This improves the adaptability of the speech enhancement, effectively raises the signal-to-noise ratio of the natural speech information part, and lays the foundation for the subsequent recognition of control commands by the electronic entertainment control system.
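The enhancement formula itself is not reproduced in this text, so the following sketch substitutes a generic power spectral subtraction rule to illustrate the three-step flow (FFT, gain, inverse FFT). The roles given to `delta` (over-subtraction factor) and `mu` (gain floor), the recursive form of `update_noise`, and all numeric values are assumptions rather than the patent's definitions.

```python
import numpy as np


def update_noise(prev_estimate, current_noise, w_p=0.2):
    """Assumed recursive update: weight w_p on the current frame's noise power.

    Intended to be called only on frames from the silent part.
    """
    return w_p * current_noise + (1.0 - w_p) * prev_estimate


def enhance_frame(frame, noise_power, delta=1.0, mu=0.01):
    """Generic power spectral subtraction (a stand-in, not the patent's filter)."""
    spectrum = np.fft.fft(frame)                       # step (1): FFT
    power = np.abs(spectrum) ** 2
    # Step (2): squared gain, floored at mu to limit musical noise.
    gain_sq = np.maximum(1.0 - delta * noise_power / np.maximum(power, 1e-12), mu)
    # Step (3): inverse FFT of the filtered spectrum.
    return np.real(np.fft.ifft(np.sqrt(gain_sq) * spectrum))


# Hypothetical data: 20 silent (noise-only) frames, then one noisy tone frame.
rng = np.random.default_rng(1)
n = 480
silent_frames = 100.0 * rng.standard_normal((20, n))
noise_power = np.abs(np.fft.fft(silent_frames[0])) ** 2
for frame in silent_frames[1:]:
    noise_power = update_noise(noise_power, np.abs(np.fft.fft(frame)) ** 2)

clean = 1000.0 * np.sin(2 * np.pi * 20 * np.arange(n) / n)
noisy = clean + 100.0 * rng.standard_normal(n)
enhanced = enhance_frame(noisy, noise_power)
```

Because the gain never exceeds one and the noise estimate is frozen during speech, the sketch attenuates noise-dominated bins while leaving the strong tone bins almost untouched.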

Preferably, the feature extraction unit 22 extracts instruction feature parameters from the natural speech information part to be recognized, as follows:

(1) Divide the natural speech information part to be recognized into frames and apply a Hamming window;

Preferably, a frame length of N = 30 ms is used for framing in this unit, with an inter-frame overlap of 10 ms;

(2) Select the frames of the natural speech information part in turn and apply a fast Fourier transform to each to obtain the spectrum R(f);

(3) Convert the spectrum R(f) to the mel-frequency spectrum R(f′), and obtain the characteristic energy spectrum E_b(x) of the natural speech signal using the following custom filter bank:

Wherein,

In the formula, E_b(x) is the characteristic energy spectrum output by the x-th filter in the filter bank, x = 1, 2, ..., X, where X is the number of filters in the filter bank; R(f′) is the spectrum obtained after conversion to the mel frequency; f′ is the mel frequency; the formula also uses a centroid parameter for the x-th filter in the filter bank; V_x(f) is the x-th filter in the filter bank; and j_x, h_x and k_x are the upper limit, center and lower limit of the x-th filter, respectively, where h_x = j_{x-1} = k_{x+1};

Preferably, the number of filters in the filter bank is X = 13;

The mel frequency is a nonlinear frequency scale based on the human ear's perception of equidistant changes in pitch. The relation between the mel frequency f′ and the frequency f in hertz is:

(4) Take the logarithm of the characteristic energy spectrum E_b(x), then apply a discrete cosine transform, and take the first X coefficients after the transform as the X-dimensional speech feature parameters of this frame of the natural speech information part;

(5) Repeat steps (2) to (4) until the feature parameters of every frame of the natural speech information part to be recognized have been obtained.

In this preferred embodiment, the feature extraction unit 22 uses the above method to extract speech features. Introducing into the feature extraction function a centroid parameter for each filter of the corresponding frequency band allows the feature parameters to reflect the frequency characteristics of the natural speech information part itself, improves the robustness of feature parameter extraction, and improves the stability of the vehicle-mounted voice recognition electronic entertainment control system of the present invention, particularly in noisy road environments.
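The overall flow of steps (1) to (5) resembles a conventional mel-cepstral front end, which can be sketched as follows. The sketch uses plain triangular filters and therefore omits the centroid parameter, whose formula is not reproduced in this text. The mel conversion 2595·log10(1 + f/700) is the commonly used form and is an assumption here, as are the 16 kHz sample rate and the function names.

```python
import numpy as np


def hz_to_mel(f):
    # Commonly used mel conversion (assumed; the source formula is not shown).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters; each centre is the neighbouring filters' edge."""
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for x in range(1, n_filters + 1):
        lo, mid, hi = bins[x - 1], bins[x], bins[x + 1]
        for k in range(lo, mid):            # rising slope
            bank[x - 1, k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):            # falling slope
            bank[x - 1, k] = (hi - k) / (hi - mid)
    return bank


def extract_features(signal, sample_rate=16000, n_filters=13):
    frame_len = sample_rate * 30 // 1000              # 30 ms frame
    hop = frame_len - sample_rate * 10 // 1000        # 10 ms overlap
    window = np.hamming(frame_len)
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    idx = np.arange(n_filters)
    # DCT-II basis: row i, column j -> cos(pi * i * (2j + 1) / (2X)).
    basis = np.cos(np.pi * np.outer(idx, 2 * idx + 1) / (2 * n_filters))
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2       # spectrum of the frame
        energy = bank @ power                         # one value per filter
        feats.append(basis @ np.log(energy + 1e-10))  # log, then DCT
    return np.array(feats)


rng = np.random.default_rng(2)
features = extract_features(rng.standard_normal(16000))
```

With X = 13 filters the discrete cosine transform yields exactly 13 coefficients per frame, so keeping the first X coefficients retains all of them.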

Finally, it should be noted that the above embodiments merely illustrate the technical solution of the present invention and do not limit its scope of protection. Although the present invention has been explained in detail with reference to preferred embodiments, a person of ordinary skill in the art should understand that the technical solution of the present invention may be modified or replaced with equivalents without departing from its spirit and scope.

Claims

1. A vehicle-mounted voice recognition electronic entertainment control system, characterized in that it comprises a natural speech input module, a speech processing module, a Bluetooth module, a media player, an air conditioner, an application control program and a vehicle body control module; the natural speech input module is used to receive a person's voice signal; the speech processing module is used to receive the voice signal from the natural speech input module and convert it into an executable control command; the Bluetooth module is used to receive the executable control command and control Bluetooth devices; the media player is used to receive the executable control command and play media accordingly; the air conditioner is used to receive the executable control command and adjust the temperature, the air volume, the internal/external air circulation mode and the blowing mode.

2. The vehicle-mounted voice recognition electronic entertainment control system according to claim 1, characterized in that the system further includes a navigator, which is used to receive the executable control command and to set a destination, plan a navigation route, select a route and change the destination.

3. The vehicle-mounted voice recognition electronic entertainment control system according to claim 1, characterized in that the application control program is used to receive the executable control command and run the corresponding application program.

4. The vehicle-mounted voice recognition electronic entertainment control system according to claim 1, characterized in that the vehicle body control module is used to receive the executable control command and control in-car facilities.

5. The vehicle-mounted voice recognition electronic entertainment control system according to claim 1, characterized in that the system further includes a prompting module, which is used to receive the executable control command and issue voice prompts to the occupants of the vehicle.

6. The vehicle-mounted voice recognition electronic entertainment control system according to claim 1, characterized in that the speech processing module includes a natural speech detection unit, a natural speech enhancement unit, a feature extraction unit and a natural speech recognition unit; the natural speech detection unit is used to detect and extract the effective natural speech information part of the received voice signal; the natural speech enhancement unit is used to enhance the natural speech information part to obtain the natural speech information part to be recognized; the feature extraction unit is used to extract instruction feature parameters from the natural speech information part to be recognized; the natural speech recognition unit is used to perform recognition on the basis of the instruction feature parameters and obtain the corresponding control command.

7. The vehicle-mounted voice recognition electronic entertainment control system according to claim 6, characterized in that the natural speech detection unit detects and extracts the effective natural speech information in the received voice signal as follows:

(1) divide the received voice signal into frames with 50% inter-frame overlap and apply a Hamming window to obtain each frame of the voice signal;

(2) obtain the logarithmic energy feature of each frame of the voice signal using the following function:

$$D(m) = \log_{10}\left(\sum_{n=1}^{U} |r_m(n)|^2 + c\right) - \log_{10} c$$

In the formula, D(m) is the logarithmic energy feature of the m-th frame of the voice signal, $\sum_{n=1}^{U} |r_m(n)|^2$ is the short-time energy of the m-th frame, |r_m(n)|² is the energy value of the m-th frame at instant n, U is the length of the Hamming window, and c is a preset logarithmic energy factor;

(3) apply a short-time Fourier transform to each frame of the voice signal to obtain the energy spectrum K(f_n), where f_n denotes a frequency component;

(4) obtain the spectral entropy feature of each frame of the voice signal using the following function:

$$T(m) = -\sum_{n=1}^{N} p_g(n, m)\,\log p_g(n, m)$$

where

$$p_g(n, m) = \frac{K_m(f_n)}{\sum_{\gamma=1}^{N} K_m(f_\gamma)}, \quad n = 1, 2, \ldots, N$$

In these formulas, T(m) is the spectral entropy feature of the m-th frame of the voice signal, p_g(n, m) is the probability density of frequency component f_n in the m-th frame, K_m(f_n) is the energy intensity of the energy spectrum of the m-th frame at frequency component f_n, and N is the window length of the short-time Fourier transform, which equals the Hamming window length, i.e. N = U;

(5) obtain the dynamic feature of each frame of the voice signal using the following formula:

$$DT(m) = \left(\frac{1}{\left[(D(m) - \Lambda_D)\cdot(T(m) - \Lambda_T)\right]}\right)^{\omega}$$

In the formula, DT(m) is the dynamic feature of the m-th frame of the voice signal, D(m) is the logarithmic energy feature of the m-th frame, T(m) is the spectral entropy feature of the m-th frame, Λ_D and Λ_T are the average values of the logarithmic energy feature and the spectral entropy feature, respectively, over the first 10 frames of the voice signal, and ω is a preset adjustment factor, ω ∈ [1, 2];

(6) compare the dynamic feature of each frame of the voice signal with a preset threshold; frames whose dynamic feature exceeds the threshold are retained and marked as the natural speech information part for further processing, and the remaining frames are marked as the silent part.

8. The vehicle-mounted voice recognition electronic entertainment control system according to claim 6, characterized in that the feature extraction unit extracts instruction feature parameters from the natural speech information part to be recognized that is obtained by the natural speech enhancement unit, as follows:

(3) convert the spectrum R(f) to the mel-frequency spectrum R(f′), and obtain the characteristic energy spectrum E_b(x) of the natural speech signal using the following custom filter bank:

Wherein,

In the formula, E_b(x) is the characteristic energy spectrum output by the x-th filter in the filter bank, x = 1, 2, ..., X, where X is the number of filters in the filter bank; R(f′) is the spectrum obtained after conversion to the mel frequency; f′ is the mel frequency; the formula also uses a centroid parameter for the x-th filter in the filter bank; V_x(f) is the x-th filter in the filter bank; and j_x, h_x and k_x are the upper limit, center and lower limit of the x-th filter, respectively, where h_x = j_{x-1} = k_{x+1};

(4) take the logarithm of the characteristic energy spectrum E_b(x), then apply a discrete cosine transform, and take the first X coefficients after the transform as the X-dimensional speech feature parameters of this frame of the natural speech information part;