CN114792517A - Voice recognition method and device for intelligent water cup - Google Patents

Voice recognition method and device for intelligent water cup

Info

Publication number
CN114792517A
CN114792517A
Authority
CN
China
Prior art keywords
voice, water cup, information, intelligent water, frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210322946.4A
Other languages
Chinese (zh)
Inventor
蒋华强
游青山
毕玉
熊于菽
曾卓
徐梅洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Vocational Institute of Engineering
Original Assignee
Chongqing Vocational Institute of Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Vocational Institute of Engineering filed Critical Chongqing Vocational Institute of Engineering
Priority to CN202210322946.4A priority Critical patent/CN114792517A/en
Publication of CN114792517A publication Critical patent/CN114792517A/en
Withdrawn legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/04: Segmentation; Word boundary detection
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/24: Classification techniques
    • G06F18/2411: Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines


Abstract

The invention discloses a voice recognition method for an intelligent water cup, and relates to the technical field of voice recognition. The method comprises the following steps: firstly, reading the voice information; secondly, preprocessing the voice information with an adaptive high-pass filtering method; thirdly, framing: dividing the voice signal into N short segments, with adjacent frames partially overlapping; fourthly, windowing: applying a Hamming window to the framed data; fifthly, Fourier transform: applying a Fourier transform to the data, taking the modulus and then the square of the resulting matrix to obtain the energy spectral density, and summing the energy spectral density within each frame to obtain each frame's energy sum; sixthly, triangular band-pass filtering; seventhly, discrete cosine transform: substituting the logarithmic energies into a discrete cosine transform to obtain the L-order Mel-scale cepstrum (MFCC) parameters. The voice feature vectors are then trained with an SVM algorithm and used for recognition. The technical scheme solves the problem that existing intelligent water cups have low recognition accuracy during voice interaction, and can be used for voice signal recognition in intelligent voice equipment.

Description

Voice recognition method and device for intelligent water cup
Technical Field
The invention discloses a voice recognition method and device for an intelligent water cup, and particularly relates to the technical field of voice recognition.
Background
Water is the most abundant substance in the human body, accounting for about 60%-70% of an adult's weight. Most of the blood is water, and organs such as the muscles, liver, lungs and brain also contain large amounts of water. Water is not only an important nutrient for maintaining human health, but also participates in the chemical reactions, substance conversion and energy exchange of various substances in the body. Without water, nutrients cannot be absorbed, oxygen cannot be transported, waste cannot be discharged, and metabolism becomes impossible. Replenishing water in time is therefore very important for the human body.
Various cups exist on the market at present: not only traditional drinking cups, but also many intelligent cups incorporating technological elements. For example, Chinese patent CN110742469A discloses an intelligent voice water cup that can monitor the amount of stored liquid, the liquid temperature, the position of the cup and so on in real time, and remind the user to drink water in time. However, the accuracy of existing voice interaction is generally not high enough; false triggering can cause the cup to be heated while empty, which poses a certain safety hazard.
Existing intelligent water cups are thus not intelligent enough and not convenient enough to use. Voice interaction control generally has insufficient recognition accuracy, and in an intelligent water cup a recognition error may lead to dry burning, causing a safety problem.
Disclosure of Invention
The invention aims to provide a voice recognition method for an intelligent water cup, and solves the problem that the existing intelligent water cup has low recognition accuracy during voice interaction.
In order to achieve the above purpose, one technical solution of the present invention is as follows: a voice recognition method for an intelligent water cup comprises the following steps:
s1, calculating MFCC feature vectors of a section of voice information, wherein the calculation method of the MFCC feature vectors comprises the following steps:
reading voice information: inputting voice information in wav format;
preprocessing data: an adaptive high-pass filtering method is provided for processing voice information:
y(n) = x(n-1) + 10·log10(x(n) - x(n-1) + 1)
wherein x(n) is the input signal, y(n) is the output signal, n is the time index, x(n) is the amplitude of the sound waveform at the current time, and x(n-1) is the amplitude of the waveform at the previous time;
thirdly, framing: dividing the voice signal into N short voice segments, with adjacent frames partially overlapping;
fourthly, adding a window: windowing the framed data using a Hamming window, the Hamming window formula being:
W(n)=(1-a)-a·cos(2·π·n/N)
wherein a takes the value 0.46, n is the sample index within the frame, and N is the frame length;
fourier transform: performing fast Fourier transform on the windowed data, performing modulus taking and then square taking on the obtained matrix to obtain energy spectrum density, and adding the energy spectrum density of each frame to obtain an energy sum matrix of each frame;
sixthly, triangular band-pass filtering: using a Mel-frequency filter bank, the Mel frequency is expressed as:
f_mel = 2595·log10(1 + f_Hz/700)
wherein f_Hz is the actual frequency in Hz, f(m) is the frequency value corresponding to the center of the m-th triangular band-pass filter, and f_mel is the Mel frequency;
the frequency response function of the triangular band-pass filter is:
H_m(k) = 0, for k < f(m-1)
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)), for f(m-1) ≤ k ≤ f(m)
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)), for f(m) ≤ k ≤ f(m+1)
H_m(k) = 0, for k > f(m+1)
wherein f(m) is the center frequency of the m-th filter; the centers are spaced uniformly on the Mel scale:
f(m) = (N/f_s)·f_mel^{-1}( f_mel(f_l) + m·(f_mel(f_h) - f_mel(f_l))/(M+1) )
wherein f_l and f_h are the lowest and highest frequencies covered by the filter bank, f_s is the sampling rate, N is the FFT length, M is the number of filters, and f_mel^{-1} is the inverse of the Mel mapping above.
seventhly, discrete cosine transform: substituting the logarithmic energies into a discrete cosine transform to obtain the L-order Mel-scale cepstrum (MFCC) parameters, finally generating the voice feature vector of the voice information;
and S2, training the voice feature vectors through an SVM algorithm, and then recognizing.
Further, the training process of step S2 includes the following steps:
s2.1, preparing two sections of different voice information, wherein the two sections of voice information comprise instruction information and interference information, respectively obtaining an instruction information characteristic vector and an interference information characteristic vector by calculating the instruction information and the interference information, and sending the instruction information characteristic vector and the interference information characteristic vector into an SVM classifier for training;
s2.2, training satisfies the following classification function:
min over w, b, ζ of (1/2)·||w||^2 + C·Σ_{i=1..N} ζ_i
subject to y_i(w·x_i + b) ≥ 1 - ζ_i, ζ_i ≥ 0, i = 1, 2, ..., N, wherein w is the hyperplane parameter, b is the hyperplane bias, ζ is the relaxation (slack) variable, and C is the penalty coefficient.
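As an illustration of the classification function above, the soft-margin objective can be minimized directly by subgradient descent. The following sketch is ours, not from the patent: the helper name and toy data are hypothetical, and labels are encoded as +1/-1.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=2000):
    """Minimise (1/2)||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i + b)),
    the soft-margin SVM objective, by subgradient descent.
    Labels y must be +1 / -1; C is the penalty coefficient."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                         # points with zeta_i > 0
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy feature vectors: +1 = instruction, -1 = interference
X = np.array([[2.0, 2.0], [2.5, 1.8], [-2.0, -2.0], [-1.8, -2.4]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X, y)
```

The separating hyperplane w·x + b = 0 is what the trained classifier evaluates at recognition time.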
According to another technical scheme, the intelligent water cup voice recognition method is applied to an intelligent water cup.
Compared with the prior art, the beneficial effects of this scheme are:
1. The scheme adopts an adaptive data preprocessing method to accurately extract the high-frequency information of the voice. Speech differs with each speaker's habits, and the clarity of words is often inconsistent, so accurately extracting the high-frequency components of the voice signal plays a key role in the correct recognition of the subsequent speech. The scheme provides an adaptive high-pass filtering method that accurately extracts the high-frequency segments of the voice, facilitating subsequent voice recognition;
2. The scheme detects more accurately and can improve the recognition accuracy of a specific instruction in a specific scene, for example, accurate recognition of the heating instruction. The voice interaction content of an intelligent water cup is limited, and generally few words need to be recognized; targeting this characteristic, the voice recognition method is designed to recognize the heating instruction accurately while making no judgment on other voice signals. This idea greatly simplifies the recognition task and improves the recognition accuracy;
3. The scheme uses an SVM to classify the voice features, making the calculation more accurate. The SVM is fundamentally a binary classification method that separates feature vectors with a hyperplane, and this application is exactly a binary classification scene;
4. The scheme is convenient to use and provides users with an easier way to operate the intelligent voice water cup.
Drawings
FIG. 1 is a schematic view showing the structure of a smart cup according to embodiment 2;
fig. 2 is a flowchart of the speech recognition method for the intelligent cup in embodiment 2.
Detailed Description
The present invention will be described in further detail below by way of specific embodiments:
reference numerals in the drawings of the specification include: cup body 1, control chip 2, lithium battery 3, wireless charging coil 4.
Example 1
A voice recognition method for an intelligent water cup comprises the following steps:
s1, calculating MFCC feature vectors of a section of voice information, wherein the calculation method of the MFCC feature vectors comprises the following steps:
reading voice information: inputting the voice information in the wav format, and converting the voice information into the wav format if the voice information is not in the wav format.
Preprocessing data: in order to highlight the high frequency part of the data and weaken the low frequency part of the data, an adaptive high-pass filtering method is provided for processing the voice information:
y(n) = x(n-1) + 10·log10(x(n) - x(n-1) + 1)
where x (n) is an input signal, y (n) is an output signal, n is time, x (n) is the amplitude of the sound waveform at the current time, and x (n-1) is the amplitude of the waveform at the previous time.
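A minimal NumPy sketch of this adaptive high-pass preprocessing step. The helper name is ours, and clipping the logarithm's argument to stay positive is an assumption the formula itself leaves open (for x(n) - x(n-1) ≤ -1 the log is undefined):

```python
import numpy as np

def adaptive_high_pass(x):
    """y(n) = x(n-1) + 10*log10(x(n) - x(n-1) + 1).
    Assumption: the log argument is clipped to a small positive floor,
    since the patent does not say how negative differences are handled."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]                                   # no previous sample at n = 0
    arg = np.clip(x[1:] - x[:-1] + 1.0, 1e-12, None)
    y[1:] = x[:-1] + 10.0 * np.log10(arg)
    return y
```

On a constant signal the difference term is 1, the log is 0, and the filter passes the signal through unchanged, which matches its role of emphasizing rapid (high-frequency) changes only.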
Frame division: if a Fourier transform is applied directly to the original speech signal, only the frequency information of the whole signal is obtained and the time-domain information is lost. To avoid this, a framing method divides the voice signal into N short segments; to preserve continuity between frames, adjacent frames partially overlap. The frame length is typically set to 25 ms and the frame shift to 10 ms.
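The 25 ms / 10 ms framing can be sketched as follows; the 16 kHz sampling rate and the function name are assumed for illustration, not stated in the patent:

```python
import numpy as np

def frame_signal(x, sample_rate=16000, frame_ms=25, shift_ms=10):
    """Split a 1-D signal into overlapping frames: 25 ms frame length,
    10 ms frame shift, so consecutive frames overlap by 15 ms."""
    frame_len = int(sample_rate * frame_ms / 1000)    # 400 samples at 16 kHz
    frame_shift = int(sample_rate * shift_ms / 1000)  # 160 samples at 16 kHz
    n_frames = 1 + max(0, (len(x) - frame_len) // frame_shift)
    # Index matrix: row i selects samples [i*shift, i*shift + frame_len)
    idx = (np.arange(frame_len)[None, :]
           + frame_shift * np.arange(n_frames)[:, None])
    return np.asarray(x)[idx]

frames = frame_signal(np.zeros(16000))  # 1 s of silence -> (98, 400)
```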
Fourthly, windowing: windowing is required after the framing of the signal so that continuity between frames increases. Windowing the framed data by using a Hamming window formula:
W(n)=(1-a)-a·cos(2·π·n/N)
where a takes the value 0.46, n is the sample index within the frame, and N is the frame length.
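The windowing step as a sketch. Note the text's formula divides by N, whereas NumPy's built-in np.hamming divides by N-1, so the window is implemented directly from the formula:

```python
import numpy as np

def hamming_window(N, a=0.46):
    """W(n) = (1 - a) - a*cos(2*pi*n/N), the window given in the text."""
    n = np.arange(N)
    return (1 - a) - a * np.cos(2 * np.pi * n / N)

w = hamming_window(400)        # one 25 ms frame at an assumed 16 kHz rate
# windowed = frames * w        # would broadcast over a (n_frames, 400) array
```

The window tapers each frame's edges (W(0) = 0.08) and peaks at 1.0 mid-frame, which reduces spectral leakage in the following Fourier step.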
Fourier transform: and performing fast Fourier transform on the windowed data, performing modulus and square on the obtained matrix to obtain energy spectrum density, and adding the energy spectrum densities of the frames to obtain the energy sum matrix of each frame.
Sixthly, triangular band-pass filtering: the human ear can still distinguish speech normally in a noisy, chaotic environment, and the cochlea plays a very important role in this process. The cochlea filters received voice signals on a logarithmic frequency scale: below about 1000 Hz the scale is roughly linear, and above 1000 Hz it is logarithmic, which makes the human ear more sensitive to low-frequency signals. Based on this, a Mel-frequency filter bank, which mimics the perception characteristics of the human cochlea, is employed. The Mel frequency is expressed as:
f_mel = 2595·log10(1 + f_Hz/700)
wherein f_Hz is the actual frequency in Hz, f(m) is the frequency value corresponding to the center of the m-th triangular band-pass filter, and f_mel is the Mel frequency;
the frequency response function of the triangular band-pass filter is:
H_m(k) = 0, for k < f(m-1)
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)), for f(m-1) ≤ k ≤ f(m)
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)), for f(m) ≤ k ≤ f(m+1)
H_m(k) = 0, for k > f(m+1)
wherein f(m) is the center frequency of the m-th filter; the centers are spaced uniformly on the Mel scale:
f(m) = (N/f_s)·f_mel^{-1}( f_mel(f_l) + m·(f_mel(f_h) - f_mel(f_l))/(M+1) )
wherein f_l and f_h are the lowest and highest frequencies covered by the filter bank, f_s is the sampling rate, N is the FFT length, M is the number of filters, and f_mel^{-1} is the inverse of the Mel mapping above.
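The Mel filter bank can be sketched as follows; the filter count (26), FFT length (512) and sample rate (16 kHz) are assumed example values, and the function names are ours:

```python
import numpy as np

def mel(f_hz):
    """f_mel = 2595 * log10(1 + f_Hz / 700)"""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_inv(f_mel):
    """Inverse of the Mel mapping."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, nfft=512, sample_rate=16000):
    """Triangular filters whose centers f(m) are uniformly spaced on the
    Mel scale; each filter rises linearly to its center and falls to the
    next center, as in the frequency response above."""
    mel_points = np.linspace(mel(0.0), mel(sample_rate / 2), n_filters + 2)
    hz_points = mel_inv(mel_points)
    bins = np.floor((nfft + 1) * hz_points / sample_rate).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)
    return fbank
```

Multiplying a frame's energy spectral density by this matrix (transposed) gives the M filter-bank energies whose logarithms feed the next step.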
seventhly, discrete cosine transform: the logarithmic energies are substituted into a discrete cosine transform to obtain the L-order Mel-scale cepstrum (MFCC) parameters:
C(n) = Σ_{m=1..M} log(E(m)) · cos( π·n·(m - 0.5)/M ), n = 1, 2, ..., L
wherein E(m) is the energy output by the m-th triangular filter and M is the number of filters.
through the steps, the voice feature vector of the voice information is finally generated.
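The final log-energy plus DCT step can be sketched as follows; L = 13 coefficients is a typical choice assumed here, and the guard against log(0) is our addition:

```python
import numpy as np

def mfcc_from_energies(filter_energies, L=13):
    """c(n) = sum_{m=1..M} log(E(m)) * cos(pi*n*(m - 0.5)/M), n = 1..L:
    take the log of each filter's output energy, then apply a DCT to
    decorrelate the log-energies into cepstral coefficients."""
    E = np.atleast_2d(filter_energies)            # (n_frames, M)
    M = E.shape[1]
    log_e = np.log(np.maximum(E, 1e-12))          # guard against log(0)
    m = np.arange(1, M + 1)
    basis = np.cos(np.pi * np.outer(np.arange(1, L + 1), m - 0.5) / M)
    return log_e @ basis.T                        # (n_frames, L)
```

Concatenating (or averaging) the per-frame coefficient rows yields the voice feature vector passed to the SVM.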
S2, training and optimizing the voice feature vectors through an SVM algorithm, wherein the training process comprises the following steps:
s2.1, preparing two sections of different voice information, wherein the two sections of voice information comprise instruction information and interference information, respectively obtaining an instruction information characteristic vector and an interference information characteristic vector by calculating the instruction information and the interference information, and sending the instruction information characteristic vector and the interference information characteristic vector into an SVM classifier for training;
s2.2, training satisfies the following classification function:
min over w, b, ζ of (1/2)·||w||^2 + C·Σ_{i=1..N} ζ_i
subject to y_i(w·x_i + b) ≥ 1 - ζ_i, ζ_i ≥ 0, i = 1, 2, ..., N, wherein w is the hyperplane parameter, b is the hyperplane bias, ζ is the relaxation (slack) variable, and C is the penalty coefficient.
Example 2
As shown in fig. 1 and 2, the voice recognition method of embodiment 1 is applied to an intelligent water cup. The intelligent water cup of this embodiment comprises a wireless charging coil 4, a lithium battery 3, a control chip 2 and a cup body 1. The wireless charging coil 4 is connected with the lithium battery 3 through the control chip 2, which controls the charging of the battery and its discharge for heating. A miniature microphone is integrated on the control chip 2 to collect voice control instructions, and a PTC heating plate is integrated in the cup body 1 to heat the inner wall of the cup and thereby the drinking water.
In the intelligent water cup scene, the task is mainly to judge whether the voice information contains "heating", a typical binary classification scene. The SVM algorithm adopted by the scheme is very widely applied to classification, especially binary classification, and is well suited to binary judgment of voice signals in this scene. SVM training finds an optimal hyperplane that separates the two kinds of voice data. During training, the instruction information is "heating" and the interference information is any other content, such as "warming".
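As a sketch of this two-class training setup using a common library (scikit-learn is our assumption; the patent does not name an implementation), synthetic vectors stand in for the "heating" and interference MFCC features:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in MFCC feature vectors: class 1 = "heating", class 0 = interference
heating = rng.normal(loc=1.0, scale=0.3, size=(20, 13))
interference = rng.normal(loc=-1.0, scale=0.3, size=(20, 13))

X = np.vstack([heating, interference])
y = np.array([1] * 20 + [0] * 20)

# A linear SVC finds the optimal separating hyperplane;
# C is the penalty coefficient from the classification function above
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)
pred = clf.predict(X)
```

In a real deployment X would hold the MFCC vectors computed by the pipeline of embodiment 1 rather than synthetic data.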
The working process of the embodiment:
1. The user says "heating" to the cup;
2. The miniature microphone on the intelligent water cup collects the voice information;
3. The MFCC feature vector is extracted from the collected voice information using the method of embodiment 1;
4. The pre-trained SVM model classifies the MFCC feature vector and outputs the classification result;
5. When the "heating" voice information is detected, the PTC heating plate of the water cup is powered on to heat.
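The working process above can be condensed into a small control routine; the function names (extract_mfcc, svm_predict, ptc_on) are hypothetical placeholders for the cup firmware, not APIs from the patent:

```python
def handle_voice_command(audio, extract_mfcc, svm_predict, ptc_on):
    """Classify one utterance and power the PTC heater only when the
    'heating' instruction (class 1) is recognised; all other speech is
    deliberately ignored, per the two-class design."""
    features = extract_mfcc(audio)
    if svm_predict(features) == 1:
        ptc_on()
        return "heating"
    return "ignored"
```

Keeping the decision to a single accept/reject check is what lets the scheme ignore arbitrary interference speech instead of attempting open-vocabulary recognition.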
Heating is performed with a PTC heating plate instead of a material such as nichrome wire: because a PTC heating plate cannot exceed its maximum heating temperature, it provides good inherent temperature protection, satisfying the heating requirement while reducing the potential safety hazard.
The above are merely embodiments of the present invention; common general knowledge such as known specific structures and/or characteristics has not been described here in detail. It should be noted that, for those skilled in the art, many variations and improvements can be made without departing from the structure of the present invention; these should also be considered within the scope of protection, and they do not affect the effect of implementing the invention or the practicability of the patent. The scope of protection claimed by this application shall be subject to the content of the claims, and the description of embodiments in the specification may be used to interpret the content of the claims.

Claims (3)

1. A voice recognition method for an intelligent water cup, characterized by comprising the following steps:
s1, calculating MFCC feature vectors of a section of voice information, wherein the calculation method of the MFCC feature vectors comprises the following steps:
reading voice information: inputting voice information in wav format;
preprocessing data: an adaptive high-pass filtering method is provided for processing voice information:
y(n) = x(n-1) + 10·log10(x(n) - x(n-1) + 1)
wherein x(n) is the input signal, y(n) is the output signal, n is the time index, x(n) is the amplitude of the sound waveform at the current time, and x(n-1) is the amplitude of the waveform at the previous time;
thirdly, framing: dividing the voice signal into N short voice segments, with adjacent frames partially overlapping;
fourthly, windowing: windowing the framed data by using a Hamming window formula:
W(n)=(1-a)-a·cos(2·π·n/N)
wherein a takes the value 0.46, n is the sample index within the frame, and N is the frame length;
fourier transform: performing fast Fourier transform on the windowed data, performing modulus taking and then square taking on the obtained matrix to obtain energy spectrum density, and adding the energy spectrum density of each frame to obtain an energy sum matrix of each frame;
sixthly, triangular band-pass filtering: using a Mel-frequency filter bank, the Mel frequency is expressed as:
f_mel = 2595·log10(1 + f_Hz/700)
wherein f_Hz is the actual frequency in Hz, f(m) is the frequency value corresponding to the center of the m-th triangular band-pass filter, and f_mel is the Mel frequency;
the frequency response function of the triangular band-pass filter is:
H_m(k) = 0, for k < f(m-1)
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)), for f(m-1) ≤ k ≤ f(m)
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)), for f(m) ≤ k ≤ f(m+1)
H_m(k) = 0, for k > f(m+1)
wherein f(m) is the center frequency of the m-th filter, the centers being spaced uniformly on the Mel scale;
seventhly, discrete cosine transform: introducing logarithmic energy into discrete cosine transform, solving an L-order Mel-scale Cepstrum parameter, and finally generating a voice feature vector of voice information;
and S2, training the voice feature vectors through an SVM algorithm, and then recognizing.
2. The intelligent water cup voice recognition method according to claim 1, wherein: the training process of step S2 includes the following steps:
s2.1, preparing two sections of different voice information, wherein the two sections of voice information comprise instruction information and interference information, respectively obtaining an instruction information characteristic vector and an interference information characteristic vector by calculating the instruction information and the interference information, and sending the instruction information characteristic vector and the interference information characteristic vector into an SVM classifier for training;
s2.2, training to satisfy the following classification function:
min over w, b, ζ of (1/2)·||w||^2 + C·Σ_{i=1..N} ζ_i
subject to y_i(w·x_i + b) ≥ 1 - ζ_i, ζ_i ≥ 0, i = 1, 2, ..., N, wherein w is the hyperplane parameter, b is the hyperplane bias, ζ is the relaxation (slack) variable, and C is the penalty coefficient.
3. The intelligent water cup voice recognition method of claim 1 or 2 is applied to an intelligent water cup.
CN202210322946.4A 2022-03-30 2022-03-30 Voice recognition method and device for intelligent water cup Withdrawn CN114792517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210322946.4A CN114792517A (en) 2022-03-30 2022-03-30 Voice recognition method and device for intelligent water cup


Publications (1)

Publication Number Publication Date
CN114792517A true CN114792517A (en) 2022-07-26

Family

ID=82462312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210322946.4A Withdrawn CN114792517A (en) 2022-03-30 2022-03-30 Voice recognition method and device for intelligent water cup

Country Status (1)

Country Link
CN (1) CN114792517A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118016106A (en) * 2024-04-08 2024-05-10 山东第一医科大学附属省立医院(山东省立医院) Elderly emotion health analysis and support system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220726