CN114792517A - Voice recognition method and device for intelligent water cup - Google Patents

Voice recognition method and device for intelligent water cup

Info

Publication number
CN114792517A
CN114792517A
Authority
CN
China
Prior art keywords
voice, water cup, information, intelligent water, frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210322946.4A
Other languages
Chinese (zh)
Inventor
蒋华强
游青山
毕玉
熊于菽
曾卓
徐梅洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Vocational Institute of Engineering
Original Assignee
Chongqing Vocational Institute of Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Vocational Institute of Engineering filed Critical Chongqing Vocational Institute of Engineering
Priority to CN202210322946.4A priority Critical patent/CN114792517A/en
Publication of CN114792517A publication Critical patent/CN114792517A/en
Withdrawn legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/04: Segmentation; Word boundary detection
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/24: Classification techniques
    • G06F18/2411: Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines


Abstract

The invention discloses a voice recognition method for an intelligent water cup, and relates to the technical field of voice recognition. The method comprises the following steps: firstly, reading the voice information; secondly, preprocessing the voice information with an adaptive high-pass filtering method; thirdly, framing: dividing the voice signal into N short segments, with adjacent frames partially overlapping; fourthly, windowing: applying a Hamming window to the framed data; fifthly, Fourier transform: applying a Fourier transform to the data, taking the modulus and then the square of the resulting matrix to obtain the energy spectral density, and summing the energy spectral density within each frame to obtain each frame's energy sum; sixthly, triangular band-pass filtering; seventhly, discrete cosine transform: substituting the logarithmic energies into a discrete cosine transform to obtain the L-order Mel-scale cepstrum (MFCC) parameters. The voice feature vectors are then trained with an SVM algorithm and used for recognition. The technical scheme solves the problem that existing intelligent water cups have low recognition accuracy during voice interaction, and can be used for voice signal recognition in intelligent voice equipment.

Description

Voice recognition method and device for intelligent water cup
Technical Field
The invention discloses a voice recognition method and device for an intelligent water cup, and particularly relates to the technical field of voice recognition.
Background
Water is the most abundant substance in the human body, accounting for about 60%-70% of an adult's weight. Most of the blood is water, and organs such as the muscles, liver, lungs and brain also contain large amounts of water. Water is not only an important nutrient for maintaining human health, but also participates in the chemical reactions, substance conversion and energy exchange of various substances in the body. Without water, nutrients cannot be absorbed, oxygen cannot be transported, waste cannot be discharged, and metabolism becomes impossible. Replenishing water in time is therefore very important for the human body.
Various cups exist on the market at present: not only traditional drinking cups, but also many intelligent cups incorporating technological elements. For example, Chinese patent CN110742469A discloses an intelligent voice water cup that can monitor the amount of stored liquid, the liquid temperature, the position of the cup and so on in real time, and remind the user to drink water in time. However, the accuracy of existing voice interaction is generally not high enough; false triggering can cause the cup to be heated while empty, which poses a certain safety hazard.
Existing intelligent water cups are thus not intelligent enough and not convenient enough to use. Voice interaction control generally has insufficient recognition accuracy, and in an intelligent water cup a recognition error may lead to dry burning, causing a safety problem.
Disclosure of Invention
The invention aims to provide a voice recognition method for an intelligent water cup, and solves the problem that the existing intelligent water cup has low recognition accuracy during voice interaction.
In order to achieve the above purpose, one technical solution of the present invention is as follows: a voice recognition method for an intelligent water cup comprises the following steps:
s1, calculating MFCC feature vectors of a section of voice information, wherein the calculation method of the MFCC feature vectors comprises the following steps:
reading voice information: inputting voice information in wav format;
preprocessing data: an adaptive high-pass filtering method is provided for processing voice information:
y(n) = x(n-1) + 10·log10(x(n) - x(n-1) + 1)
wherein x(n) is the input signal, y(n) is the output signal, n is the time index, x(n) is the amplitude of the sound waveform at the current time, and x(n-1) is the amplitude of the waveform at the previous time;
thirdly, framing: dividing the voice signal into N short voice segments, with adjacent frames partially overlapping;
fourthly, adding a window: windowing the framed data using a Hamming window, the Hamming window formula being:
W(n)=(1-a)-a·cos(2·π·n/N)
wherein a takes the value 0.46, n is the sample index within the frame, and N is the frame length;
fourier transform: performing fast Fourier transform on the windowed data, performing modulus taking and then square taking on the obtained matrix to obtain energy spectrum density, and adding the energy spectrum density of each frame to obtain an energy sum matrix of each frame;
sixthly, triangular band-pass filtering: using a Mel-frequency filter bank, the Mel frequency is expressed as:
f_mel = 2595·log10(1 + f_Hz/700)
wherein f_Hz is the actual frequency in Hz, f(m) is the frequency value corresponding to the center of the m-th triangular band-pass filter, and f_mel is the Mel frequency;
the frequency response function of the triangular band-pass filter is:
H_m(k) = 0, for k < f(m-1)
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)), for f(m-1) ≤ k ≤ f(m)
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)), for f(m) ≤ k ≤ f(m+1)
H_m(k) = 0, for k > f(m+1)
wherein f(m) is the center frequency of the m-th filter; the centers are spaced uniformly on the Mel scale:
f(m) = (N/f_s)·f_mel^{-1}( f_mel(f_l) + m·(f_mel(f_h) - f_mel(f_l))/(M+1) )
wherein f_l and f_h are the lowest and highest frequencies covered by the filter bank, f_s is the sampling rate, N is the FFT length, M is the number of filters, and f_mel^{-1} is the inverse of the Mel mapping above.
seventhly, discrete cosine transform: substituting the logarithmic energies into a discrete cosine transform to obtain the L-order Mel-scale cepstrum (MFCC) parameters, finally generating the voice feature vector of the voice information;
and S2, training the voice feature vectors through an SVM algorithm, and then recognizing.
Further, the training process of step S2 includes the following steps:
s2.1, preparing two sections of different voice information, wherein the two sections of voice information comprise instruction information and interference information, respectively obtaining an instruction information characteristic vector and an interference information characteristic vector by calculating the instruction information and the interference information, and sending the instruction information characteristic vector and the interference information characteristic vector into an SVM classifier for training;
s2.2, training satisfies the following classification function:
min over w, b, ζ of (1/2)·||w||^2 + C·Σ_{i=1..N} ζ_i
subject to y_i(w·x_i + b) ≥ 1 - ζ_i, ζ_i ≥ 0, i = 1, 2, ..., N, wherein w is the hyperplane parameter, b is the hyperplane bias, ζ is the relaxation (slack) variable, and C is the penalty coefficient.
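As an illustration of the classification function above, the soft-margin objective can be minimized directly by subgradient descent. The following sketch is ours, not from the patent: the helper name and toy data are hypothetical, and labels are encoded as +1/-1.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=2000):
    """Minimise (1/2)||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i + b)),
    the soft-margin SVM objective, by subgradient descent.
    Labels y must be +1 / -1; C is the penalty coefficient."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                         # points with zeta_i > 0
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy feature vectors: +1 = instruction, -1 = interference
X = np.array([[2.0, 2.0], [2.5, 1.8], [-2.0, -2.0], [-1.8, -2.4]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X, y)
```

The separating hyperplane w·x + b = 0 is what the trained classifier evaluates at recognition time.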
According to another technical scheme, the intelligent water cup voice recognition method is applied to an intelligent water cup.
Compared with the prior art, the beneficial effects of this scheme are:
1. The scheme adopts an adaptive data preprocessing method to accurately extract the high-frequency information of the voice. Speech differs with each speaker's habits, and the clarity of words is often inconsistent, so accurately extracting the high-frequency components of the voice signal plays a key role in the correct recognition of the subsequent speech. The scheme provides an adaptive high-pass filtering method that accurately extracts the high-frequency segments of the voice, facilitating subsequent voice recognition;
2. The scheme detects more accurately and can improve the recognition accuracy of a specific instruction in a specific scene, for example, accurate recognition of the heating instruction. The voice interaction content of an intelligent water cup is limited, and generally few words need to be recognized; targeting this characteristic, the voice recognition method is designed to recognize the heating instruction accurately while making no judgment on other voice signals. This idea greatly simplifies the recognition task and improves the recognition accuracy;
3. The scheme uses an SVM to classify the voice features, making the calculation more accurate. The SVM is fundamentally a binary classification method that separates feature vectors with a hyperplane, and this application is exactly a binary classification scene;
4. The scheme is convenient to use and provides users with an easier way to operate the intelligent voice water cup.
Drawings
FIG. 1 is a schematic view showing the structure of a smart cup according to embodiment 2;
fig. 2 is a flowchart of the speech recognition method for the intelligent cup in embodiment 2.
Detailed Description
The present invention will be described in further detail below by way of specific embodiments:
reference numerals in the drawings of the specification include: cup body 1, control chip 2, lithium battery 3, wireless charging coil 4.
Example 1
A voice recognition method for an intelligent water cup comprises the following steps:
s1, calculating MFCC feature vectors of a section of voice information, wherein the calculation method of the MFCC feature vectors comprises the following steps:
reading voice information: inputting the voice information in the wav format, and converting the voice information into the wav format if the voice information is not in the wav format.
Preprocessing data: in order to highlight the high frequency part of the data and weaken the low frequency part of the data, an adaptive high-pass filtering method is provided for processing the voice information:
y(n) = x(n-1) + 10·log10(x(n) - x(n-1) + 1)
where x (n) is an input signal, y (n) is an output signal, n is time, x (n) is the amplitude of the sound waveform at the current time, and x (n-1) is the amplitude of the waveform at the previous time.
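A minimal NumPy sketch of this adaptive high-pass preprocessing step. The helper name is ours, and clipping the logarithm's argument to stay positive is an assumption the formula itself leaves open (for x(n) - x(n-1) ≤ -1 the log is undefined):

```python
import numpy as np

def adaptive_high_pass(x):
    """y(n) = x(n-1) + 10*log10(x(n) - x(n-1) + 1).
    Assumption: the log argument is clipped to a small positive floor,
    since the patent does not say how negative differences are handled."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]                                   # no previous sample at n = 0
    arg = np.clip(x[1:] - x[:-1] + 1.0, 1e-12, None)
    y[1:] = x[:-1] + 10.0 * np.log10(arg)
    return y
```

On a constant signal the difference term is 1, the log is 0, and the filter passes the signal through unchanged, which matches its role of emphasizing rapid (high-frequency) changes only.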
Frame division: if a Fourier transform is applied directly to the original speech signal, only the frequency information of the whole signal is obtained and the time-domain information is lost. To avoid this, a framing method divides the voice signal into N short segments; to preserve continuity between frames, adjacent frames partially overlap. The frame length is typically set to 25 ms and the frame shift to 10 ms.
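The 25 ms / 10 ms framing can be sketched as follows; the 16 kHz sampling rate and the function name are assumed for illustration, not stated in the patent:

```python
import numpy as np

def frame_signal(x, sample_rate=16000, frame_ms=25, shift_ms=10):
    """Split a 1-D signal into overlapping frames: 25 ms frame length,
    10 ms frame shift, so consecutive frames overlap by 15 ms."""
    frame_len = int(sample_rate * frame_ms / 1000)    # 400 samples at 16 kHz
    frame_shift = int(sample_rate * shift_ms / 1000)  # 160 samples at 16 kHz
    n_frames = 1 + max(0, (len(x) - frame_len) // frame_shift)
    # Index matrix: row i selects samples [i*shift, i*shift + frame_len)
    idx = (np.arange(frame_len)[None, :]
           + frame_shift * np.arange(n_frames)[:, None])
    return np.asarray(x)[idx]

frames = frame_signal(np.zeros(16000))  # 1 s of silence -> (98, 400)
```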
Fourthly, windowing: windowing is required after the framing of the signal so that continuity between frames increases. Windowing the framed data by using a Hamming window formula:
W(n)=(1-a)-a·cos(2·π·n/N)
where a takes the value 0.46, n is the sample index within the frame, and N is the frame length.
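The windowing step as a sketch. Note the text's formula divides by N, whereas NumPy's built-in np.hamming divides by N-1, so the window is implemented directly from the formula:

```python
import numpy as np

def hamming_window(N, a=0.46):
    """W(n) = (1 - a) - a*cos(2*pi*n/N), the window given in the text."""
    n = np.arange(N)
    return (1 - a) - a * np.cos(2 * np.pi * n / N)

w = hamming_window(400)        # one 25 ms frame at an assumed 16 kHz rate
# windowed = frames * w        # would broadcast over a (n_frames, 400) array
```

The window tapers each frame's edges (W(0) = 0.08) and peaks at 1.0 mid-frame, which reduces spectral leakage in the following Fourier step.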
Fourier transform: and performing fast Fourier transform on the windowed data, performing modulus and square on the obtained matrix to obtain energy spectrum density, and adding the energy spectrum densities of the frames to obtain the energy sum matrix of each frame.
Sixthly, triangular band-pass filtering: the human ear can still distinguish speech normally in a noisy, chaotic environment, and the cochlea plays a very important role in this process. The cochlea filters received voice signals on a logarithmic frequency scale: below about 1000 Hz the scale is roughly linear, and above 1000 Hz it is logarithmic, which makes the human ear more sensitive to low-frequency signals. Based on this, a Mel-frequency filter bank, which mimics the perception characteristics of the human cochlea, is employed. The Mel frequency is expressed as:
f_mel = 2595·log10(1 + f_Hz/700)
wherein f_Hz is the actual frequency in Hz, f(m) is the frequency value corresponding to the center of the m-th triangular band-pass filter, and f_mel is the Mel frequency;
the frequency response function of the triangular band-pass filter is:
H_m(k) = 0, for k < f(m-1)
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)), for f(m-1) ≤ k ≤ f(m)
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)), for f(m) ≤ k ≤ f(m+1)
H_m(k) = 0, for k > f(m+1)
wherein f(m) is the center frequency of the m-th filter; the centers are spaced uniformly on the Mel scale:
f(m) = (N/f_s)·f_mel^{-1}( f_mel(f_l) + m·(f_mel(f_h) - f_mel(f_l))/(M+1) )
wherein f_l and f_h are the lowest and highest frequencies covered by the filter bank, f_s is the sampling rate, N is the FFT length, M is the number of filters, and f_mel^{-1} is the inverse of the Mel mapping above.
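The Mel filter bank can be sketched as follows; the filter count (26), FFT length (512) and sample rate (16 kHz) are assumed example values, and the function names are ours:

```python
import numpy as np

def mel(f_hz):
    """f_mel = 2595 * log10(1 + f_Hz / 700)"""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_inv(f_mel):
    """Inverse of the Mel mapping."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, nfft=512, sample_rate=16000):
    """Triangular filters whose centers f(m) are uniformly spaced on the
    Mel scale; each filter rises linearly to its center and falls to the
    next center, as in the frequency response above."""
    mel_points = np.linspace(mel(0.0), mel(sample_rate / 2), n_filters + 2)
    hz_points = mel_inv(mel_points)
    bins = np.floor((nfft + 1) * hz_points / sample_rate).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)
    return fbank
```

Multiplying a frame's energy spectral density by this matrix (transposed) gives the M filter-bank energies whose logarithms feed the next step.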
seventhly, discrete cosine transform: the logarithmic energies are substituted into a discrete cosine transform to obtain the L-order Mel-scale cepstrum (MFCC) parameters:
C(n) = Σ_{m=1..M} log(E(m)) · cos( π·n·(m - 0.5)/M ), n = 1, 2, ..., L
wherein E(m) is the energy output by the m-th triangular filter and M is the number of filters.
through the steps, the voice feature vector of the voice information is finally generated.
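The final log-energy plus DCT step can be sketched as follows; L = 13 coefficients is a typical choice assumed here, and the guard against log(0) is our addition:

```python
import numpy as np

def mfcc_from_energies(filter_energies, L=13):
    """c(n) = sum_{m=1..M} log(E(m)) * cos(pi*n*(m - 0.5)/M), n = 1..L:
    take the log of each filter's output energy, then apply a DCT to
    decorrelate the log-energies into cepstral coefficients."""
    E = np.atleast_2d(filter_energies)            # (n_frames, M)
    M = E.shape[1]
    log_e = np.log(np.maximum(E, 1e-12))          # guard against log(0)
    m = np.arange(1, M + 1)
    basis = np.cos(np.pi * np.outer(np.arange(1, L + 1), m - 0.5) / M)
    return log_e @ basis.T                        # (n_frames, L)
```

Concatenating (or averaging) the per-frame coefficient rows yields the voice feature vector passed to the SVM.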
S2, training and optimizing the voice feature vectors through an SVM algorithm, wherein the training process comprises the following steps:
s2.1, preparing two sections of different voice information, wherein the two sections of voice information comprise instruction information and interference information, respectively obtaining an instruction information characteristic vector and an interference information characteristic vector by calculating the instruction information and the interference information, and sending the instruction information characteristic vector and the interference information characteristic vector into an SVM classifier for training;
s2.2, training satisfies the following classification function:
min over w, b, ζ of (1/2)·||w||^2 + C·Σ_{i=1..N} ζ_i
subject to y_i(w·x_i + b) ≥ 1 - ζ_i, ζ_i ≥ 0, i = 1, 2, ..., N, wherein w is the hyperplane parameter, b is the hyperplane bias, ζ is the relaxation (slack) variable, and C is the penalty coefficient.
Example 2
As shown in fig. 1 and 2, the voice recognition method of embodiment 1 is applied to an intelligent water cup. The intelligent water cup of this embodiment comprises a wireless charging coil 4, a lithium battery 3, a control chip 2 and a cup body 1. The wireless charging coil 4 is connected with the lithium battery 3 through the control chip 2, which controls the charging of the battery and its discharge for heating. A miniature microphone is integrated on the control chip 2 to collect voice control instructions, and a PTC heating plate is integrated in the cup body 1 to heat the inner wall of the cup and thereby the drinking water.
In the intelligent water cup scene, the task is mainly to judge whether the voice information contains "heating", a typical binary classification scene. The SVM algorithm adopted by the scheme is very widely applied to classification, especially binary classification, and is well suited to binary judgment of voice signals in this scene. SVM training finds an optimal hyperplane that separates the two kinds of voice data. During training, the instruction information is "heating" and the interference information is any other content, such as "warming".
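As a sketch of this two-class training setup using a common library (scikit-learn is our assumption; the patent does not name an implementation), synthetic vectors stand in for the "heating" and interference MFCC features:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in MFCC feature vectors: class 1 = "heating", class 0 = interference
heating = rng.normal(loc=1.0, scale=0.3, size=(20, 13))
interference = rng.normal(loc=-1.0, scale=0.3, size=(20, 13))

X = np.vstack([heating, interference])
y = np.array([1] * 20 + [0] * 20)

# A linear SVC finds the optimal separating hyperplane;
# C is the penalty coefficient from the classification function above
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)
pred = clf.predict(X)
```

In a real deployment X would hold the MFCC vectors computed by the pipeline of embodiment 1 rather than synthetic data.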
The working process of the embodiment:
1. The user says "heating" to the cup;
2. The miniature microphone on the intelligent water cup collects the voice information;
3. The MFCC feature vector is extracted from the collected voice information using the method of embodiment 1;
4. The pre-trained SVM model classifies the MFCC feature vector and outputs the classification result;
5. When the "heating" voice information is detected, the PTC heating plate of the water cup is powered on to heat.
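The working process above can be condensed into a small control routine; the function names (extract_mfcc, svm_predict, ptc_on) are hypothetical placeholders for the cup firmware, not APIs from the patent:

```python
def handle_voice_command(audio, extract_mfcc, svm_predict, ptc_on):
    """Classify one utterance and power the PTC heater only when the
    'heating' instruction (class 1) is recognised; all other speech is
    deliberately ignored, per the two-class design."""
    features = extract_mfcc(audio)
    if svm_predict(features) == 1:
        ptc_on()
        return "heating"
    return "ignored"
```

Keeping the decision to a single accept/reject check is what lets the scheme ignore arbitrary interference speech instead of attempting open-vocabulary recognition.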
Heating is performed with a PTC heating plate instead of a material such as nichrome wire: because a PTC heating plate cannot exceed its maximum heating temperature, it provides good inherent temperature protection, satisfying the heating requirement while reducing the potential safety hazard.
The above are merely embodiments of the present invention; common general knowledge such as known specific structures and/or characteristics has not been described here in detail. It should be noted that, for those skilled in the art, many variations and improvements can be made without departing from the structure of the present invention; these should also be considered within the scope of protection, and they do not affect the effect of implementing the invention or the practicability of the patent. The scope of protection claimed by this application shall be subject to the content of the claims, and the description of embodiments in the specification may be used to interpret the content of the claims.

Claims (3)

1. A voice recognition method for an intelligent water cup, characterized by comprising the following steps:
s1, calculating MFCC feature vectors of a section of voice information, wherein the calculation method of the MFCC feature vectors comprises the following steps:
reading voice information: inputting voice information in wav format;
preprocessing data: an adaptive high-pass filtering method is provided for processing voice information:
y(n) = x(n-1) + 10·log10(x(n) - x(n-1) + 1)
wherein x(n) is the input signal, y(n) is the output signal, n is the time index, x(n) is the amplitude of the sound waveform at the current time, and x(n-1) is the amplitude of the waveform at the previous time;
thirdly, framing: dividing the voice signal into N short voice segments, with adjacent frames partially overlapping;
fourthly, windowing: windowing the framed data by using a Hamming window formula:
W(n)=(1-a)-a·cos(2·π·n/N)
wherein a takes the value 0.46, n is the sample index within the frame, and N is the frame length;
fourier transform: performing fast Fourier transform on the windowed data, performing modulus taking and then square taking on the obtained matrix to obtain energy spectrum density, and adding the energy spectrum density of each frame to obtain an energy sum matrix of each frame;
sixthly, triangular band-pass filtering: using a Mel-frequency filter bank, the Mel frequency is expressed as:
f_mel = 2595·log10(1 + f_Hz/700)
wherein f_Hz is the actual frequency in Hz, f(m) is the frequency value corresponding to the center of the m-th triangular band-pass filter, and f_mel is the Mel frequency;
the frequency response function of the triangular band-pass filter is:
H_m(k) = 0, for k < f(m-1)
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)), for f(m-1) ≤ k ≤ f(m)
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)), for f(m) ≤ k ≤ f(m+1)
H_m(k) = 0, for k > f(m+1)
wherein f(m) is the center frequency of the m-th filter, the centers being spaced uniformly on the Mel scale;
seventhly, discrete cosine transform: introducing logarithmic energy into discrete cosine transform, solving an L-order Mel-scale Cepstrum parameter, and finally generating a voice feature vector of voice information;
and S2, training the voice feature vectors through an SVM algorithm, and then recognizing.
2. The intelligent water cup voice recognition method according to claim 1, wherein: the training process of step S2 includes the following steps:
s2.1, preparing two sections of different voice information, wherein the two sections of voice information comprise instruction information and interference information, respectively obtaining an instruction information characteristic vector and an interference information characteristic vector by calculating the instruction information and the interference information, and sending the instruction information characteristic vector and the interference information characteristic vector into an SVM classifier for training;
s2.2, training to satisfy the following classification function:
min over w, b, ζ of (1/2)·||w||^2 + C·Σ_{i=1..N} ζ_i
subject to y_i(w·x_i + b) ≥ 1 - ζ_i, ζ_i ≥ 0, i = 1, 2, ..., N, wherein w is the hyperplane parameter, b is the hyperplane bias, ζ is the relaxation (slack) variable, and C is the penalty coefficient.
3. The intelligent water cup voice recognition method of claim 1 or 2 is applied to an intelligent water cup.
CN202210322946.4A 2022-03-30 2022-03-30 Voice recognition method and device for intelligent water cup Withdrawn CN114792517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210322946.4A CN114792517A (en) 2022-03-30 2022-03-30 Voice recognition method and device for intelligent water cup


Publications (1)

Publication Number Publication Date
CN114792517A true CN114792517A (en) 2022-07-26

Family

ID=82462312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210322946.4A Withdrawn CN114792517A (en) 2022-03-30 2022-03-30 Voice recognition method and device for intelligent water cup

Country Status (1)

Country Link
CN (1) CN114792517A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118016106A (en) * 2024-04-08 2024-05-10 山东第一医科大学附属省立医院(山东省立医院) Elderly emotion health analysis and support system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220726