CN116229966A - Method and system for controlling intelligent electrical appliance based on voice - Google Patents

Method and system for controlling intelligent electrical appliance based on voice

Info

Publication number
CN116229966A
CN116229966A (application CN202310054485.1A)
Authority
CN
China
Prior art keywords
voice
sound
layer
recognition model
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310054485.1A
Other languages
Chinese (zh)
Inventor
董立伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foshan Hengyan Electronics Co ltd
Original Assignee
Foshan Hengyan Electronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Foshan Hengyan Electronics Co ltd filed Critical Foshan Hengyan Electronics Co ltd
Priority to CN202310054485.1A priority Critical patent/CN116229966A/en
Publication of CN116229966A publication Critical patent/CN116229966A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/06 Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G10L 15/08 Speech classification or search
    • G10L 15/16 Speech classification or search using artificial neural networks
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The invention relates to the technical field of voice control and discloses a method and a system for controlling an intelligent electrical appliance based on voice. A voice recognition model based on a CNN model is established, comprising a convolution layer, a batch normalization layer, a ReLU activation function layer, a maximum pooling layer, a fully connected layer and a classification layer connected in sequence; the model captures sound data represented as spectrograms and is trained on them. Collected sound data are then input into the trained sound recognition model to determine the voice content to be recognized, and that content is transmitted through a server to the interconnection gateway of the intelligent appliance, realizing voice control of the appliance. The invention reduces the development cost of the embedded device, saves resources, and improves both the accuracy and the running speed of voice recognition.

Description

Method and system for controlling intelligent electrical appliance based on voice
Technical Field
The application relates to the technical field of voice control, in particular to a method and a system for controlling an intelligent electrical appliance based on voice.
Background
With the continued development of China's economy, living standards keep rising, and the transition from traditional home living systems to intelligent ones is an inevitable trend. However, intelligent electrical appliance systems still have shortcomings that need to be addressed. On the one hand, current systems lack a transition path for traditional household appliances, which are difficult to put on a network; consumers who choose an intelligent appliance system therefore have to replace all their appliances at great cost, which raises the barrier to adoption and slows the development of such systems. On the other hand, for a future-oriented intelligent appliance model, improving the recall rate and response speed of intelligent voice control is the key point, but existing voice recognition systems cannot meet these requirements and need forward-looking optimization.
Therefore, a method is needed to reduce the development cost of the embedded device, save resources, and improve the accuracy and the operation speed of voice recognition.
Disclosure of Invention
The application embodiment provides a method and a system for controlling an intelligent electrical appliance based on voice, which reduce the development cost of embedded equipment, save resources and improve the accuracy and the operation speed of voice recognition.
In order to achieve the above purpose, the embodiments of the present application adopt the following technical solutions:
in a first aspect, a method for controlling a smart appliance based on voice is provided, the method comprising the steps of:
step S1, a sound acquisition module in an intelligent electrical appliance acquires an original sound signal, a controller of a control module in the intelligent electrical appliance pre-processes the original sound signal, generates a processed sound signal, and stores the processed sound signal as a model training set to a server;
s2, extracting characteristic data of the processed sound signals, and representing a sound data set by using a spectrogram;
step S3, establishing a voice recognition model based on a CNN model in a language recognition unit of the server, capturing voice data in a spectrogram by using the voice recognition model, training the voice recognition model by using the model training set, and presetting the maximum number of iterations and the learning rate;
the voice recognition model comprises a convolution layer, a batch normalization layer, a ReLU activation function layer, a maximum pooling layer, a fully connected layer and a classification layer which are connected in sequence;
s4, inputting the voice data set into the voice recognition model in a data cube structure to train the voice recognition model, and generating a trained voice recognition model;
s5, the sound collection module collects live sound data in real time to generate a corresponding sound data set represented by a spectrogram, and the sound data set is input into the trained sound recognition model to determine the voice content to be recognized;
and S6, transmitting the voice content to be recognized to an interconnection gateway of the intelligent electric appliance through a server to realize voice control of the intelligent electric appliance.
In a possible implementation manner, the method for performing preprocessing in step S1 includes a sound denoising process, a sound pre-emphasis process, a sound windowing framing process, and a sound endpoint detection process.
In one possible implementation manner, the step S2 includes:
and extracting the characteristic data of the processed sound signal by using a linear prediction coefficient algorithm.
In one possible implementation manner, the step S3 includes:
after the sound data set is input into the sound recognition model, local information of the sound data set is extracted and the sound data are down-sampled using the convolution layer, the batch normalization layer and the ReLU activation function layer; deep features of the sound data set are then mined using the maximum pooling layer, the fully connected layer and the classification layer. After the feature information of the sound data set is aggregated, a loss function computes the loss between the predicted points and the real points of the sound data set in the spectrogram; the loss is iteratively decayed using a learning-rate reduction method to optimize the weight parameters of the model. Training stops when the iteration count reaches the preset maximum, yielding the trained sound recognition model.
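The training control flow described here (iterate, compute a loss, decay the learning rate, stop at the preset maximum iteration count) can be sketched with a toy one-parameter model. This is purely illustrative and not the patent's implementation: the quadratic loss stands in for the CNN's cross-entropy loss, and every numeric value is an assumption.

```python
# Toy sketch of the training control flow in step S3: iterate, compute a
# loss, decay the learning rate, stop at the preset maximum iteration count.
# The quadratic "loss" stands in for the CNN's cross-entropy loss.

def train(max_iters=50, lr=0.3, decay=0.95):
    w = 4.0                        # a single "weight parameter" to optimize
    target = 1.0                   # the "real point"; the prediction is w itself
    losses = []
    for _ in range(max_iters):
        loss = (w - target) ** 2   # stand-in for the loss function
        losses.append(loss)
        grad = 2 * (w - target)
        w -= lr * grad             # optimize the weight parameter
        lr *= decay                # learning-rate reduction each iteration
    return w, losses

w, losses = train()
print(round(w, 6), losses[0])
```

The loss shrinks monotonically and training halts purely on the iteration count, exactly the stopping rule the text describes.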
In one possible implementation, the loss function comprises a cross entropy loss function.
In one possible implementation manner, the step S5 includes:
and inputting the corresponding sound data set into a trained sound recognition model, comparing and matching the sound data set with sample parameters in a server sample library through deep mining analysis, and determining the voice content to be recognized according to the matching similarity.
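The matching stage of step S5 can be sketched as a nearest-neighbour search over a server-side sample library. The cosine similarity measure, the threshold, and the command names below are illustrative assumptions; the patent only says matching is done "according to the matching similarity".

```python
# Hedged sketch of step S5's matching stage: compare a feature vector from
# live audio against sample parameters in a (hypothetical) sample library
# and pick the command with the highest similarity.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def match_command(features, sample_library, threshold=0.8):
    """Return (best command, similarity), or (None, similarity) below threshold."""
    best_cmd, best_sim = None, -1.0
    for cmd, sample in sample_library.items():
        sim = cosine_similarity(features, sample)
        if sim > best_sim:
            best_cmd, best_sim = cmd, sim
    return (best_cmd, best_sim) if best_sim >= threshold else (None, best_sim)

library = {"light_on": [0.9, 0.1, 0.2], "fan_off": [0.1, 0.8, 0.5]}
print(match_command([0.88, 0.15, 0.18], library))
```

The threshold keeps near-miss audio from triggering the wrong appliance command.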
In one possible implementation manner, the step S6 includes:
the voice content to be recognized is identified using an attention mechanism model, realizing instruction conversion.
In a second aspect, the present invention further provides a system for controlling an intelligent electrical appliance based on voice, including a sound collection module, a server and an interconnection gateway, wherein:
the intelligent electric appliance comprises a sound acquisition module, a server and a model training set, wherein the sound acquisition module is used for acquiring an original sound signal, and a controller of the control module in the intelligent electric appliance preprocesses the original sound signal to generate a processed sound signal and stores the processed sound signal as the model training set to the server; the method comprises the steps of acquiring live sound data in real time to generate a corresponding sound data set represented by a spectrogram, and determining voice content to be recognized;
a server for storing the feature data of the processed sound signal;
the server comprises a language recognition unit, which is used for establishing a voice recognition model based on a CNN model, capturing voice data in a spectrogram with the voice recognition model, training it with the model training set, and presetting the maximum number of iterations and the learning rate; the voice recognition model comprises a convolution layer, a batch normalization layer, a ReLU activation function layer, a maximum pooling layer, a fully connected layer and a classification layer connected in sequence; the voice data set is input into the voice recognition model in a data cube structure to train it and generate the trained voice recognition model;
and the interconnection gateway is used for receiving the voice content to be recognized and transmitted by the server and realizing voice control of the intelligent electrical appliance.
In a third aspect, the present invention also provides an electronic device comprising a processor and a memory; the electronic device runs the system for controlling an intelligent electrical appliance based on voice according to the second aspect.
In a fourth aspect, the present invention also provides a computer-readable storage medium comprising instructions; when the instructions are executed on the electronic device described in the third aspect, the electronic device is caused to perform the method described in the first aspect.
drawings
Fig. 1 is a schematic structural diagram of a sound collection module in a method and a system for controlling an intelligent electrical appliance based on voice according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a system in a method and a system for controlling an intelligent electrical appliance based on voice according to an embodiment of the present application.
The invention can improve the accuracy and stability of the intelligent electrical appliance voice control system identification.
The invention can improve the response speed of the intelligent electrical appliance voice control system.
The invention reduces the development cost of the embedded equipment, saves resources, and improves the accuracy and the operation speed of voice recognition.
Detailed Description
It should be noted that the terms "first," "second," and the like in the embodiments of the present application are used to distinguish between features of the same type, and are not to be construed as indicating relative importance, quantity, or order.
The terms "exemplary" or "such as" and the like, as used in connection with embodiments of the present application, are intended to be exemplary, or descriptive. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
The terms "coupled" and "connected" in connection with embodiments of the present application are to be construed broadly, and may refer, for example, to a physical direct connection, or to an indirect connection via electronic devices, such as, for example, a connection via electrical resistance, inductance, capacitance, or other electronic devices.
Example 1:
in the method for controlling an intelligent electrical appliance based on voice, note that sound in reality is a continuous signal, yet most sound is stored as a discrete digital signal, as in CD and MP3 audio formats: sound is produced as a continuous signal but collected as discrete samples, and the collection density is described by the sampling rate.
From the network structure, the special network structure of the CNN model enables the CNN model to extract local information of the input voice features, and the invariance of the CNN model to the translation of the input features in frequency and time domains is enhanced through the pooling layer downsampling operation, so that the robustness of the model is greatly enhanced. The CNN model serves as a deep model that can effectively model the spatial distribution of speech feature data.
As shown in fig. 1, C is a convolution layer, BN is a batch normalization layer, ReLU is a ReLU activation function layer, P is a maximum pooling layer, FC is a fully connected layer, and softmax is a classification layer. This embodiment establishes a voice recognition model based on a CNN model, comprising the convolution layer, batch normalization layer, ReLU activation function layer, maximum pooling layer, fully connected layer and classification layer connected in sequence. After the sound data set is input into the model, local information is extracted and the sound data are down-sampled using the convolution, batch normalization and ReLU layers; deep features are mined using the maximum pooling, fully connected and classification layers. Once the feature information is aggregated, a loss function computes the loss between the predicted points and the real points of the sound data set in the spectrogram; the loss is iteratively decayed with a learning-rate reduction method to optimize the weight parameters, and training stops when the iteration count reaches the preset maximum, yielding the trained sound recognition model.
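As an illustrative aside (not taken from the patent), the way a square spectrogram shrinks through this Conv-BN-ReLU-Pool stack can be traced with simple arithmetic. The kernel size, stride, padding, input size and channel count below are hypothetical assumptions chosen only to show the shape bookkeeping:

```python
# Illustrative sketch: spatial dimensions of a spectrogram through the
# Conv -> BN -> ReLU -> MaxPool -> FC -> softmax stack described above.

def conv2d_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def cnn_shape_trace(spectrogram_size=64):
    s = spectrogram_size
    trace = [("input", s)]
    s = conv2d_out(s, kernel=3, padding=1)   # convolution (BN/ReLU keep shape)
    trace.append(("conv+bn+relu", s))
    s = conv2d_out(s, kernel=2, stride=2)    # maximum pooling halves the map
    trace.append(("maxpool", s))
    # the fully connected layer flattens; softmax then classifies
    trace.append(("fc_input_features", s * s * 16))  # assuming 16 channels
    return trace

print(cnn_shape_trace())
```

The pooling step is exactly the down-sampling that gives the model its translation invariance in time and frequency.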
Example 2:
the present embodiment is further optimized on the basis of embodiment 1. The pre-emphasis processing compensates or enhances the high-frequency component of the input speech signal. In the spectrum of a sound signal, the energy of the low-frequency part is generally higher than that of the high-frequency part, and above about 800 Hz the high-frequency end attenuates at roughly 6 dB per octave. To reduce the influence of the vocal organs and articulation during sound production, the high-frequency and low-frequency energies are brought to similar amplitudes so that the signal spectrum stays flat across the whole band from low to high frequency; pre-emphasis is therefore a necessary preprocessing step. Meanwhile, because the noise in the signal is unchanged while the high-frequency energy is increased, the signal-to-noise ratio can also be improved.
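Pre-emphasis is typically implemented as a first-order high-pass filter, y[n] = x[n] - alpha * x[n-1]. A minimal sketch follows; the coefficient alpha = 0.97 is a common choice but an assumption here, since the patent does not fix a value:

```python
# Pre-emphasis sketch: boost the high-frequency component with the filter
# y[n] = x[n] - alpha * x[n-1]; low frequencies are attenuated far more.
import math

def pre_emphasis(signal, alpha=0.97):
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

# a slow (low-frequency) tone loses far more energy than a fast one
slow = [math.sin(0.05 * n) for n in range(400)]
fast = [math.sin(1.5 * n) for n in range(400)]
energy = lambda x: sum(v * v for v in x)
print(energy(pre_emphasis(slow)) / energy(slow),
      energy(pre_emphasis(fast)) / energy(fast))
```

The printed ratios show the flattening effect: the low-frequency tone is heavily attenuated while the high-frequency tone is slightly amplified, which is why the noise-unchanged, high-frequency-boosted signal gains signal-to-noise ratio.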
The sound windowing and framing process divides the complete voice signal into short segments by time period, each segment being one frame, so that computation only needs to be performed per frame; this is the framing process. Framing is implemented by multiplying the speech signal by a movable window function of fixed length, which is called windowing.
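The framing-plus-windowing step can be sketched as follows. The frame length (400 samples, i.e. 25 ms at 16 kHz), the hop size, and the choice of a Hamming window are illustrative assumptions; the patent only specifies that a movable fixed-length window is multiplied onto the signal:

```python
# Sketch of windowed framing: split the signal into fixed-length overlapping
# frames and multiply each frame by a Hamming window.
import math

def hamming(n_points):
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n_points - 1))
            for i in range(n_points)]

def frame_signal(signal, frame_len=400, hop=160):
    win = hamming(frame_len)
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, win)])
        start += hop
    return frames

frames = frame_signal([1.0] * 1600)
print(len(frames), len(frames[0]))
```

Overlapping hops keep information near frame boundaries, and the window tapers each frame so later per-frame analysis is not distorted by abrupt edges.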
The sound endpoint detection process detects speech segments using two functions of the speech signal, the short-time average energy and the short-time average zero-crossing rate. Chinese contains unvoiced and voiced sounds: voiced sounds contain vowels, which carry most of the energy, while unvoiced sounds include consonants with high-frequency content. Voiced segments can therefore be detected with the short-time average energy and unvoiced segments with the short-time average zero-crossing rate, so that whole syllables of Chinese characters can be located.
After denoising, the high-frequency component is boosted by pre-emphasis; windowing and framing then facilitate digital processing; finally, the effective speech segments are detected so that only they are processed, reducing the data volume. An improved endpoint detection algorithm further reduces missed and false detections, giving the method good robustness.
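The energy/zero-crossing test described above can be sketched per frame. The thresholds and the toy frames below are illustrative assumptions; a real detector would calibrate thresholds from background noise:

```python
# Sketch of endpoint detection: high short-time energy marks voiced speech,
# a high zero-crossing rate marks unvoiced speech, otherwise silence.
import math

def short_time_energy(frame):
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / (len(frame) - 1)

def classify_frame(frame, energy_thresh=0.1, zcr_thresh=0.3):
    if short_time_energy(frame) > energy_thresh:
        return "voiced"
    if zero_crossing_rate(frame) > zcr_thresh:
        return "unvoiced"
    return "silence"

voiced = [math.sin(0.2 * n) for n in range(160)]           # strong, low frequency
unvoiced = [0.05 * math.sin(2.5 * n) for n in range(160)]  # weak, high frequency
silence = [0.0] * 160
print(classify_frame(voiced), classify_frame(unvoiced), classify_frame(silence))
```

Frames classed as silence are dropped before recognition, which is how the pipeline reduces the data volume.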
Other portions of this embodiment are the same as those of embodiment 1, and thus will not be described in detail.
Example 3:
the present embodiment is further optimized on the basis of embodiment 1 or 2. The loss function includes a cross entropy loss function. Because the voice recognition model is built on a CNN neural network, its raw output is a vector, not a probability distribution; the softmax activation function is therefore needed to normalize the vector into a probability distribution before the loss is computed with the cross entropy loss function. Since the cross entropy loss function describes the difference between two probability distributions, the softmax activation function and the cross entropy loss function are ultimately used in combination.
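The softmax-plus-cross-entropy combination described above can be shown with a small worked example. The logit values are arbitrary illustrative numbers:

```python
# Worked sketch: softmax turns the CNN's raw output vector into a probability
# distribution; cross-entropy measures its distance from the true label.
import math

def softmax(logits):
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    # distance from the one-hot distribution concentrated on true_index
    return -math.log(probs[true_index])

logits = [2.0, 0.5, -1.0]                # raw CNN outputs, not probabilities
probs = softmax(logits)
print([round(p, 3) for p in probs], round(cross_entropy(probs, 0), 3))
```

The loss is small when the softmax puts most mass on the true class and grows without bound as that probability approaches zero, which is what makes the pairing effective for training.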
Example 4:
the basic principle of the present embodiment is that, in a speech signal, adjacent samples are linearly related: the value at a given time can be predicted from a linear combination of several preceding sample values, and by making the prediction approximate the actual samples as closely as possible, a uniquely determined set of coefficient values, the characteristic parameters of the signal, is obtained. The advantage of the linear prediction coefficient algorithm is that the estimated characteristic parameters are accurate and can describe both the time-domain and frequency-domain characteristics of the voice signal.
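The linear prediction idea described above can be sketched with the standard Levinson-Durbin recursion over autocorrelations. This is a generic LPC sketch, not the patent's specific procedure; the test signal and model order are illustrative assumptions:

```python
# LPC sketch: each sample is approximated by a linear combination of the
# preceding samples; Levinson-Durbin solves for the prediction coefficients
# from the signal's autocorrelation values.
import math

def autocorr(x, lag):
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))

def lpc(x, order):
    r = [autocorr(x, k) for k in range(order + 1)]
    a = [0.0] * (order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err                     # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1 - k * k)                 # remaining prediction error
    return a, err

# toy "voiced" signal: a decaying sinusoid, which is exactly 2nd-order predictable
x = [0.9 ** n * math.sin(0.3 * n) for n in range(200)]
coeffs, residual = lpc(x, order=2)
print([round(c, 4) for c in coeffs])
```

For this signal the recursion recovers coefficients close to the theoretical values (-2 * 0.9 * cos(0.3) and 0.81), and the resulting coefficient set is exactly the kind of characteristic parameter vector step S2 extracts.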
Other portions of this embodiment are the same as any of embodiments 1 to 3 described above, and thus will not be described again.
Example 5:
the present embodiment is further optimized on the basis of any one of the above embodiments 1 to 4. As shown in fig. 2, this embodiment provides a system for controlling an intelligent electrical appliance based on voice, comprising a sound collection module, a server and an interconnection gateway. After the voice content to be recognized has been transmitted through the server to the interconnection gateway of the intelligent appliance, the server side passes the recognition result to the hardware platform of the intelligent appliance for execution.
The controller of the intelligent electrical appliance is connected to the network through the appliance's interconnection gateway. After the server recognizes the voice command, it packages the command information and transmits it using the TCP/IP protocol; on receiving the data packet, the interconnection gateway parses out the useful information and sends the specific control command to the controller, completing the remote voice control function. By offloading the voice recognition task to the server, the interconnection gateway of the intelligent appliance only needs to upload data, execute commands and parse the protocol. The invention thus reduces the development cost of the embedded device, saves resources, and improves the accuracy and running speed of voice recognition.
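The server-to-gateway exchange can be sketched as a length-prefixed payload suitable for a TCP stream. The JSON field names and the 4-byte framing scheme are assumptions for illustration; the patent only specifies that command information is packaged and carried over TCP/IP:

```python
# Hedged sketch: the server packages a recognized command as a
# length-prefixed JSON payload for a TCP stream; the gateway parses it back.
import json
import struct

def pack_command(device, action):
    payload = json.dumps({"device": device, "action": action}).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload  # 4-byte big-endian length prefix

def unpack_command(data):
    (length,) = struct.unpack(">I", data[:4])
    return json.loads(data[4:4 + length].decode("utf-8"))

packet = pack_command("air_conditioner", "power_on")
print(unpack_command(packet))
```

The length prefix lets the gateway find message boundaries in the byte stream, so its job really is reduced to uploading data, parsing the protocol, and executing commands.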
Other portions of this embodiment are the same as any of embodiments 1 to 4 described above, and thus will not be described again.
Example 6:
the invention also provides an electronic device, which comprises a processor and a memory; the electronic device runs the system for controlling an intelligent electrical appliance based on voice described in the above embodiments.
Example 7:
the present invention also provides a computer-readable storage medium comprising instructions; when the instructions are executed on the electronic device described in the above embodiment, the electronic device is caused to perform the method described in the above embodiment. In the alternative, the computer readable storage medium may be a memory.
The processor referred to in the embodiments of the present application may be a chip. For example, it may be a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), a microcontroller unit (MCU), a programmable logic device (PLD), or other integrated chip.
The memory to which embodiments of the present application relate may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synclink DRAM (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system, apparatus and module may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, e.g., the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple modules or components may be combined or integrated into another device, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interface, indirect coupling or communication connection of devices or modules, electrical, mechanical, or other form.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physically separate, i.e., may be located in one device, or may be distributed over multiple devices. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated in one device, or each module may exist alone physically, or two or more modules may be integrated in one device.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using a software program, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line (Digital Subscriber Line, DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device including one or more servers, data centers, etc. that can be integrated with the medium. The usable medium may be a magnetic medium (e.g., a floppy Disk, a hard Disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for controlling an intelligent electrical appliance based on voice, characterized by comprising the following steps:
step S1, a sound acquisition module in an intelligent electrical appliance acquires an original sound signal, a controller of a control module in the intelligent electrical appliance pre-processes the original sound signal, generates a processed sound signal, and stores the processed sound signal as a model training set to a server;
s2, extracting characteristic data of the processed sound signals, and representing a sound data set by using a spectrogram;
step S3, establishing a voice recognition model based on a CNN model in a language recognition unit of the server, capturing voice data in a spectrogram by using the voice recognition model, training the voice recognition model by using the model training set, and presetting the maximum number of iterations and the learning rate;
the voice recognition model comprises a convolution layer, a batch normalization layer, a ReLU activation function layer, a maximum pooling layer, a fully connected layer and a classification layer which are connected in sequence;
s4, inputting the voice data set into the voice recognition model in a data cube structure to train the voice recognition model, and generating a trained voice recognition model;
s5, the sound collection module collects live sound data in real time to generate a corresponding sound data set represented by a spectrogram, and the sound data set is input into the trained sound recognition model to determine the voice content to be recognized;
and S6, transmitting the voice content to be recognized to an interconnection gateway of the intelligent electric appliance through a server to realize voice control of the intelligent electric appliance.
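The claims do not prescribe how the spectrogram of step S2 is computed. A minimal illustrative sketch (not part of the application; frame length, hop size, and the Hamming window are assumptions) of turning a signal into a time-frequency magnitude matrix:

```python
import math

def frame_signal(x, frame_len=64, hop=32):
    # Split the signal into overlapping frames (the framing of the pre-processing stage).
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def dft_magnitude(frame):
    # Magnitude of the discrete Fourier transform of one frame (naive O(n^2) DFT).
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(frame[t] * math.cos(-2 * math.pi * k * t / n) for t in range(n))
        im = sum(frame[t] * math.sin(-2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags

def spectrogram(x, frame_len=64, hop=32):
    # One row of spectral magnitudes per windowed frame.
    hamming = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1))
               for t in range(frame_len)]
    return [dft_magnitude([s * w for s, w in zip(f, hamming)])
            for f in frame_signal(x, frame_len, hop)]

# A pure test tone that falls exactly on DFT bin 8 of a 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
spec = spectrogram(tone)
```

For such a tone, every frame's magnitude peak lands on bin 8, which is the kind of stable time-frequency pattern the CNN of step S3 is trained to capture.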
2. The method for controlling an intelligent electrical appliance based on voice according to claim 1, wherein the pre-processing in step S1 comprises sound denoising, sound pre-emphasis, sound windowing and framing, and sound endpoint detection.
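Claim 2 names the pre-processing stages without fixing their parameters. A minimal sketch of two of them, pre-emphasis and energy-based endpoint detection (the coefficient 0.97, the frame length, and the energy threshold are illustrative assumptions, not values from the application):

```python
def pre_emphasis(x, alpha=0.97):
    # y[t] = x[t] - alpha * x[t-1]: boosts high frequencies before framing.
    return [x[0]] + [x[t] - alpha * x[t - 1] for t in range(1, len(x))]

def energy_endpoints(x, frame_len=160, threshold=0.01):
    # Short-time energy endpoint detection: the speech segment is the span of
    # frames whose mean energy exceeds the threshold.
    frames = [x[i:i + frame_len] for i in range(0, len(x), frame_len)]
    active = [i for i, f in enumerate(frames)
              if sum(s * s for s in f) / len(f) > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

On a signal that is silent, then active, then silent again, the detector returns the sample range of the active middle section.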
3. The method of claim 1, wherein the step S2 includes:
extracting the feature data of the processed sound signal by using a linear prediction coefficient algorithm.
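Claim 3 specifies linear prediction coefficients as the feature data. The standard way to obtain them is the Levinson-Durbin recursion on the signal's autocorrelation sequence; a minimal sketch (the order and test signal are illustrative assumptions):

```python
def autocorrelation(x, max_lag):
    # r[k] = sum_t x[t] * x[t+k], for lags 0..max_lag.
    return [sum(x[t] * x[t + k] for t in range(len(x) - k))
            for k in range(max_lag + 1)]

def lpc(x, order):
    # Levinson-Durbin recursion: solves the normal equations of linear
    # prediction and returns coefficients a[1..order].
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)
    e = r[0]  # prediction error energy
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / e  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        e *= (1 - k * k)
    return a[1:]
```

For a decaying exponential x[t] = 0.9^t, which satisfies x[t] = 0.9 * x[t-1] exactly, a first-order LPC analysis recovers a coefficient close to 0.9.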
4. The method of claim 1, wherein the step S3 includes:
after the sound data set is input into the sound recognition model, local information of the sound data set is extracted by the convolution layer, the batch normalization layer and the ReLU activation function layer; the sound data are down-sampled by the maximum pooling layer; the sound data set is deeply mined by the fully connected layer and the classification layer; after the feature information of the sound data set is aggregated, the loss value between the predicted points of the sound data set and the real points of the sound data set in the spectrogram is calculated with a loss function; the loss value is continuously reduced over iterations by a learning rate decay method, optimizing the weight parameters of the sound recognition model, until the number of iterations equals the preset maximum number of iterations, whereupon training stops and the trained sound recognition model is generated.
5. The method of claim 4, wherein the loss function comprises a cross entropy loss function.
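Claims 4 and 5 describe the training loop: cross-entropy loss, learning-rate decay, and a stop at the maximum iteration count. A minimal sketch of that loop on a toy linear classifier (the model, data, and hyper-parameters are illustrative assumptions; the application's model is a CNN):

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(probs, label):
    # The loss of claim 5: negative log-probability of the true class.
    return -math.log(probs[label])

def train(xs, ys, n_classes, max_iters=300, lr0=0.5, decay=0.99):
    dim = len(xs[0])
    w = [[0.0] * dim for _ in range(n_classes)]
    lr = lr0
    for _ in range(max_iters):           # stop when iterations hit the maximum
        for x, y in zip(xs, ys):
            logits = [sum(wc[d] * x[d] for d in range(dim)) for wc in w]
            p = softmax(logits)
            for c in range(n_classes):   # gradient of cross-entropy w.r.t. logits
                g = p[c] - (1.0 if c == y else 0.0)
                for d in range(dim):
                    w[c][d] -= lr * g * x[d]
        lr *= decay                      # the learning-rate decay of claim 4
    return w
```

After training on two linearly separable samples, the loss on each sample drops well below its initial value of ln 2.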
6. The method of claim 1, wherein the step S5 comprises:
inputting the corresponding sound data set into the trained sound recognition model, comparing and matching it, through deep mining analysis, with sample parameters in a sample library on the server, and determining the voice content to be recognized according to the matching similarity.
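Claim 6 matches extracted features against a sample library by similarity. One common realization is cosine similarity with an acceptance threshold; a minimal sketch (the library entries, feature vectors, and threshold 0.8 are illustrative assumptions):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def best_match(features, sample_library, threshold=0.8):
    # Compare the extracted features with every library sample; return the
    # command whose similarity is highest, provided it clears the threshold.
    name, ref = max(sample_library.items(),
                    key=lambda kv: cosine_similarity(features, kv[1]))
    score = cosine_similarity(features, ref)
    return (name, score) if score >= threshold else (None, score)
```

A feature vector close to a stored sample is accepted; one roughly equidistant from all samples falls below the threshold and is rejected.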
7. The method of claim 1, wherein the step S6 includes:
identifying the voice content to be recognized by using an attention mechanism model, thereby realizing instruction conversion.
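Claim 7 invokes an attention mechanism model without detailing it. The standard building block of such models is scaled dot-product attention; a minimal sketch (the query, key, and value vectors are illustrative assumptions):

```python
import math

def _softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def attention(queries, keys, values):
    # Scaled dot-product attention: each query scores every key, the scores
    # are normalized by softmax, and the output is the weighted sum of values.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = _softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

A query aligned with the first key attends almost entirely to the first value vector, which is how the model focuses on the command-bearing part of an utterance.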
8. A system for controlling an intelligent electrical appliance based on voice, applied to an intelligent electrical appliance, characterized by comprising a sound acquisition module, a server and an interconnection gateway, wherein:
the sound acquisition module is used for acquiring an original sound signal, which a controller of the control module in the intelligent electrical appliance pre-processes to generate a processed sound signal stored on the server as a model training set; the sound acquisition module is further used for acquiring live sound data in real time to generate a corresponding sound data set represented by a spectrogram, from which the voice content to be recognized is determined;
the server is used for storing the feature data of the processed sound signal;
the server comprises a language recognition unit, wherein the language recognition unit is used for establishing a sound recognition model based on a CNN model, capturing sound data in the spectrogram with the sound recognition model, training the sound recognition model with the model training set, and presetting the maximum number of iterations and the learning rate; the sound recognition model comprises a convolution layer, a batch normalization layer, a ReLU activation function layer, a maximum pooling layer, a fully connected layer and a classification layer connected in sequence; the sound data set is input into the sound recognition model in a data cube structure to train it, generating a trained sound recognition model;
the interconnection gateway is used for receiving the voice content to be recognized transmitted by the server, thereby realizing voice control of the intelligent electrical appliance.
9. An electronic device, comprising a processor and a memory; wherein the processor runs the system for controlling an intelligent electrical appliance based on voice as claimed in claim 8.
10. A computer-readable storage medium comprising instructions which, when executed on the electronic device as claimed in claim 9, cause the electronic device to perform the method as claimed in any one of claims 1-7.
CN202310054485.1A 2023-02-03 2023-02-03 Method and system for controlling intelligent electrical appliance based on voice Pending CN116229966A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310054485.1A CN116229966A (en) 2023-02-03 2023-02-03 Method and system for controlling intelligent electrical appliance based on voice

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310054485.1A CN116229966A (en) 2023-02-03 2023-02-03 Method and system for controlling intelligent electrical appliance based on voice

Publications (1)

Publication Number Publication Date
CN116229966A true CN116229966A (en) 2023-06-06

Family

ID=86583735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310054485.1A Pending CN116229966A (en) 2023-02-03 2023-02-03 Method and system for controlling intelligent electrical appliance based on voice

Country Status (1)

Country Link
CN (1) CN116229966A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10726858B2 (en) * 2018-06-22 2020-07-28 Intel Corporation Neural network for speech denoising trained with deep feature losses
CN113921000A (en) * 2021-08-25 2022-01-11 哈尔滨工业大学 Online instruction word voice recognition method and system in noise environment


Similar Documents

Publication Publication Date Title
CN110600017B (en) Training method of voice processing model, voice recognition method, system and device
CN107393526B (en) Voice silence detection method, device, computer equipment and storage medium
CN110491416B (en) Telephone voice emotion analysis and identification method based on LSTM and SAE
WO2021043015A1 (en) Speech recognition method and apparatus, and neural network training method and apparatus
CN110364143B (en) Voice awakening method and device and intelligent electronic equipment
CN111325095B (en) Intelligent detection method and system for equipment health state based on acoustic wave signals
CN106847281A (en) Intelligent household voice control system and method based on voice fuzzy identification technology
WO2021189642A1 (en) Method and device for signal processing, computer device, and storage medium
CN108922543B (en) Model base establishing method, voice recognition method, device, equipment and medium
CN105989836A (en) Voice acquisition method, device and terminal equipment
CN113823264A (en) Speech recognition method, speech recognition device, computer-readable storage medium and computer equipment
CN113035202B (en) Identity recognition method and device
CN113488060B (en) Voiceprint recognition method and system based on variation information bottleneck
CN111540342A (en) Energy threshold adjusting method, device, equipment and medium
CN112183107A (en) Audio processing method and device
CN113129900A (en) Voiceprint extraction model construction method, voiceprint identification method and related equipment
CN114859269A (en) Cable fault diagnosis method based on voiceprint recognition technology
CN115376526A (en) Power equipment fault detection method and system based on voiceprint recognition
CN116913258B (en) Speech signal recognition method, device, electronic equipment and computer readable medium
US20230186943A1 (en) Voice activity detection method and apparatus, and storage medium
CN116229966A (en) Method and system for controlling intelligent electrical appliance based on voice
CN116741159A (en) Audio classification and model training method and device, electronic equipment and storage medium
CN104240705A (en) Intelligent voice-recognition locking system for safe box
CN113327633A (en) Method and device for detecting noisy speech endpoint based on deep neural network model
CN106782550A (en) A kind of automatic speech recognition system based on dsp chip

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination