CN115938364A - Intelligent identification control method, terminal equipment and readable storage medium - Google Patents


Info

Publication number
CN115938364A
Authority
CN
China
Prior art keywords
voice
control method
neural network
convolution
instruction information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211589832.2A
Other languages
Chinese (zh)
Inventor
Liu Chenfei (刘晨飞)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Jinhang Computing Technology Research Institute
Original Assignee
Tianjin Jinhang Computing Technology Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Jinhang Computing Technology Research Institute filed Critical Tianjin Jinhang Computing Technology Research Institute
Priority to CN202211589832.2A priority Critical patent/CN115938364A/en
Publication of CN115938364A publication Critical patent/CN115938364A/en
Pending legal-status Critical Current

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 — Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 — Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The application provides an intelligent recognition control method, a terminal device, and a readable storage medium. The intelligent recognition control method comprises the following steps: acquiring a voice signal; extracting voice features of the voice signal; inputting the voice features into a recognition model and outputting instruction information corresponding to the voice features, wherein the recognition model is obtained by training a neural network and is used for recognizing the instruction information corresponding to the voice features; and outputting a corresponding control instruction based on the instruction information. Because the voice features are taken as the recognition objects, the method improves the accuracy with which spoken commands issued by the user are recognized; at the same time, because the recognition model is obtained by neural network training, recognition efficiency is improved.

Description

Intelligent identification control method, terminal equipment and readable storage medium
Technical Field
The present disclosure relates generally to the field of intelligent control technologies, and in particular, to an intelligent recognition control method, a terminal device, and a readable storage medium.
Background
Voice is one of the main ways in which humans communicate. With the development of technologies such as the Internet of Things and automatic control, a variety of smart home devices have appeared in people's lives. As artificial intelligence technology develops and finds application, people place ever higher control requirements on smart home devices: rather than operating a device manually and mechanically, users expect to control it conveniently from any position in a room. Human-computer interaction tools based on voice interfaces, including mobile phones, wristbands, and smart speakers, are increasingly becoming part of everyday life. Voice recognition enables a machine to understand the information a human wishes to convey, so that people can control machines by voice, which brings great convenience to people's lives and work.
In the prior art, a voice command issued by a user is compared with standard commands in a standard library to extract keywords, and the standard command with the greatest similarity is used to control the smart home. However, because of the limitations imposed by users' accents and intonation, this keyword-comparison method has low accuracy and low recognition efficiency, and cannot meet the usage requirements of different users.
Disclosure of Invention
In view of the foregoing drawbacks and deficiencies in the prior art, it is desirable to provide an intelligent recognition control method, a terminal device and a readable storage medium that can solve the above technical problems.
A first aspect of the present application provides an intelligent recognition control method, including:
acquiring a voice signal;
extracting voice features of the voice signals;
inputting the voice features into a recognition model, and outputting instruction information corresponding to the voice features; the recognition model is obtained by training a neural network and is used for recognizing instruction information corresponding to the voice features;
and outputting a corresponding control instruction based on the instruction information.
According to the technical scheme provided by the embodiment of the application, the voice features are Mel frequency cepstrum coefficients.
According to the technical scheme provided by the embodiment of the application, the method for extracting the voice feature of the voice signal comprises the following steps:
performing framing processing on the voice signals to obtain a plurality of framing signals;
performing Fourier transform on each frame signal to obtain a plurality of frequency point data;
calculating the frequency spectrum energy of each frequency point data;
and calculating the voice characteristics of the voice signal based on all the spectrum energy.
According to the technical scheme provided by the embodiment of the application, the method for calculating the voice characteristics of the voice signal based on all the spectrum energy comprises the following steps:
performing Mel filtering processing on the frequency spectrum energy of each frequency point data to obtain Mel filtering energy corresponding to each frequency spectrum energy;
and obtaining the Mel frequency cepstrum coefficient through cepstrum operation and logarithm operation according to all the Mel filtering energies.
According to the technical scheme provided by the embodiment of the application, the construction method of the identification model comprises the following steps:
acquiring a sample set, wherein the sample set comprises a voice feature sample and instruction information corresponding to the voice feature sample;
constructing a neural network;
and taking the voice feature sample as the input of the neural network, taking the instruction information as the output of the neural network, and training the neural network to obtain the recognition model.
According to the technical scheme provided by the embodiment of the application, the neural network is a convolutional neural network; the convolutional neural network comprises a first convolutional layer and a second convolutional layer;
the first convolution layer comprises a first convolution kernel; the first convolution kernel is an F×1 convolution kernel and is used for performing point-by-point convolution on input data of the convolutional neural network in the time dimension to obtain intermediate convolution data; F is the number of rows, and the value of F is a set integer value;
the second convolution layer comprises a second convolution kernel; the second convolution kernel is a 1×K convolution kernel and is used for performing point-by-point convolution on the intermediate convolution data in the feature dimension to obtain final convolution data; K is the number of columns, and the value of K is a set integer value.
According to the technical scheme provided by the embodiment of the application, before framing the voice signal to obtain a plurality of framing signals, the method further comprises the following steps: and filtering the voice signal to improve the high-frequency energy of the voice signal.
According to the technical scheme provided by the embodiment of the application, after the voice signal is subjected to framing processing to obtain a plurality of framing signals, the method further comprises the following steps: and windowing the framing signal to smooth the edges of the framing signal.
A second aspect of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the intelligent recognition control method as described above when executing the computer program.
A third aspect of the present application provides a computer-readable storage medium having a computer program which, when executed by a processor, performs the intelligent recognition control method steps as described above.
The beneficial effect of the present application lies in the following: by extracting the voice features of the voice signal and inputting them into the recognition model, the instruction information corresponding to the voice features can be accurately recognized, and a corresponding control instruction is then generated according to the instruction information, thereby realizing intelligent control of the smart home. In this process, the voice features serve as the recognition objects, which improves the accuracy with which spoken commands issued by the user are recognized; meanwhile, the recognition model is obtained by neural network training, which improves recognition efficiency.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
fig. 1 is a flowchart of an intelligent recognition control method provided in the present application;
fig. 2 is a schematic diagram of a terminal device provided in the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Example 1
Please refer to fig. 1, which is a flowchart of an intelligent recognition control method provided in the present application, including:
s100: acquiring a voice signal;
s200: extracting voice features of the voice signals;
s300: inputting the voice features into a recognition model, and outputting instruction information corresponding to the voice features; the recognition model is obtained by training a neural network and is used for recognizing instruction information corresponding to the voice features;
s400: and outputting a corresponding control instruction based on the instruction information.
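The steps S100–S400 above can be sketched end-to-end as follows. This is a minimal illustration: every function name is hypothetical, and the trained recognition model is stubbed with a fixed result.

```python
# Minimal sketch of the S100-S400 flow. All helper names are hypothetical,
# and the recognition model is stubbed purely for illustration.

def acquire_voice_signal():
    # S100: a real system would read samples from a microphone buffer.
    return [0.0, 0.1, -0.1, 0.2]

def extract_voice_features(signal):
    # S200: placeholder for the MFCC extraction described later.
    return tuple(signal)

def recognition_model(features):
    # S300: stands in for the neural network trained on feature samples.
    return "light_on"

def output_control_instruction(info):
    # S400: map instruction information to a device control command.
    commands = {"light_on": ("smart_lamp", "power_on")}
    return commands[info]

command = output_control_instruction(
    recognition_model(extract_voice_features(acquire_voice_signal())))
```

In a real deployment, only the stubs change: S200 becomes the MFCC pipeline and S300 the trained convolutional network described below.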
In some embodiments, the intelligent recognition control method is applied to a control end, and the control end comprises:
the voice acquisition module is used for acquiring voice information and generating the voice signal;
the voice processing module is used for acquiring the voice signal and extracting the voice characteristics of the voice signal;
a control module to:
inputting the voice features into a recognition model, and outputting instruction information corresponding to the voice features; the recognition model is obtained by training a neural network and is used for recognizing instruction information corresponding to the voice features;
and outputting a corresponding control instruction based on the instruction information.
It should be further explained that the control instruction is used for controlling the corresponding smart home action; the smart home device is, for example, a smart light fixture, a smart speaker, or the like;
further, the control instruction is shown in table-1:
TABLE-1
The working principle is as follows:
in the prior art, a voice command issued by a user is compared with standard commands in a standard library to extract keywords, and the standard command with the greatest similarity is used to control the smart home. However, because of the limitations imposed by users' accents and intonation, this keyword-comparison method has low accuracy and low recognition efficiency, and cannot meet the usage requirements of different users.
Against this background, the application provides an intelligent recognition control method. By extracting the voice features of the voice signal and inputting them into a recognition model, the instruction information corresponding to the voice features can be accurately recognized, and a corresponding control instruction is then generated according to the instruction information, thereby realizing intelligent control of the smart home. In this process, the voice features serve as the recognition objects, which improves the accuracy with which spoken commands issued by the user are recognized; meanwhile, the recognition model is obtained by neural network training, which improves recognition efficiency.
In some embodiments, the speech features are Mel-frequency cepstral coefficients. Mel-frequency cepstral coefficients (MFCCs) are cepstral coefficients extracted in the Mel-scale frequency domain; using MFCCs as the recognized voice features further improves recognition accuracy.
In some embodiments, the method of extracting a speech feature of the speech signal comprises:
performing framing processing on the voice signals to obtain a plurality of framing signals;
performing Fourier transform on each frame signal to obtain a plurality of frequency point data;
calculating the frequency spectrum energy of each frequency point data;
and calculating the voice characteristics of the voice signal based on all the spectral energy.
In some embodiments, before performing framing processing on the voice signal to obtain a plurality of framed signals, the method further includes: filtering the voice signal to boost the high-frequency energy of the voice signal.
Further, the method for filtering the speech signal to improve the high-frequency energy of the speech signal specifically includes: and passing the voice signal through a high-pass filter, and outputting the filtered voice signal.
Further, after performing framing processing on the voice signal to obtain a plurality of framing signals, the method further includes: and windowing the framing signal to smooth the edges of the framing signal.
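The pre-emphasis, framing, and windowing steps above can be sketched as follows. The frame length (400 samples), hop (160 samples), pre-emphasis coefficient (0.97), and the Hamming window are conventional choices, not values stated in this application:

```python
import numpy as np

def preprocess(signal, frame_len=400, hop=160, alpha=0.97):
    """Pre-emphasize, frame, and window a voice signal."""
    signal = np.asarray(signal, dtype=float)
    # High-pass pre-emphasis boosts high-frequency energy:
    # y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Split into overlapping framed signals
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    frames = np.stack([emphasized[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    # A Hamming window smooths the edges of each framed signal
    return frames * np.hamming(frame_len)

frames = preprocess(np.random.randn(16000))  # e.g. 1 s of audio at 16 kHz
```

With these assumed parameters, one second of 16 kHz audio yields 98 windowed frames of 400 samples each, ready for the Fourier transform below.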
In some embodiments, the frequency point data are calculated by formula (one):

X_i(k) = \sum_{n=0}^{N-1} x_i(n)\, e^{-j 2\pi k n / N}, \quad k = 0, 1, \ldots, N-1 \qquad (one)

where x_i(n) is the framed signal of the i-th frame, X_i(k) is the k-th frequency point datum obtained by performing the Fourier transform on the framed signal of the i-th frame, and N is the number of points of the Fourier transform.
in some embodiments, the spectral energy of the framing signal is calculated by equation (two):
E(i,k)=[X i (k)] 2 k =0,1,. N-1 (two);
wherein, E (i, k) is the spectral energy of the kth frequency point data.
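Equations (one) and (two) amount to an N-point Fourier transform per framed signal followed by a squared magnitude. A sketch with an assumed N = 512 (a conventional choice, not a value stated in this application):

```python
import numpy as np

def spectral_energy(frames, n_fft=512):
    # X_i(k): Fourier transform of each framed signal x_i(n),
    # zero-padded to n_fft points; rfft keeps the non-negative frequencies.
    spectrum = np.fft.rfft(frames, n=n_fft)   # shape: (n_frames, n_fft//2 + 1)
    # E(i, k) = |X_i(k)|^2: spectral energy of each frequency point
    return np.abs(spectrum) ** 2

energy = spectral_energy(np.ones((2, 400)))
```

For an all-ones frame of 400 samples, the DC bin carries all the energy: E(i, 0) = 400².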
In some embodiments, the method of calculating the speech feature of the speech signal based on all of the spectral energies comprises:
performing Mel filtering processing on the frequency spectrum energy of each frequency point data to obtain Mel filtering energy corresponding to each frequency spectrum energy;
and obtaining the Mel frequency cepstrum coefficient through cepstrum operation and logarithm operation according to all the Mel filtering energies.
Further, the spectral energy of the frequency point data is processed through a Mel filter bank to obtain the Mel filtering energy. The Mel filter bank consists of a series of triangular filters; H_m(k) denotes the transfer function (system function) of the m-th filter, whose center frequency is f(m);
the method specifically comprises the following steps:
The transfer function of each Mel filter is calculated according to formula (three):

H_m(k) = \begin{cases}
0, & k < f(m-1) \\
\dfrac{k - f(m-1)}{f(m) - f(m-1)}, & f(m-1) \le k \le f(m) \\
\dfrac{f(m+1) - k}{f(m+1) - f(m)}, & f(m) \le k \le f(m+1) \\
0, & k > f(m+1)
\end{cases} \qquad (three)

where H_m(k) is the transfer function of the m-th Mel filter, f(m) is its center frequency, and m denotes the m-th Mel filter in the Mel filter bank.
The Mel-frequency cepstral coefficients mfcc(i, n) are calculated according to formula (four):

mfcc(i, n) = \sqrt{\dfrac{2}{M}} \sum_{m=1}^{M} \log[S(i, m)] \cos\left(\dfrac{\pi n (2m - 1)}{2M}\right) \qquad (four)

where S(i, m) = \sum_k E(i, k) H_m(k) is the Mel filtering energy of the m-th filter for the i-th frame; the logarithm and the cosine transform implement the logarithm and cepstrum operations, respectively.
it should be noted that the center frequency f (m) is calculated by the formula (five):
Figure BDA0003993629000000072
wherein M is more than or equal to 0 and less than or equal to M, f s Is the sampling rate in Hz; f. of l And f h The lowest frequency and the highest frequency of the filter bank; m is the number of the triangular filters; f denotes the actual frequency and Mel (f) denotes the Mel frequency.
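Equations (three) through (five) can be sketched as follows: filter center frequencies are spaced uniformly on the Mel scale, the triangular filters are applied to the spectral energies, and a logarithm plus a discrete cosine transform yields the MFCCs. The values M = 26 filters, n_fft = 512, f_s = 16 kHz, and 13 output coefficients are conventional assumptions, not values from this application.

```python
import numpy as np

def mel(f):
    # Mel(f) = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def inv_mel(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(M=26, n_fft=512, fs=16000, f_low=0.0, f_high=8000.0):
    """Triangular Mel filter bank H_m(k), as in formula (three)."""
    # Center frequencies f(m) spaced uniformly on the Mel scale (formula five)
    mel_points = np.linspace(mel(f_low), mel(f_high), M + 2)
    bins = np.floor((n_fft + 1) * inv_mel(mel_points) / fs).astype(int)
    H = np.zeros((M, n_fft // 2 + 1))
    for m in range(1, M + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):          # rising edge of the triangle
            H[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling edge of the triangle
            H[m - 1, k] = (right - k) / (right - center)
    return H

def mfcc(energy, H, n_coeffs=13):
    """Mel filtering, logarithm, and cepstrum (DCT), as in formula (four)."""
    filt_energy = energy @ H.T                 # Mel filtering energies S(i, m)
    log_energy = np.log(filt_energy + 1e-10)   # logarithm operation
    M = log_energy.shape[1]
    n = np.arange(n_coeffs)[:, None]
    m = np.arange(M)[None, :]
    dct = np.cos(np.pi * n * (2 * m + 1) / (2 * M))  # DCT-II basis (cepstrum)
    return np.sqrt(2.0 / M) * (log_energy @ dct.T)

H = mel_filterbank()
coeffs = mfcc(np.ones((3, 257)), H)            # 3 frames of spectral energy
```

Each row of the resulting coefficient matrix is one frame's MFCC vector, the voice feature fed to the recognition model.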
In some embodiments, the method for constructing the recognition model includes:
acquiring a sample set, wherein the sample set comprises a voice feature sample and instruction information corresponding to the voice feature sample;
constructing a neural network;
and taking the voice feature sample as the input of the neural network, taking the instruction information as the output of the neural network, and training the neural network to obtain the recognition model.
Specifically, the voice feature samples are mel-frequency cepstrum coefficient samples;
in some embodiments, the neural network is a convolutional neural network; the convolutional neural network comprises a first convolutional layer and a second convolutional layer;
the first convolution layer comprises a first convolution kernel; the first convolution kernel is an F×1 convolution kernel and is used for performing point-by-point convolution on input data of the convolutional neural network in the time dimension to obtain intermediate convolution data; F is the number of rows, and the value of F is a set integer value;
the second convolution layer comprises a second convolution kernel; the second convolution kernel is a 1×K convolution kernel and is used for performing point-by-point convolution on the intermediate convolution data in the feature dimension to obtain final convolution data; K is the number of columns, and the value of K is a set integer value.
It should be noted that, in the prior art, the convolution layer of a neural network model is usually computed with an F×K convolution kernel; in this way, the computational efficiency is low and the resource occupation is large. Therefore, the convolution kernel is decomposed into an F×1 convolution kernel and a 1×K convolution kernel, which improves computational efficiency and reduces resource occupation.
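The saving can be illustrated numerically: when an F×K kernel happens to be separable (the outer product of an F×1 column and a 1×K row), convolving with the two small kernels in sequence reproduces the full convolution while costing F+K rather than F·K multiplications per output element. The sketch below assumes F = 3 and K = 5; in the actual network, the two layers would be trained directly rather than obtained by factoring a full kernel.

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Plain 'valid' 2-D cross-correlation, written out for clarity."""
    kh, kw = kernel.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 12))    # e.g. a time-by-feature MFCC map
col = rng.standard_normal((3, 1))    # F x 1 kernel (time dimension)
row = rng.standard_normal((1, 5))    # 1 x K kernel (feature dimension)

full = conv2d_valid(x, col @ row)                    # one F x K pass
factored = conv2d_valid(conv2d_valid(x, col), row)   # two cheap passes
```

Both paths produce the same output map; the factored path uses 3 + 5 = 8 instead of 3 × 5 = 15 multiplies per output element.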
Further, the training process of the neural network is as follows: 80% of the sample set is used as the training set, 10% as the validation set, and 10% as the test set; white-noise-augmented copies are added to expand the test data; the MFCC dimension is adjusted and the convolution layer is decomposed; and the neural network is then trained.
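The data preparation just described can be sketched as follows; the noise level (0.005) and feature-map shape are hypothetical, since the application does not specify them.

```python
import numpy as np

def split_indices(n, seed=0):
    """Shuffle and split sample indices 80% / 10% / 10%."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def add_white_noise(features, noise_std=0.005, seed=0):
    """Expand data by adding Gaussian white noise to feature maps."""
    rng = np.random.default_rng(seed)
    return features + rng.normal(0.0, noise_std, size=features.shape)

samples = np.zeros((100, 98, 13))    # 100 MFCC maps (frames x coefficients)
train, val, test = split_indices(len(samples))
noisy_test = add_white_noise(samples[test])
```

The noisy copies probe how robust the trained model is to recording conditions not seen in the clean samples.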
Example 2
The present embodiment provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the intelligent identification control method as described above when executing the computer program.
As shown in fig. 2, the terminal device 500 includes a Central Processing Unit (CPU) 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section into a Random Access Memory (RAM) 503. In a Random Access Memory (RAM) 503, various programs and data necessary for system operation are also stored. A Central Processing Unit (CPU) 501, a Read Only Memory (ROM) 502, and a Random Access Memory (RAM) 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the input/output (I/O) interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD) and a speaker; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card or a modem. The communication portion 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the input/output (I/O) interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as necessary, so that a computer program read from it is installed into the storage portion 508 as needed.
In particular, according to an embodiment of the invention, the process described above with reference to the flowchart of fig. 1 may be implemented as a computer software program. For example, embodiment 1 of the invention comprises a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The above-described functions defined in the system of the present application are executed when the computer program is executed by the Central Processing Unit (CPU) 501.
Example 3
The present embodiment provides a computer-readable storage medium having a computer program, which when executed by a processor implements the intelligent recognition control method steps as described above.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented by software or by hardware, and the described units may also be disposed in a processor. For example, a processor may be described as comprising a voice acquisition module and a voice processing module. The names of these units or modules do not in any way limit the units or modules themselves;
as another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is enabled to implement the intelligent recognition control method in the embodiment:
s100: acquiring a voice signal;
s200: extracting voice features of the voice signals;
s300: inputting the voice features into a recognition model, and outputting instruction information corresponding to the voice features; the recognition model is obtained by training a neural network and is used for recognizing instruction information corresponding to the voice features;
s400: and outputting a corresponding control instruction based on the instruction information.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, and may also be implemented by software in combination with necessary hardware.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (10)

1. An intelligent identification control method is characterized by comprising the following steps:
acquiring a voice signal;
extracting voice features of the voice signals;
inputting the voice features into a recognition model, and outputting instruction information corresponding to the voice features; the recognition model is obtained by training a neural network and is used for recognizing instruction information corresponding to the voice features;
and outputting a corresponding control instruction based on the instruction information.
2. The intelligent recognition control method of claim 1, wherein the speech features are mel-frequency cepstral coefficients.
3. The smart recognition control method of claim 2, wherein the method of extracting the voice feature of the voice signal comprises:
performing framing processing on the voice signals to obtain a plurality of framing signals;
performing Fourier transform on each frame signal to obtain a plurality of frequency point data;
calculating the frequency spectrum energy of each frequency point data;
and calculating the voice characteristics of the voice signal based on all the spectral energy.
4. The intelligent recognition control method of claim 3, wherein the method for calculating the speech features of the speech signal based on all the spectral energies comprises:
performing Mel filtering processing on the frequency spectrum energy of each frequency point data to obtain Mel filtering energy corresponding to each frequency spectrum energy;
and obtaining the Mel frequency cepstrum coefficient through cepstrum operation and logarithm operation according to all the Mel filtering energies.
5. The intelligent recognition control method according to any one of claims 1 to 4, wherein the recognition model construction method comprises:
acquiring a sample set, wherein the sample set comprises a voice feature sample and instruction information corresponding to the voice feature sample;
constructing a neural network;
and taking the voice feature sample as the input of the neural network, taking the instruction information as the output of the neural network, and training the neural network to obtain the recognition model.
6. The intelligent recognition control method of claim 5, wherein the neural network is a convolutional neural network comprising a first convolution layer and a second convolution layer;
the first convolution layer comprises a first convolution kernel, which is an F x 1 convolution kernel used to perform point-by-point convolution on the input data of the convolutional neural network along the time dimension to obtain intermediate convolution data, where F is the number of rows and is a set integer value; and
the second convolution layer comprises a second convolution kernel, which is a 1 x K convolution kernel used to perform point-by-point convolution on the intermediate convolution data along the feature dimension to obtain final convolution data, where K is the number of columns and is a set integer value.
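The factorized F x 1 / 1 x K convolution of claim 6 can be sketched with a plain "valid" 2-D convolution; F = 3 and K = 5 are illustrative choices, not values from the patent, and the single-channel, single-kernel setting is a simplification of a real convolution layer.

```python
import numpy as np

def conv_valid(x, kernel):
    """2-D 'valid' cross-correlation of a single-channel input with one kernel."""
    kh, kw = kernel.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

F, K = 3, 5                         # illustrative kernel sizes
x = np.random.randn(98, 13)         # input: (time frames, MFCC features)
k1 = np.random.randn(F, 1)          # F x 1 kernel: convolves along time only
k2 = np.random.randn(1, K)          # 1 x K kernel: convolves along features only
mid = conv_valid(x, k1)             # intermediate convolution data, (96, 13)
out = conv_valid(mid, k2)           # final convolution data, (96, 9)
```

Splitting one F x K kernel into an F x 1 pass and a 1 x K pass reduces multiplications per output from F*K to F+K, which is the usual motivation for this factorization in lightweight keyword-spotting networks.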
7. The intelligent recognition control method of claim 3, wherein before performing framing on the voice signal to obtain a plurality of frame signals, the method further comprises: filtering the voice signal to boost the high-frequency energy of the voice signal.
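The high-frequency-boosting filter of claim 7 is conventionally a pre-emphasis filter y[n] = x[n] - a*x[n-1]; the coefficient a = 0.97 below is the customary choice, not a value stated in the patent.

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]: boosts high-frequency energy before framing.
    alpha = 0.97 is a conventional value, assumed here."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

x = np.array([1.0, 1.0, 1.0, 1.0])     # a flat (purely low-frequency) input
y = pre_emphasis(x)                    # -> [1.0, 0.03, 0.03, 0.03]
```

A constant signal is almost entirely suppressed while sample-to-sample changes pass through, which is exactly the high-pass behavior the claim calls for.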
8. The intelligent recognition control method of claim 3, wherein after performing framing on the voice signal to obtain a plurality of frame signals, the method further comprises: applying a window function to each frame signal to smooth the edges of the frame signal.
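The windowing of claim 8 is typically done with a Hamming window, multiplied element-wise into each frame; the window type is an assumption here, since the claim names none.

```python
import numpy as np

frame = np.ones(400)          # one frame signal (synthetic, 400 samples)
window = np.hamming(400)      # Hamming window: tapers toward ~0.08 at the edges
smoothed = frame * window     # windowed frame with smoothed edges
```

Tapering the frame edges reduces the spectral leakage that an abrupt frame boundary would otherwise introduce into the Fourier transform of claim 3.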
9. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the intelligent recognition control method of any one of claims 1 to 8.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the intelligent recognition control method of any one of claims 1 to 8.
CN202211589832.2A 2022-12-12 2022-12-12 Intelligent identification control method, terminal equipment and readable storage medium Pending CN115938364A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211589832.2A CN115938364A (en) 2022-12-12 2022-12-12 Intelligent identification control method, terminal equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211589832.2A CN115938364A (en) 2022-12-12 2022-12-12 Intelligent identification control method, terminal equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN115938364A true CN115938364A (en) 2023-04-07

Family

ID=86697434

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211589832.2A Pending CN115938364A (en) 2022-12-12 2022-12-12 Intelligent identification control method, terminal equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN115938364A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116876950A (en) * 2023-09-05 2023-10-13 山东智赢门窗科技有限公司 Intelligent door and window control system and method, computer equipment and storage medium
CN116876950B (en) * 2023-09-05 2023-12-05 山东智赢门窗科技有限公司 Intelligent door and window control system and method, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
Ancilin et al. Improved speech emotion recognition with Mel frequency magnitude coefficient
CN108198545B (en) Speech recognition method based on wavelet transformation
CN110970036B (en) Voiceprint recognition method and device, computer storage medium and electronic equipment
CN108564965B (en) Anti-noise voice recognition system
Rammo et al. Detecting the speaker language using CNN deep learning algorithm
CN112530410A (en) Command word recognition method and device
CN110782902A (en) Audio data determination method, apparatus, device and medium
CN115938364A (en) Intelligent identification control method, terminal equipment and readable storage medium
CN111798846A (en) Voice command word recognition method and device, conference terminal and conference terminal system
CN113539243A (en) Training method of voice classification model, voice classification method and related device
Gaafar et al. An improved method for speech/speaker recognition
CN111863035A (en) Method, system and equipment for recognizing heart sound data
Chavan et al. Speech recognition in noisy environment, issues and challenges: A review
US20230015112A1 (en) Method and apparatus for processing speech, electronic device and storage medium
Матиченко et al. The structural tuning of the convolutional neural network for speaker identification in mel frequency cepstrum coefficients space
CN115035887A (en) Voice signal processing method, device, equipment and medium
CN113782005B (en) Speech recognition method and device, storage medium and electronic equipment
Amraoui et al. A Novel Approach on Speaker Gender Identification and Verification Using DWT First Level Energy and Zero Crossing
CN113129926A (en) Voice emotion recognition model training method, voice emotion recognition method and device
Alex et al. Performance analysis of SOFM based reduced complexity feature extraction methods with back propagation neural network for multilingual digit recognition
JAIN Advanced Feature Extraction and Its Implementation in Speech Recognition System
CN112908299B (en) Customer demand information identification method and device, electronic equipment and storage medium
CN116504226B (en) Lightweight single-channel voiceprint recognition method and system based on deep learning
Wang et al. Artificial Intelligence and Machine Learning Application in NPP MCR Speech Monitoring System
Susithra et al. Simulink Implementation of MFCC for Audio Signal Processing Applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination