CN113643692A - PLC voice recognition method based on machine learning - Google Patents
- Publication number
- CN113643692A (application number CN202110319744.XA)
- Authority
- CN
- China
- Prior art keywords
- voice
- plc
- speech
- model
- machine learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Abstract
The invention relates to a PLC voice recognition method based on machine learning, comprising the following steps: a, collecting voice signal samples; b, performing voice signal endpoint detection and feature extraction; c, training an HMM-GMM model; d, establishing a mapping relation between voice commands and PLC register data; e, collecting a voice instruction; f, performing endpoint detection and feature extraction on the voice instruction; g, matching the features of the voice instruction against the model; and h, modifying the register data according to the matching result and the mapping relation with the PLC register data. The method realizes signal output and parameter modification and accurately recognizes voice instructions issued by operators. It replaces industrial control means such as buttons and keys with voice instructions that are friendlier to operators, so that operators no longer face complex operation interfaces and can operate equipment remotely, adding a new mode and new ideas to industrial control.
Description
Technical Field
The invention relates to the technical field of machine learning, in particular to a PLC voice recognition method based on machine learning.
Background
In traditional industrial control, an operator inputs signals to, or modifies parameters of, a PLC (programmable logic controller) using devices such as buttons, touch screens, mice and keyboards; after logic processing, the PLC outputs instructions to control external equipment.
Against this background, it is very important to provide a natural and convenient human-computer interaction mode.
Disclosure of Invention
To address this situation and overcome the defects of the prior art, the invention provides a PLC voice recognition method based on machine learning. Voice data for the commands required by the equipment are collected and processed to build a training model; a collected command is then processed and matched against the model, and the matching result is written into a PLC internal register. This realizes signal output and parameter modification, accurately recognizes voice commands issued by an operator, and makes the equipment carry out the corresponding operation.
The invention relates to a PLC voice recognition method based on machine learning, which comprises the following steps,
a, collecting a voice signal sample;
b, voice signal endpoint detection and feature extraction;
c, training an HMM-GMM model;
d, establishing a mapping relation between the voice command and the PLC register data;
e, collecting voice instructions;
f, carrying out endpoint detection and feature extraction on the voice instruction;
g, matching the features of the voice instruction with the model;
and h, modifying the register data according to the matching result and the mapping relation of the PLC register data.
The invention has the following beneficial effects. Based on machine learning, voice signal samples are first collected and subjected to endpoint detection and feature extraction; an HMM-GMM model is then trained, and a mapping relation between voice instructions and PLC register data is established. Finally, a voice instruction is collected, endpoint detection and feature extraction are performed on it, its features are matched against the model, and the matching result is written into a PLC internal register, realizing signal output and parameter modification. Voice instructions issued by an operator can be recognized accurately; industrial control means such as buttons and keys are replaced by voice instructions that are friendlier to the operator, so the operator no longer faces a complex operation interface and can operate equipment remotely, adding a new mode and new ideas to industrial control.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of the general steps of the present invention.
FIG. 2 is a flow chart of step a of the present invention.
FIG. 3 is a waveform diagram of voice signal endpoint detection in step b according to the present invention.
FIG. 4 is a diagram corresponding to D1 in step D of the present invention.
FIG. 5 is a flow chart of step h of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
In the first embodiment, a PLC voice recognition method based on machine learning is implemented with the following concrete steps,
a, collecting a voice signal sample;
b, voice signal endpoint detection and feature extraction;
c, training an HMM-GMM model;
d, establishing a mapping relation between the voice command and the PLC register data;
e, collecting voice instructions;
f, carrying out endpoint detection and feature extraction on the voice instruction (the voice instruction acquisition, endpoint detection and feature extraction modes are the same as those of a voice signal sample);
g, matching the features of the voice instruction with the model;
and h, modifying the register data according to the matching result and the mapping relation with the PLC register data: combining the mapping relation with the voice instruction prediction result, and using Snap7 to connect the destination PLC and complete modification of the register data at the corresponding address.
In the second embodiment, on the basis of the first embodiment, the step of collecting the voice signal sample in the step a is as follows,
a1, setting the collection times of each voice signal sample;
a2, setting a saving path of the voice signal sample;
a3, setting the format as pyaudio. paInt16, the number of sound channels as 1, the sampling rate as 16000, and the recording duration of a single voice signal as 2.5 s;
a4, collecting voice by using a pyaudio module;
a5, storing the collected voice signal sample by using a wave module;
a6, denoising the voice signal samples by using spectral subtraction;
and A7, executing a loop until the set collection times are reached.
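The collection steps A1 to A7 can be sketched in Python as follows; the description names pyaudio and wave but gives no code, so the chunk size and function name are assumptions, and the pyaudio import is deferred so the sketch loads without audio hardware:

```python
import wave

# Recording parameters from step A3: 16 kHz mono, 16-bit, 2.5 s per sample
RATE = 16000
CHANNELS = 1
SECONDS = 2.5
CHUNK = 1024  # frames per buffer (assumed; not specified in the text)

def record_sample(path):
    """Capture one voice sample with pyaudio and save it with the wave module."""
    import pyaudio  # imported here so the sketch loads without audio hardware
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=CHANNELS,
                     rate=RATE, input=True, frames_per_buffer=CHUNK)
    frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]
    stream.stop_stream(); stream.close(); pa.terminate()
    with wave.open(path, 'wb') as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(2)  # paInt16 -> 2 bytes per sample
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))

# One 2.5 s sample at 16 kHz is read in 39 full chunks of 1024 frames
n_chunks = int(RATE / CHUNK * SECONDS)
print(n_chunks)
```

Looping over `record_sample` until the configured collection count is reached (step A7), with spectral subtraction applied afterwards (step A6), completes the acquisition stage.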
In the third embodiment, on the basis of the first embodiment, the voice signal endpoint detection in step b accurately determines the starting point and ending point of the voice within a signal containing speech, distinguishing the speech segment from the non-speech segment. The double-threshold method uses three thresholds: the first two are thresholds on short-time energy, and the last is a threshold on the zero-crossing rate. Voiced sound has higher energy than unvoiced sound, while unvoiced sound has a higher zero-crossing rate than voiced sound; energy is therefore used to locate the voiced part first, and the zero-crossing rate is then used to extract the unvoiced part, completing endpoint detection. The specific steps are as follows,
bj1, taking a higher short-time energy as a threshold MH, and using the threshold to firstly separate out a voiced part in the voice, wherein the interval from A1 to A2 is an interval;
bj2, a lower energy threshold ML is selected, the threshold is utilized to search from A1 and A2 to two ends, the voice part of the lower energy section is also added into the voice section, the range of the voice section is further expanded, and the voice section is still between B1 and B2;
bj3, using short-time zero-crossing rate to distinguish consonants and silence, the threshold value of the short-time zero-crossing rate is Zs, continuously searching the speech segment which is distinguished by using short-time energy to both ends, if the short-time zero-crossing rate is greater than 3 times Zs, the speech segment is regarded as the unvoiced part of speech, and the unvoiced part is added into the speech segment, namely the obtained speech segment, and the speech segment is between C1 and C2.
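Steps Bj1 to Bj3 can be sketched as a double-threshold detector over frame energies and zero-crossing rates. This is a minimal illustration on a synthetic signal; the frame length and threshold values are illustrative choices, not the patent's:

```python
import numpy as np

def endpoint_detect(x, frame=160, mh=0.5, ml=0.1, zs=0.02):
    """Double-threshold endpoint detection (steps Bj1-Bj3): short-time
    energy thresholds MH and ML, then zero-crossing-rate threshold Zs.
    Returns the first and last detected speech frame indices."""
    n = len(x) // frame
    frames = x[:n * frame].reshape(n, frame)
    energy = (frames ** 2).sum(axis=1)
    energy /= energy.max()
    zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)
    # Bj1: voiced core where energy exceeds the high threshold MH
    core = np.where(energy > mh)[0]
    b1, b2 = core[0], core[-1]
    # Bj2: extend outward while energy stays above the low threshold ML
    while b1 > 0 and energy[b1 - 1] > ml:
        b1 -= 1
    while b2 < n - 1 and energy[b2 + 1] > ml:
        b2 += 1
    # Bj3: extend further while the zero-crossing rate exceeds 3 * Zs (unvoiced)
    c1, c2 = b1, b2
    while c1 > 0 and zcr[c1 - 1] > 3 * zs:
        c1 -= 1
    while c2 < n - 1 and zcr[c2 + 1] > 3 * zs:
        c2 += 1
    return c1, c2

# Synthetic check: 0.5 s silence, 1 s of a 200 Hz tone, 0.5 s silence at 16 kHz
t = np.arange(16000) / 16000.0
sig = np.concatenate([np.zeros(8000),
                      np.sin(2 * np.pi * 200 * t),
                      np.zeros(8000)])
c1, c2 = endpoint_detect(sig)
print(c1, c2)
```

On this signal the tone occupies frames 50 through 149, which is exactly the segment the detector returns.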
In a fourth embodiment, on the basis of the first embodiment, the step of extracting the speech signal feature in the step b is as follows,
bt1, pre-emphasis, framing and windowing the speech;
bt2, obtaining a corresponding frequency spectrum for each short-time analysis window through FFT;
bt3, passing the spectrum through a Mel filter bank to obtain the Mel spectrum (the human auditory system is a special nonlinear system whose sensitivity to signals varies with frequency; it extracts not only semantic information but also the personal features of the speaker, outperforming existing speech recognition systems, and the Mel filter bank approximates this frequency response);
bt4, performing cepstral analysis on the Mel spectrum: taking the logarithm and then an inverse transform, in practice realized by the DCT (discrete cosine transform); the 2nd to 13th coefficients after the DCT are taken as the MFCC, the Mel-frequency cepstral coefficients, which form the feature of that frame of speech.
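Steps Bt1 to Bt4 can be sketched end to end in NumPy. The frame length, hop, FFT size and filter-bank size below are typical values assumed for illustration, not taken from the text:

```python
import numpy as np

def mfcc(signal, rate=16000, frame_len=400, hop=160, n_fft=512, n_mels=26):
    """MFCC per steps Bt1-Bt4: pre-emphasis, framing + Hamming window,
    FFT power spectrum, Mel filter bank, log, DCT; keep coefficients 2-13."""
    # Bt1: pre-emphasis, then framing and windowing
    x = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    n = 1 + max(0, (len(x) - frame_len) // hop)
    frames = np.stack([x[i*hop : i*hop + frame_len] for i in range(n)])
    frames *= np.hamming(frame_len)
    # Bt2: power spectrum of each short-time analysis window
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Bt3: triangular Mel filter bank, evenly spaced on the Mel scale
    mel_pts = np.linspace(0, 2595 * np.log10(1 + rate / 2 / 700), n_mels + 2)
    hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz_pts / rate).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m-1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m-1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fbank.T + 1e-10)
    # Bt4: DCT-II of the log Mel spectrum; take the 2nd-13th coefficients
    k = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mels), 2 * k + 1) / (2 * n_mels))
    return (logmel @ dct.T)[:, 1:13]

feats = mfcc(np.random.randn(32000))  # 2 s of noise -> one 12-dim vector per frame
print(feats.shape)
```

With a 25 ms window and 10 ms hop over 2 s of audio, this yields 198 frames of 12 coefficients each.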
In the fifth embodiment, on the basis of the first embodiment, the step of training the HMM-GMM model in step c is as follows,
C1, the phonemes of the speech signal are each modeled with an HMM-GMM, using 3-state modeling; the emission probability of the HMM is modeled with a Gaussian distribution function. (A Hidden Markov Model is a Markov process with hidden and visible nodes. A Gaussian Mixture Model can be regarded as composed of K single Gaussian models, the K sub-models being the hidden variables of the mixture; in principle a mixture model can use any probability distribution, and the Gaussian distribution is used because of its good mathematical properties and computational performance. Speech recognition proceeds in three steps: first, frames are recognized as states, which is done by the GMM; second, states are combined into phonemes, done by the HMM; third, phonemes are combined into words, also done by the HMM, so the GMM network mainly serves the HMM network. Two problems must be solved: first, the probability of a frame's features given a state, i.e. the likelihood in the HMM, determined by the mean vectors and covariance matrices of the GMM; second, the probability of transitioning from the previous state to the current state, the state transition probability. Decoding in an HMM converts one sequence into another; since the number of possible paths grows exponentially, each frame keeps only the highest-probability path, a selection method called the Viterbi algorithm.);
c2, initializing alignment, and corresponding the frame average of the voice signal to each state;
c3, updating model parameters, counting the times of transition of each state, dividing the times by the total times of transition to obtain the transition probability of each state, and calculating the mean vector and covariance matrix of MFCC characteristics of the states, namely the emission probability;
c4, using Viterbi algorithm, according to the transition probability and emission probability obtained in the last step, re-aligning the state level of the speech signal;
c5, repeating the step C2 and the step C3 until convergence;
and C6, saving the trained model.
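The realignment in step C4 is the core of the training loop. A minimal Viterbi alignment over a 3-state left-to-right HMM (as in step C1) can be sketched as follows; the transition and emission values are toy numbers, not trained parameters:

```python
import numpy as np

def viterbi(log_emit, log_trans, log_init):
    """Viterbi state-level alignment (step C4): the most likely state
    sequence given per-frame emission log-probabilities (from the GMMs)
    and the HMM transition log-probabilities."""
    T, S = log_emit.shape
    delta = np.full((T, S), -np.inf)   # best log-score ending in each state
    back = np.zeros((T, S), dtype=int) # best predecessor for backtracking
    delta[0] = log_init + log_emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans  # (from, to)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# 3-state left-to-right HMM; 6 frames whose emissions favor states 0,0,1,1,2,2
log_trans = np.log(np.array([[.6, .4, 0], [0, .6, .4], [0, 0, 1.]]) + 1e-12)
log_init = np.log(np.array([1., 0, 0]) + 1e-12)
log_emit = np.log(np.array([[.8, .1, .1], [.8, .1, .1], [.1, .8, .1],
                            [.1, .8, .1], [.1, .1, .8], [.1, .1, .8]]))
path = viterbi(log_emit, log_trans, log_init)
print(path)
```

Steps C2 through C5 alternate this alignment with re-estimation of the transition probabilities and the Gaussian means and covariances until convergence.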
In the sixth embodiment, on the basis of the first embodiment, the step of establishing the mapping relation between the voice command and the PLC register data in step d is as follows,
D1, the PLC stores data in the form of tags associated with storage areas, divided into input (I), output (O), bit memory (M) and data block (DB) areas; when the program accesses a corresponding (I/O) tag, it operates on the corresponding address through the CPU's process image, the specific correspondence being shown in FIG. 4;
D2, a link between the PC and the PLC registers is established using Snap7, an open-source library for Ethernet communication with Siemens S7-series PLCs, supporting the S7-200, S7-200 Smart, S7-300, S7-400, S7-1200 and S7-1500. The communication steps are: (1) instantiate the Snap7 client and set the link port number; (2) call the Snap7 API connect, whose parameters are the IP address, rack number and slot number of the target PLC; (3) perform the required operations; (4) call the API disconnect when finished;
D3, the mapping between voice commands and PLC data register data. The principle of a command operation executed by the PLC is to modify the data at the corresponding register address; through the Snap7 APIs write_area and read_area, PLC register data can be written and read. The parameters are the operation area, register address, start bit and data, and this operation completes the input and output of I/O points;
for the V area and M area, the APIs client.db_write and client.db_read are called to read and write the V and M variables; the parameters are the register address, start bit and number of bytes of data (1 for byte data, 2 for words and integers, 4 for double integers and floating point), and this operation completes the writing and reading of variable data;
the voice signal thus replaces a physical button or a key on the touch screen: data realizing the desired function are written to the designated register address, completing the mapping between the voice signal and the PLC register data. For example, if the voice signal is "motor start No. 1" and starting motor No. 1 corresponds to setting output point Q0.1, the program statement is client.write_area(0x82, 0, 0, struct.pack('B', 2)).
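The mapping in steps D1 to D3 can be sketched as a command table whose entries are write_area arguments, mirroring the "motor start No. 1" example. The area codes and the `execute` wiring are assumptions about the python-snap7 API and should be checked against the installed version:

```python
import struct

# Snap7 area codes (assumed values; verify against your snap7 version)
AREA_PE, AREA_PA, AREA_MK, AREA_DB = 0x81, 0x82, 0x83, 0x84

# Voice-command -> register-write mapping, as in the example: setting output
# point Q0.1 means writing value 2 (bit 1) to byte 0 of the output (PA) area.
COMMANDS = {
    "motor start No. 1": (AREA_PA, 0, 0, struct.pack('B', 2)),
}

def execute(command, client):
    """Write the mapped register data for a recognized voice command.
    `client` is an open snap7.client.Client (hypothetical wiring; the
    caller connects with client.connect(ip, rack, slot) beforehand)."""
    area, db, start, data = COMMANDS[command]
    client.write_area(area, db, start, data)

area, db, start, data = COMMANDS["motor start No. 1"]
print(area, db, start, data)
```

Each new voice command is supported by adding one entry to the table, keeping the recognition stage completely decoupled from the PLC addressing.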
In the seventh embodiment, on the basis of the first embodiment, step g matches the features of the voice instruction against the models: the model group created for each phoneme of the HMM-GMM voice signal samples is imported, and the features of the voice instruction are matched against each model of the group to find the voice sample with the highest matching rate. The specific steps are as follows,
g1, importing the trained model group;
g2, creating a prediction score list;
g3, matching the input speech with each model of the model group;
g4, calculating a matching score and storing the matching score in a prediction score list;
g5, selecting the model with the highest score;
g6, outputting the voice signal mark corresponding to the model.
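Steps G1 to G6 reduce to a scoring loop: every trained model scores the instruction's features, and the label of the best-scoring model is output. The sketch below uses stand-in models with a toy `score` method in place of real HMM-GMM log-likelihoods:

```python
class ToyModel:
    """Stand-in for a trained HMM-GMM; score() mimics a log-likelihood."""
    def __init__(self, label, bias):
        self.label, self.bias = label, bias
    def score(self, features):
        return self.bias - sum(abs(f) for f in features)

def recognize(features, models):
    # G2-G4: build the prediction score list, one entry per model
    scores = [(m.score(features), m.label) for m in models]
    # G5-G6: select the highest score and output its voice signal label
    return max(scores)[1]

models = [ToyModel("motor start No. 1", 5.0),
          ToyModel("motor stop No. 1", 3.0)]
result = recognize([0.2, -0.1], models)
print(result)
```

In the real system each `ToyModel` would be a trained HMM-GMM loaded in step G1, and the returned label feeds the register-write mapping of step h.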
In specific use, the invention trains a voice instruction model based on machine learning and connects to the PLC through a PC-side program, replacing industrial control means such as buttons and keys with voice instructions that are friendlier to operators, so that operators no longer face a complex operation interface and can operate equipment remotely, adding a new mode and new ideas to industrial control. The specific implementation steps are as follows,
a, collecting a voice signal sample;
b, voice signal endpoint detection and feature extraction;
c, training an HMM-GMM model;
d, establishing a mapping relation between the voice command and the PLC register data;
e, collecting voice instructions;
f, carrying out endpoint detection and feature extraction on the voice instruction (the voice instruction acquisition, endpoint detection and feature extraction modes are the same as those of a voice signal sample);
g, matching the features of the voice instruction with the model;
and h, modifying the register data according to the matching result and the mapping relation with the PLC register data: combining the mapping relation with the voice instruction prediction result, and using Snap7 to connect the destination PLC and complete modification of the register data at the corresponding address.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (7)
1. The PLC voice recognition method based on machine learning is characterized by comprising the following concrete implementation steps,
a, collecting a voice signal sample;
b, voice signal endpoint detection and feature extraction;
c, training an HMM-GMM model;
d, establishing a mapping relation between the voice command and the PLC register data;
e, collecting voice instructions;
f, carrying out endpoint detection and feature extraction on the voice instruction;
g, matching the features of the voice instruction with the model;
and h, modifying the register data according to the matching result and the mapping relation of the PLC register data.
2. The PLC speech recognition method based on machine learning of claim 1, wherein the step of collecting speech signal samples in step a is as follows,
a1, setting the collection times of each voice signal sample;
a2, setting a saving path of the voice signal sample;
a3, setting the format as pyaudio. paInt16, the number of sound channels as 1, the sampling rate as 16000, and the recording duration of a single voice signal as 2.5 s;
a4, collecting voice by using a pyaudio module;
a5, storing the collected voice signal sample by using a wave module;
a6, denoising the voice signal samples by using spectral subtraction;
and A7, executing a loop until the set collection times are reached.
3. The PLC voice recognition method based on machine learning of claim 1, wherein the voice signal end point detection in step b is as follows,
Bj1, a higher short-time energy threshold MH is taken; using this threshold, the voiced part of the speech is separated out first, giving the interval from A1 to A2;
Bj2, a lower energy threshold ML is selected; using this threshold, the search proceeds outward from A1 and A2, and lower-energy speech portions are added to the speech segment, expanding it to the interval from B1 to B2;
Bj3, the short-time zero-crossing rate, with threshold Zs, is used to distinguish consonants from silence; the search continues outward from the speech segment found with short-time energy, and frames whose zero-crossing rate exceeds 3 times Zs are regarded as the unvoiced part of the speech and added to the segment, giving the final speech segment from C1 to C2.
4. The PLC speech recognition method based on machine learning of claim 1, wherein the step of speech signal feature extraction in the step b is as follows,
bt1, pre-emphasis, framing and windowing the speech;
bt2, obtaining a corresponding frequency spectrum for each short-time analysis window through FFT;
bt3, passing the spectrum through a Mel filter bank to obtain a Mel spectrum;
bt4, performing cepstral analysis on the Mel spectrum: taking the logarithm and then an inverse transform, in practice realized by the DCT (discrete cosine transform); the 2nd to 13th coefficients after the DCT are taken as the MFCC, the Mel-frequency cepstral coefficients, which form the feature of that frame of speech.
5. The PLC speech recognition method based on machine learning of claim 1, wherein the step of training the HMM-GMM model of step c is as follows,
c1, modeling phonemes of the speech signal by using HMM-GMM and by using 3-state modeling, respectively, wherein the emission probability of the HMM is modeled by using a Gaussian distribution function;
c2, initializing alignment, and corresponding the frame average of the voice signal to each state;
c3, updating model parameters, counting the times of transition of each state, dividing the times by the total times of transition to obtain the transition probability of each state, and calculating the mean vector and covariance matrix of MFCC characteristics of the states, namely the emission probability;
c4, using Viterbi algorithm, according to the transition probability and emission probability obtained in the last step, re-aligning the state level of the speech signal;
c5, repeating the step C2 and the step C3 until convergence;
and C6, saving the trained model.
6. The PLC voice recognition method based on machine learning of claim 1, wherein the step of establishing the mapping relationship between the voice command and the PLC register data in the step d is as follows,
D1, the PLC stores data in the form of tags associated with storage areas, divided into input (I), output (O), bit memory (M) and data block (DB) areas;
d2, establishing the link between the PC and the PLC register by using snap 7;
D3, mapping the voice command to the PLC data register data, the principle of a command operation executed by the PLC being to modify the data at the corresponding register address.
7. The PLC speech recognition method based on machine learning of claim 1, wherein the step of matching the features of the speech command with the model in step g is as follows,
g1, importing the trained model group;
g2, creating a prediction score list;
g3, matching the input speech with each model of the model group;
g4, calculating a matching score and storing the matching score in a prediction score list;
g5, selecting the model with the highest score;
g6, outputting the voice signal mark corresponding to the model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110319744.XA CN113643692B (en) | 2021-03-25 | 2021-03-25 | PLC voice recognition method based on machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113643692A true CN113643692A (en) | 2021-11-12 |
CN113643692B CN113643692B (en) | 2024-03-26 |
Family
ID=78415711
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110319744.XA Active CN113643692B (en) | 2021-03-25 | 2021-03-25 | PLC voice recognition method based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113643692B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102324232A (en) * | 2011-09-12 | 2012-01-18 | 辽宁工业大学 | Method for recognizing sound-groove and system based on gauss hybrid models |
CN104078039A (en) * | 2013-03-27 | 2014-10-01 | 广东工业大学 | Voice recognition system of domestic service robot on basis of hidden Markov model |
CN106395516A (en) * | 2016-10-13 | 2017-02-15 | 东华大学 | Passenger elevator intelligent control system based on speech recognition |
CN106601230A (en) * | 2016-12-19 | 2017-04-26 | 苏州金峰物联网技术有限公司 | Logistics sorting place name speech recognition method, system and logistics sorting system based on continuous Gaussian mixture HMM |
CN107331384A (en) * | 2017-06-12 | 2017-11-07 | 平安科技(深圳)有限公司 | Audio recognition method, device, computer equipment and storage medium |
CN109243428A (en) * | 2018-10-15 | 2019-01-18 | 百度在线网络技术(北京)有限公司 | A kind of method that establishing speech recognition modeling, audio recognition method and system |
CN109448726A (en) * | 2019-01-14 | 2019-03-08 | 李庆湧 | A kind of method of adjustment and system of voice control accuracy rate |
US20190266998A1 (en) * | 2017-06-12 | 2019-08-29 | Ping An Technology(Shenzhen) Co., Ltd. | Speech recognition method and device, computer device and storage medium |
CN209433234U (en) * | 2019-03-15 | 2019-09-24 | 陕西中烟工业有限责任公司 | There is the technology for making tobacco threds parameter monitor device of voice alarm function based on Raspberry Pi |
Also Published As
Publication number | Publication date |
---|---|
CN113643692B (en) | 2024-03-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |