CN113643692B - PLC voice recognition method based on machine learning - Google Patents


Info

Publication number
CN113643692B
CN113643692B (application CN202110319744.XA)
Authority
CN
China
Prior art keywords
voice
plc
model
voice signal
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110319744.XA
Other languages
Chinese (zh)
Other versions
CN113643692A (en)
Inventor
侯龙潇
李建普
赵聪
李晓鹏
杨成林
雷珊珊
范宦潼
白保坤
赵贤
谢沙沙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Machinery Design & Research Institute Co ltd
Original Assignee
Henan Machinery Design & Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan Machinery Design & Research Institute Co ltd filed Critical Henan Machinery Design & Research Institute Co ltd
Priority to CN202110319744.XA priority Critical patent/CN113643692B/en
Publication of CN113643692A publication Critical patent/CN113643692A/en
Application granted granted Critical
Publication of CN113643692B publication Critical patent/CN113643692B/en
Legal status: Active (granted)


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/144 Training of HMMs
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Abstract

The invention relates to a PLC voice recognition method based on machine learning, comprising the steps of: a, collecting voice signal samples; b, performing endpoint detection and feature extraction on the voice signals; c, training an HMM-GMM model; d, establishing a mapping relation between voice instructions and PLC register data; e, collecting a voice instruction; f, performing endpoint detection and feature extraction on the voice instruction; g, matching the features of the voice instruction against the model; h, modifying the register data according to the matching result and the mapping relation with the PLC register data. The method can output signals and modify parameters, and can accurately recognize voice commands issued by an operator. Industrial control means such as buttons and keys are thus replaced with voice commands that are friendlier to operators, so that operators no longer face complex operation interfaces and can operate equipment remotely, adding a new mode and new ideas to industrial control.

Description

PLC voice recognition method based on machine learning
Technical Field
The invention relates to the technical field of machine learning, in particular to a PLC voice recognition method based on machine learning.
Background
In traditional industrial control, an operator performs signal input or parameter modification on a PLC using devices such as buttons, touch screens, mice and keyboards; the PLC then performs logic processing and outputs instructions to control the equipment. Such interfaces can be complex and require the operator to be present at the controls.
Given this premise, it is important to provide a natural and convenient mode of man-machine interaction.
Disclosure of Invention
In view of this situation, and to overcome the defects of the prior art, the invention provides a PLC voice recognition method based on machine learning. Voice data for the instructions required by the equipment are collected and processed to build a training model. In use, a collected instruction voice is processed and matched against the model, and the matching result is written into a PLC internal register, realizing signal output and parameter modification. Voice instructions issued by operators can thus be accurately recognized, and the equipment performs the corresponding operation according to the instruction.
The invention relates to a PLC voice recognition method based on machine learning, which comprises the following specific implementation steps,
a, collecting a voice signal sample;
b, voice signal end point detection and feature extraction;
c, training an HMM-GMM model;
d, establishing a mapping relation between the voice instruction and the PLC register data;
e, collecting voice instructions;
f, carrying out endpoint detection and feature extraction on the voice instruction;
g, matching the characteristics of the voice instruction with the model;
h, modifying the register data by the mapping relation between the matching result and the PLC register data.
The beneficial effects of the invention are as follows. Based on machine learning, a voice signal sample is first collected and subjected to endpoint detection and feature extraction, and an HMM-GMM model is trained. A mapping relation between voice instructions and PLC register data is then established. Finally, voice instructions are collected, endpoint detection and feature extraction are performed on them, their features are matched against the model, and the matching result is written into a PLC internal register, realizing signal output and parameter modification. Voice instructions issued by operators can be accurately recognized; industrial control means such as buttons and keys are replaced with voice instructions that are friendlier to operators, so that operators no longer face complex operation interfaces, equipment can be operated remotely, and a new mode and new ideas are added to industrial control.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of the general steps of the present invention.
Fig. 2 is a flow chart of step a of the present invention.
Fig. 3 is a waveform diagram of the voice signal end point detection in the step b of the present invention.
Fig. 4 is a corresponding diagram of D1 in step D of the present invention.
Fig. 5 is a flowchart of step h of the present invention.
Detailed Description
The embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The embodiments described are obviously only some, and not all, embodiments of the present invention.
In the first embodiment, a machine learning-based PLC voice recognition method is realized by the following steps,
a, collecting a voice signal sample;
b, voice signal end point detection and feature extraction;
c, training an HMM-GMM model;
d, establishing a mapping relation between the voice instruction and the PLC register data;
e, collecting voice instructions;
f, performing endpoint detection and feature extraction on the voice instruction (the voice instruction is collected, endpoint-detected and feature-extracted in the same manner as the voice signal samples);
g, matching the characteristics of the voice instruction with the model;
h, modifying the register data according to the matching result and the mapping relation with the PLC register data: combining the mapping relation with the voice instruction prediction result, the target PLC is connected using Snap7 to complete the modification of the register data at the corresponding address.
In a second embodiment, based on the first embodiment, the step of collecting the voice signal sample in the step a is as follows,
a1, setting the collection times of each voice signal sample;
a2, setting a preservation path of the voice signal sample;
a3, setting the format as pyaudio.paInt16, the number of channels as 1, the sampling rate as 16000, and the recording duration of a single voice signal as 2.5s;
a4, collecting voice by using a pyaudio module;
a5, storing the collected voice signal sample by using a wave module;
a6, denoising the voice signal sample by using spectral subtraction;
and A7, circularly executing until the set acquisition times are reached.
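Step A6 names spectral subtraction but the patent gives no formula. The following is a minimal numpy sketch under common assumptions: a noise-only segment (e.g. the first frames of the recording) is available to estimate the noise spectrum, and frames are resynthesized by overlap-add with a small spectral floor. The pyaudio recording itself (steps A3-A5) is omitted here:

```python
import numpy as np

def spectral_subtract(signal, noise_profile, frame_len=512, hop=256, beta=0.02):
    """Magnitude spectral subtraction; an illustrative sketch, not the
    patent's exact implementation."""
    window = np.hanning(frame_len)
    # Average noise magnitude spectrum over the noise-only frames.
    noise_frames = [noise_profile[i:i + frame_len] * window
                    for i in range(0, len(noise_profile) - frame_len, hop)]
    noise_mag = np.mean([np.abs(np.fft.rfft(f)) for f in noise_frames], axis=0)

    out = np.zeros(len(signal))
    norm = np.zeros(len(signal))
    for i in range(0, len(signal) - frame_len, hop):
        frame = signal[i:i + frame_len] * window
        spec = np.fft.rfft(frame)
        mag = np.abs(spec) - noise_mag           # subtract the noise estimate
        mag = np.maximum(mag, beta * noise_mag)  # spectral floor limits musical noise
        clean = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=frame_len)
        out[i:i + frame_len] += clean * window   # overlap-add resynthesis
        norm[i:i + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-8)
```

On a tone buried in white noise, the denoised signal is measurably closer to the clean tone than the noisy input.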
In the third embodiment, on the basis of the first embodiment, the endpoint detection of the voice signal in step b accurately determines the start point and the end point of the speech within a signal segment containing speech, distinguishing the speech section from the non-speech section. The double-threshold method uses three thresholds: the first two are thresholds on the short-time speech energy and the last is a threshold on the speech zero-crossing rate. The energy of voiced sound is higher than that of unvoiced sound, while the zero-crossing rate of unvoiced sound is higher than that of voiced sound; the energy is therefore used first to locate the voiced part, and the zero-crossing rate is then used to extract the unvoiced part, completing the endpoint detection. The specific steps are as follows,
Bj1, take a higher short-time energy threshold MH and use it to first separate out the voiced part of the speech, yielding the interval A1 to A2;
Bj2, take a lower energy threshold ML and search outwards from A1 and A2, adding the lower-energy speech portions to the speech section and thereby expanding it to the interval B1 to B2;
Bj3, distinguish consonants from silence using the short-time zero-crossing rate with threshold Zs: search outwards from the speech section obtained with the short-time energy, and treat any part whose short-time zero-crossing rate is greater than 3 times Zs as the unvoiced part of the speech, adding it to the speech section. The resulting speech section lies between C1 and C2.
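The three Bj steps can be sketched as follows. The patent does not say how MH, ML and Zs are chosen, so the ratios below (relative to the peak frame energy, and the median zero-crossing rate of the recording) are illustrative assumptions:

```python
import numpy as np

def double_threshold_endpoints(x, frame_len=256, hop=128,
                               mh_ratio=0.5, ml_ratio=0.1, zcr_mult=3.0):
    """Sketch of the two-level energy + zero-crossing-rate endpoint detector.
    Returns (start, end) sample indices of the detected speech section."""
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len, hop)]
    energy = np.array([np.sum(f ** 2) for f in frames])
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(f))) > 0) for f in frames])

    MH = mh_ratio * energy.max()   # higher energy threshold
    ML = ml_ratio * energy.max()   # lower energy threshold
    Zs = np.median(zcr)            # baseline zero-crossing rate

    # Bj1: frames above MH are taken as certainly voiced (A1..A2).
    above = np.where(energy > MH)[0]
    a1, a2 = above[0], above[-1]
    # Bj2: expand outwards while energy stays above ML (B1..B2).
    b1, b2 = a1, a2
    while b1 > 0 and energy[b1 - 1] > ML:
        b1 -= 1
    while b2 < len(energy) - 1 and energy[b2 + 1] > ML:
        b2 += 1
    # Bj3: expand further while ZCR exceeds 3*Zs (unvoiced consonants, C1..C2).
    c1, c2 = b1, b2
    while c1 > 0 and zcr[c1 - 1] > zcr_mult * Zs:
        c1 -= 1
    while c2 < len(energy) - 1 and zcr[c2 + 1] > zcr_mult * Zs:
        c2 += 1
    return c1 * hop, c2 * hop + frame_len
```

On a synthetic recording (silence, a tone burst, silence) the detector recovers the burst boundaries to within a frame or two.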
In a fourth embodiment, based on the first embodiment, the step of extracting the speech signal features in the step b is as follows,
bt1, pre-emphasis, framing and windowing are carried out on the voice;
bt2, obtaining a corresponding frequency spectrum through FFT for each short-time analysis window;
Bt3, the above frequency spectrum is passed through a Mel filter bank to obtain the Mel spectrum (the human auditory system is a special nonlinear system whose sensitivity to signals of different frequencies varies; it excels at extracting speech features, recovering not only semantic information but also the personal characteristics of the speaker, both of which are required by existing speech recognition systems);
Bt4, cepstrum analysis is performed on the Mel spectrum: the logarithm is taken and an inverse transform is applied, the inverse transform in practice generally being realized by the DCT (discrete cosine transform). The 2nd to 13th coefficients after the DCT are taken as the MFCC, the Mel-frequency cepstral coefficients, which are the features of the frame of speech.
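A compact numpy sketch of the Bt1-Bt4 pipeline. The frame length, hop, FFT size and 26-filter Mel bank are common defaults, not values taken from the patent:

```python
import numpy as np

def mfcc(signal, fs=16000, frame_len=400, hop=160, n_mels=26, n_ceps=12):
    """Pre-emphasis, framing + Hamming window, FFT, Mel filter bank,
    log, DCT; keeps DCT coefficients 2..13 as in step Bt4."""
    # Bt1: pre-emphasis, framing, Hamming window
    emph = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    frames = np.stack([emph[i:i + frame_len]
                       for i in range(0, len(emph) - frame_len, hop)])
    frames = frames * np.hamming(frame_len)
    # Bt2: power spectrum of each short-time analysis window
    nfft = 512
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft
    # Bt3: triangular Mel filter bank, equally spaced on the Mel scale
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0), mel(fs / 2.0), n_mels + 2))
    bins = np.floor((nfft + 1) * pts / fs).astype(int)
    fbank = np.zeros((n_mels, nfft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fbank.T + 1e-10)
    # Bt4: DCT-II over the log-Mel energies; keep coefficients 2..13
    n = np.arange(n_mels)
    basis = np.cos(np.pi * np.outer(n, 2 * n + 1) / (2.0 * n_mels))
    ceps = logmel @ basis.T
    return ceps[:, 1:1 + n_ceps]
```

For a 1 s recording at 16 kHz this yields one 12-dimensional MFCC vector per 10 ms frame.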
In a fifth embodiment, based on the first embodiment, the step of training the HMM-GMM model in the step c is as follows,
the method comprises the steps of C1, respectively using HMM-GMM (Hidden Markov Model: a Markov process with Hidden nodes unobservable and visible nodes, gaussianMixture Model: gaussian mixture model can be regarded as a model formed by combining K single Gaussian models, K submodels are Hidden variables Hidden variable of the mixture model, generally, any probability distribution can be used for one mixture model, and the Gaussian mixture model is used here because of good mathematical properties and good calculation performance of Gaussian distribution, and speech recognition is divided into three steps, namely, firstly, recognizing frames into states and being completed by the GMM; the third step, combine the phoneme into words, finish by HMM, can understand that the whole HMM-GMM network is actually used for HMM network service, for the problem that the speech recognition needs to solve, namely correctly recognize MFCC characteristic into corresponding HMM state, this process involves two probability to calculate, one is to recognize the characteristic of the current frame as the probability of this state, namely Likelihood in general HMM, mean vector and covariance matrix in GMM, namely GMM network is used for obtaining the probability of the current state, the second is to convert the probability of the last state into the current state, namely state transition probability, this process is that in HMM said Decoding, a sequence is converted into another sequence, there is exponential class conversion mode theoretically, so each frame only takes the highest probability of that state, such a route selection method is called Viterbi algorithm) is modeled, use 3 state modeling, wherein the emission probability of HMM uses Gaussian distribution modeling;
C2, initialize the alignment by distributing the frames of the voice signal evenly across the states;
C3, update the model parameters: count the number of times each state transition occurs and divide by the total number of transitions to obtain the transition probabilities, and compute the mean vector and covariance matrix of the MFCC features assigned to each state, i.e. the emission probabilities;
C4, using the Viterbi algorithm, re-align the voice signal at the state level according to the transition and emission probabilities obtained in the previous step;
C5, repeat steps C3 and C4 until convergence;
and C6, saving the model after training.
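A minimal sketch of the C1-C6 loop. For brevity the GMM emission is reduced to a single diagonal-covariance Gaussian per state (consistent with step C1's Gaussian emission modeling), and convergence is replaced by a fixed iteration count; both simplifications are assumptions, not the patent's exact procedure:

```python
import numpy as np

def log_gauss(f, means, vars_):
    """Log-density of every frame under every state's diagonal Gaussian -> (T, S)."""
    return -0.5 * (((f[:, None, :] - means) ** 2 / vars_)
                   + np.log(2 * np.pi * vars_)).sum(-1)

def viterbi_align(f, means, vars_, logA):
    """Step C4: best state path for one utterance (start forced to state 0)."""
    B = log_gauss(f, means, vars_)
    T, S = B.shape
    delta = np.full((T, S), -np.inf)
    psi = np.zeros((T, S), dtype=int)
    delta[0, 0] = B[0, 0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA      # scores[from, to]
        psi[t] = scores.argmax(0)
        delta[t] = scores.max(0) + B[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

def train_hmm(features, n_states=3, n_iter=5):
    """Steps C1-C6: features is a list of (T_i, D) MFCC arrays for one word."""
    # C2: initial alignment -- spread each utterance's frames evenly over the states
    aligns = [np.minimum(np.arange(len(f)) * n_states // len(f), n_states - 1)
              for f in features]
    D = features[0].shape[1]
    means = np.zeros((n_states, D))
    vars_ = np.ones((n_states, D))
    logA = np.zeros((n_states, n_states))
    for _ in range(n_iter):
        # C3: re-estimate emission and transition parameters from the alignment
        allf = np.concatenate(features)
        alla = np.concatenate(aligns)
        for s in range(n_states):
            sel = allf[alla == s]
            if len(sel):
                means[s] = sel.mean(0)
                vars_[s] = sel.var(0) + 1e-3        # variance floor
        trans = np.full((n_states, n_states), 1e-3)  # transition-count floor
        for a in aligns:
            for t in range(len(a) - 1):
                trans[a[t], a[t + 1]] += 1
        logA = np.log(trans / trans.sum(1, keepdims=True))
        # C4/C5: Viterbi re-alignment, repeated for a fixed number of iterations
        aligns = [viterbi_align(f, means, vars_, logA) for f in features]
    return means, vars_, logA  # C6: the caller saves these as the word's model
```

On synthetic utterances whose frames pass through three well-separated clusters in order, the three state means converge to the cluster centers.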
In a sixth embodiment, based on the first embodiment, the step of establishing the mapping relationship between the voice command and the PLC register data in the step d is as follows,
D1, the data storage of the PLC is associated with storage areas in the form of tags, divided into inputs (I), outputs (O), bit memory (M) and data blocks (DB); when a program accesses the corresponding (I/O) tag, it operates on the corresponding address through the process image (Process Image Out) of the CPU. The specific correspondence is shown in figure 4;
D2, a link between the PC and the PLC registers is established using Snap7, an open-source library for Ethernet communication with Siemens S7-series PLCs, supporting the S7-200, S7-200 Smart, S7-300, S7-400, S7-1200 and S7-1500. The communication steps are: 1, instantiate snap7 and set the link port number; 2, call the snap7 API connect, whose parameters require the IP address, rack number and slot number of the target PLC; 3, perform the operation; 4, after the operation is completed, call the API disconnect to break the link;
D3, map the voice commands to the data in the PLC data registers. The principle of the PLC executing a command is to modify the data at the corresponding register address. The snap7 APIs client_write_area and client_read_area implement writing and reading of PLC register data; their parameters require the operation type, register address, start bit and data, and these operations complete the input and output of the I/O points.
For the V and M areas, the API calls client_db_write and client_db_read are required to read and write the V and M variables; their parameters require the register address, start bit and number of bytes of the data (1 for byte data, 2 for words and integers, 4 for double integers and floating points), and these operations complete the writing and reading of variable data.
The voice signal replaces a physical button or a key on the touch screen: data realizing the desired function are written to the designated register address, completing the mapping between the voice signal and the PLC register data. For example, for the voice signal "No. 1 motor start", if motor No. 1 is started by setting the output point Q0.1, the program associates the voice signal with a statement such as client_write_area(0x82, 0, struct.pack('B', 2)).
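Steps D2 and D3 can be sketched as below. The command strings, IP address, rack and slot are hypothetical; the `snap7.client.Client` methods follow the python-snap7 binding, whose exact signatures vary between versions; and only the command-to-register mapping can be exercised without a real PLC:

```python
import struct

# Hypothetical command table (step D3): each recognized command maps to
# (area code, start byte, packed data). 0x82 is the S7 output (Q) area
# code from the patent's example; commands and addresses are illustrative.
COMMAND_MAP = {
    "No. 1 motor start": (0x82, 0, struct.pack("B", 2)),  # set output byte -> Q0.1
    "No. 1 motor stop":  (0x82, 0, struct.pack("B", 0)),  # clear output byte
}

def lookup(command):
    """Translate a recognized voice command into a register write (step H)."""
    return COMMAND_MAP[command]

def execute(command, ip="192.168.0.1", rack=0, slot=1):
    """Push the write to the PLC over Snap7 (step D2). Requires the
    python-snap7 package and a reachable S7 PLC; method names follow
    python-snap7 and may differ between library versions."""
    import snap7
    area, start, data = lookup(command)
    client = snap7.client.Client()
    client.connect(ip, rack, slot)   # IP address, rack number, slot number
    # Depending on the python-snap7 version, the area is passed as a raw
    # code (0x82 = outputs) or as an Area enum member.
    client.write_area(area, 0, start, bytearray(data))
    client.disconnect()
```

The mapping table is the part the recognizer consults at step H; `execute` is only reached once a command has been recognized.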
In the seventh embodiment, based on the first embodiment, step g matches the features of the voice instruction against the models: the model group built from each phoneme of the HMM-GMM voice signal samples is imported, the features of the voice instruction are matched against each model of the group, and the voice sample with the highest matching rate is obtained. The specific steps are as follows,
g1, importing a model group after training;
g2, creating a prediction score list;
g3, matching the input voice with each model of the model group;
g4, calculating a matching score and storing the matching score into a prediction score list;
g5, screening out the highest-scoring model;
and G6, outputting a voice signal mark corresponding to the model.
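The G1-G6 loop reduces to scoring the instruction against each model and taking the argmax. Here `models` and its scoring callables are illustrative stand-ins for the trained HMM-GMM model group:

```python
import numpy as np

def recognize(features, models):
    """Steps G1-G6: score the instruction's features against every model in
    the trained model group and return the label with the highest score.
    `models` maps a command label to a scoring callable (in the patent this
    would be the HMM-GMM log-likelihood)."""
    scores = {label: fn(features) for label, fn in models.items()}  # G2-G4
    return max(scores, key=scores.get)                              # G5-G6
```

With two toy Gaussian log-likelihood models, features drawn near one model's mean are labeled accordingly.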
When the invention is used in practice, based on machine learning and the trained voice instruction model, the PC-side program connects to the PLC, and industrial control means such as buttons and keys are replaced with voice instructions that are friendlier to operators, so that operators no longer face complex operation interfaces and equipment can be operated remotely, adding a new mode and new ideas to industrial control. The specific implementation steps are as follows,
a, collecting a voice signal sample;
b, voice signal end point detection and feature extraction;
c, training an HMM-GMM model;
d, establishing a mapping relation between the voice instruction and the PLC register data;
e, collecting voice instructions;
f, performing endpoint detection and feature extraction on the voice instruction (the voice instruction is collected, endpoint-detected and feature-extracted in the same manner as the voice signal samples);
g, matching the characteristics of the voice instruction with the model;
h, modifying the register data according to the matching result and the mapping relation with the PLC register data: combining the mapping relation with the voice instruction prediction result, the target PLC is connected using Snap7 to complete the modification of the register data at the corresponding address.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (6)

1. The PLC voice recognition method based on machine learning is characterized by comprising the following specific implementation steps,
a, collecting a voice signal sample;
b, voice signal end point detection and feature extraction;
c, training an HMM-GMM model;
d, establishing a mapping relation between the voice instruction and the PLC register data; the step of establishing the mapping relation comprises: D1, the data storage of the PLC is associated with storage areas in the form of tags, divided into inputs (I), outputs (O), bit memory (M) and data blocks (DB); D2, a link between the PC and the PLC registers is established using snap7; D3, the mapping between the voice command and the PLC data register data is established on the basis of the link;
e, collecting voice instructions;
f, carrying out endpoint detection and feature extraction on the voice instruction;
g, matching the characteristics of the voice instruction with the model;
h, modifying the register data by the mapping relation between the matching result and the PLC register data.
2. The machine learning based PLC speech recognition method of claim 1, wherein the step of collecting the speech signal samples in step a is as follows,
a1, setting the collection times of each voice signal sample;
a2, setting a preservation path of the voice signal sample;
a3, setting the format as pyaudio.paInt16, the number of channels as 1, the sampling rate as 16000, and the recording duration of a single voice signal as 2.5s;
a4, collecting voice by using a pyaudio module;
a5, storing the collected voice signal sample by using a wave module;
a6, denoising the voice signal sample by using spectral subtraction;
and A7, circularly executing until the set acquisition times are reached.
3. The machine learning based PLC speech recognition method of claim 1, wherein the step of speech signal endpoint detection in step b is as follows,
Bj1, take a higher short-time energy threshold MH and use it to first separate out the voiced part of the speech, yielding the interval A1 to A2;
Bj2, take a lower energy threshold ML and search outwards from A1 and A2, adding the lower-energy speech portions to the speech section and thereby expanding it to the interval B1 to B2;
Bj3, distinguish consonants from silence using the short-time zero-crossing rate with threshold Zs: search outwards from the speech section obtained with the short-time energy, and treat any part whose short-time zero-crossing rate is greater than 3 times Zs as the unvoiced part of the speech, adding it to the speech section. The resulting speech section lies between C1 and C2.
4. The machine learning based PLC speech recognition method of claim 1, wherein the step of extracting speech signal features in step b is as follows,
bt1, pre-emphasis, framing and windowing are carried out on the voice;
bt2, obtaining a corresponding frequency spectrum through FFT for each short-time analysis window;
bt3, the above frequency spectrum is passed through a Mel filter bank to obtain Mel frequency spectrum;
Bt4, cepstrum analysis is performed on the Mel spectrum: the logarithm is taken and an inverse transform is applied, the inverse transform in practice generally being realized by the DCT (discrete cosine transform); the 2nd to 13th coefficients after the DCT are taken as the MFCC, the Mel-frequency cepstral coefficients, which are the features of the frame of speech.
5. The machine learning based PLC speech recognition method of claim 1, wherein the step c of training the HMM-GMM model is as follows,
C1, model each phoneme of the voice signal with an HMM-GMM, using 3-state modeling, wherein the emission probability of the HMM is modeled with a Gaussian distribution function;
C2, initialize the alignment by distributing the frames of the voice signal evenly across the states;
C3, update the model parameters: count the number of times each state transition occurs and divide by the total number of transitions to obtain the transition probabilities, and compute the mean vector and covariance matrix of the MFCC features assigned to each state, i.e. the emission probabilities;
C4, using the Viterbi algorithm, re-align the voice signal at the state level according to the transition and emission probabilities obtained in the previous step;
C5, repeat steps C3 and C4 until convergence;
and C6, saving the model after training.
6. The method for recognizing PLC speech based on machine learning according to claim 1, wherein the step of matching the feature of the speech instruction with the model in the step g is as follows,
g1, importing a model group after training;
g2, creating a prediction score list;
g3, matching the input voice with each model of the model group;
g4, calculating a matching score and storing the matching score into a prediction score list;
g5, screening out the highest-scoring model;
and G6, outputting a voice signal mark corresponding to the model.
CN202110319744.XA 2021-03-25 2021-03-25 PLC voice recognition method based on machine learning Active CN113643692B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110319744.XA CN113643692B (en) 2021-03-25 2021-03-25 PLC voice recognition method based on machine learning


Publications (2)

Publication Number Publication Date
CN113643692A CN113643692A (en) 2021-11-12
CN113643692B (en) 2024-03-26

Family

ID=78415711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110319744.XA Active CN113643692B (en) 2021-03-25 2021-03-25 PLC voice recognition method based on machine learning

Country Status (1)

Country Link
CN (1) CN113643692B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102324232A (en) * 2011-09-12 2012-01-18 辽宁工业大学 Method for recognizing sound-groove and system based on gauss hybrid models
CN104078039A (en) * 2013-03-27 2014-10-01 广东工业大学 Voice recognition system of domestic service robot on basis of hidden Markov model
CN106395516A (en) * 2016-10-13 2017-02-15 东华大学 Passenger elevator intelligent control system based on speech recognition
CN106601230A (en) * 2016-12-19 2017-04-26 苏州金峰物联网技术有限公司 Logistics sorting place name speech recognition method, system and logistics sorting system based on continuous Gaussian mixture HMM
CN107331384A (en) * 2017-06-12 2017-11-07 平安科技(深圳)有限公司 Audio recognition method, device, computer equipment and storage medium
CN109243428A (en) * 2018-10-15 2019-01-18 百度在线网络技术(北京)有限公司 A kind of method that establishing speech recognition modeling, audio recognition method and system
CN109448726A (en) * 2019-01-14 2019-03-08 李庆湧 A kind of method of adjustment and system of voice control accuracy rate
CN209433234U (en) * 2019-03-15 2019-09-24 陕西中烟工业有限责任公司 Tobacco-shred processing parameter monitoring device with voice alarm function based on Raspberry Pi

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633842B (en) * 2017-06-12 2018-08-31 平安科技(深圳)有限公司 Audio recognition method, device, computer equipment and storage medium


Also Published As

Publication number Publication date
CN113643692A (en) 2021-11-12

Similar Documents

Publication Publication Date Title
WO2018227780A1 (en) Speech recognition method and device, computer device and storage medium
WO2018227781A1 (en) Voice recognition method, apparatus, computer device, and storage medium
WO2021051544A1 (en) Voice recognition method and device
US20210125603A1 (en) Acoustic model training method, speech recognition method, apparatus, device and medium
CN110648691B (en) Emotion recognition method, device and system based on energy value of voice
CN109147774B (en) Improved time-delay neural network acoustic model
JPS62231996A (en) Allowance evaluation of word corresponding to voice input
JPH0422276B2 (en)
CN109377981B (en) Phoneme alignment method and device
Vyas A Gaussian mixture model based speech recognition system using Matlab
JPS58100195A (en) Continuous voice recognition equipment
CN102945673A (en) Continuous speech recognition method with speech command range changed dynamically
CN114360557B (en) Voice tone conversion method, model training method, device, equipment and medium
CN110428853A (en) Voice activity detection method, Voice activity detection device and electronic equipment
Hsieh et al. Improving perceptual quality by phone-fortified perceptual loss for speech enhancement
CN112071308A (en) Awakening word training method based on speech synthesis data enhancement
CN110910891A (en) Speaker segmentation labeling method and device based on long-time memory neural network
CN111554279A (en) Multi-mode man-machine interaction system based on Kinect
JPH09319392A (en) Voice recognition device
CN113643692B (en) PLC voice recognition method based on machine learning
CN115331658B (en) Voice recognition method
CN107785012B (en) Control method of sound control driller display
Islam et al. Improvement of text dependent speaker identification system using neuro-genetic hybrid algorithm in office environmental conditions
Dua et al. Noise robust automatic speech recognition: review and analysis
Gupta Speech recognition for Hindi

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant