CN115294966A - Nuclear power plant voice recognition training method, intelligent voice control method and system

Publication number
CN115294966A
Authority
CN
China
Prior art keywords
nuclear power
power plant
voice
control instruction
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210936148.0A
Other languages
Chinese (zh)
Inventor
王志敏行
陈日罡
田晖
邓士光
段鹏飞
徐云龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Nuclear Power Engineering Co Ltd
Original Assignee
China Nuclear Power Engineering Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Nuclear Power Engineering Co Ltd
Priority to CN202210936148.0A
Publication of CN115294966A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G10L15/26 Speech to text systems
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention discloses a nuclear power plant voice recognition training method, an intelligent voice control method and a system, and relates to the technical field of nuclear power intelligent control. The training method comprises: performing feature classification and extraction on a pre-established text file containing a plurality of nuclear power plant control instructions, so as to establish a special dictionary for the nuclear power plant and form a regular expression that constrains the features of the words in the special dictionary; and performing voice recognition training on a plurality of pre-collected audio files covering each control instruction, restricting the voice recognition result during training to words of the special dictionary that satisfy the regular expression, so as to establish a control instruction voice recognition model for the nuclear power plant. By constraining the training recognition result with the regular expression and the special dictionary, the method improves the efficiency of voice recognition training for the nuclear power plant and increases the accuracy with which nuclear power plant control instructions are recognized.

Description

Nuclear power plant voice recognition training method, intelligent voice control method and system
Technical Field
The invention belongs to the technical field of nuclear power intelligent control, and particularly relates to a nuclear power plant voice recognition training method, an intelligent voice control method and an intelligent voice control system.
Background
Because of the particular nature of nuclear power plants, nuclear safety is vital. Operators in the main control room, the core of a nuclear power plant, perform complex operations under a heavy workload; after long periods of work their efficiency declines, and human factors therefore introduce potential safety hazards. Replacing traditional keyboard and mouse operation with voice control can improve operator efficiency, reduce workload and raise the level of digital operation.
Existing voice recognition training methods based on MFCC and neural networks have the advantages of low learning cost and high computational efficiency. However, applying the existing training scheme to nuclear power raises the following problems:
A nuclear power plant requires extremely high sentence recognition accuracy, but models obtained with commonly used speech recognition training methods do not meet this requirement under the constraints of offline operation and mixed Chinese and English. In addition, corpora of nuclear power content are insufficient, and although nuclear power control instructions follow strict formats, the logical connection between their words is extremely weak, so speech recognition accuracy is low.
Disclosure of Invention
The invention aims to solve the technical problems of the prior art, and provides a nuclear power plant voice recognition model establishing method, an intelligent voice control method and a system, so as to solve the problems that when the existing voice recognition training method is applied to the voice recognition of a nuclear power plant control instruction, the voice recognition accuracy is low, and the use requirement of a nuclear power plant cannot be met.
In order to solve the technical problem, the invention adopts the following technical scheme:
in a first aspect of the present invention, a nuclear power plant speech recognition training method is provided, including:
performing feature classification extraction on a pre-established text file comprising a plurality of control instructions of the nuclear power plant to establish a special dictionary for the nuclear power plant and form a regular expression for performing feature limitation on words in the special dictionary;
and performing voice recognition training on a plurality of pre-collected audio files comprising each control instruction, and limiting a voice recognition result to be composed of words in the special dictionary with the regular expression in the training process so as to establish a control instruction voice recognition model for the nuclear power plant.
Preferably, the text file includes:
a plurality of control instruction actions expressed in Chinese, and the point numbers, expressed in English letters and digits, of the controlled equipment of the nuclear power plant reactor type to which the control instructions refer.
Preferably, the method includes performing feature classification extraction on a pre-established text file including multiple control instructions of the nuclear power plant to establish a special dictionary for the nuclear power plant, and forming a regular expression for performing feature limitation on words in the special dictionary, and specifically includes:
extracting Chinese, english and numbers which repeatedly appear for many times in the text file as Chinese keywords, english keywords and number keywords so as to establish a special dictionary for the nuclear power plant;
and forming a regular expression for performing characteristic limitation on the context of the Chinese keywords and performing characteristic limitation on the digit number and the attribute of each digit of the English keywords and the digit keywords.
Preferably, the pre-collected multiple audio files including each control instruction are specifically:
for each control instruction in the text file, a plurality of audio files with different timbres are collected in advance.
Preferably, performing voice recognition training on a plurality of pre-collected audio files including each control instruction, and limiting a voice recognition result to be composed of words in the special dictionary with the regular expression in the training process, so as to establish a control instruction voice recognition model for the nuclear power plant, specifically includes:
extracting voice features from the audio file based on a Mel Frequency Cepstrum Coefficient (MFCC) algorithm;
performing voice recognition training on the extracted voice features based on a bidirectional long-short term memory network Bi-LSTM;
limiting a voice recognition result to be composed of words in the special dictionary with the regular expression in the training process, and verifying the voice recognition result of each audio file by using the text file content corresponding to the control instruction corresponding to each audio file so as to adjust the weight configuration in the Bi-LSTM;
and after training is finished, obtaining a Bi-LSTM model with weight configuration meeting a preset convergence condition, and combining the MFCC algorithm, the Bi-LSTM model with weight configuration meeting the preset convergence condition, the special dictionary and the regular expression to serve as a control instruction voice recognition model for the nuclear power plant.
Preferably, the extracting the voice feature of the audio file based on the mel-frequency cepstrum coefficient MFCC algorithm specifically includes:
pre-emphasis, framing, windowing and fast Fourier transform processing are carried out on the audio file;
filtering the processed audio file with a Mel-scale triangular filter bank consisting of 23 filters;
and carrying out logarithm operation on the filtered audio file to finish the voice feature extraction of the audio file.
In a second aspect of the present invention, an intelligent voice control method for a nuclear power plant is provided, including:
receiving control instruction voice sent by an operator;
identifying the control instruction voice by using a control instruction voice identification model aiming at the nuclear power plant established by the nuclear power plant voice identification training method so as to obtain an identification result of the control instruction voice;
outputting the recognition result to an operator, and receiving a secondary instruction formed by confirming or modifying the recognition result by the operator;
and executing the control of the nuclear power plant according to the secondary instruction.
Preferably, outputting the recognition result to an operator specifically includes:
displaying the recognition result in characters on an interactive interface operated by an operator;
and broadcasting the recognition result to an operator through machine voice.
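The confirm-or-modify step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `OperatorResponse` and `secondary_instruction` are hypothetical, and the handling of an outright rejection (returning `None`) is an assumption, since the text only describes confirmation and modification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OperatorResponse:
    """Hypothetical container for the operator's reaction to a recognition result."""
    confirmed: bool
    modified_text: Optional[str] = None  # set when the operator edits the result


def secondary_instruction(recognized: str, response: OperatorResponse) -> Optional[str]:
    """Return the instruction to execute, or None if nothing should be executed.

    A modification takes precedence over the recognized text; a plain
    confirmation passes the recognized text through unchanged.
    Returning None on rejection is an assumption of this sketch.
    """
    if response.modified_text is not None:
        return response.modified_text
    return recognized if response.confirmed else None
```

In this sketch nothing reaches the control step unless the operator has either confirmed or corrected the recognized text.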
In a third aspect of the present invention, a nuclear power plant speech recognition training system is provided, including:
a text module, configured to perform feature classification and extraction on a pre-established text file containing a plurality of nuclear power plant control instructions, so as to establish a special dictionary for the nuclear power plant and form a regular expression for constraining the features of the words in the special dictionary;
and an audio module, connected to the text module and configured to perform voice recognition training on a plurality of pre-collected audio files covering each control instruction, and to restrict the voice recognition result during training to words of the special dictionary that satisfy the regular expression, so as to establish a control instruction voice recognition model for the nuclear power plant.
In a fourth aspect of the present invention, an intelligent voice control system for a nuclear power plant is provided, which includes:
the receiving module is used for receiving control instruction voice sent by an operator;
the recognition module is connected with the receiving module and used for recognizing the control instruction voice by using the control instruction voice recognition model aiming at the nuclear power plant established by the nuclear power plant voice recognition training method so as to obtain a recognition result of the control instruction voice;
the output module is connected with the identification module and used for outputting the identification result to an operator and receiving a secondary instruction formed by confirming or modifying the identification result by the operator;
and the control module is connected with the output module and used for executing the control of the nuclear power plant according to the secondary instruction.
According to the nuclear power plant voice recognition training method, the intelligent voice control method and the system of the invention, a special dictionary for the nuclear power plant and a regular expression for the words in the special dictionary are established, and the regular expression and the special dictionary are used to constrain the training recognition result when voice recognition training is performed on audio files of nuclear power plant control instructions. This improves the efficiency of voice recognition training for the nuclear power plant and increases the accuracy with which its control instructions are recognized, so that the resulting voice recognition model can meet the use requirements of the nuclear power plant. Applied to the intelligent control of a nuclear power plant, the method and system raise the level of digital operation, reduce the workload of operators and improve operational safety.
Drawings
FIG. 1 is a flow chart of a nuclear power plant speech recognition training method in an embodiment of the present invention;
FIG. 2 is a comparison diagram of the time required for recognition in the speech recognition training method for a nuclear power plant according to the embodiment of the present invention;
FIG. 3 is a comparison graph of recognition accuracy of the nuclear power plant speech recognition training method in the embodiment of the present invention;
FIG. 4 is a flow chart of an intelligent voice control method for a nuclear power plant according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a nuclear power plant speech recognition training system in an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an intelligent voice control system of a nuclear power plant in an embodiment of the present invention.
Detailed Description
The technical solutions in the present invention will be described clearly and completely with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the scope of the present invention.
In the description of the present invention, it should be noted that the indication of orientation or positional relationship, such as "on" or the like, is based on the orientation or positional relationship shown in the drawings, and is only for convenience and simplicity of description, and does not indicate or imply that the device or element referred to must be provided with a specific orientation, constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it is to be noted that, unless otherwise explicitly specified or limited, the terms "connected," "disposed," "mounted," "fixed," and the like are to be construed broadly, e.g., as being fixedly or removably connected, or integrally connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in a specific case to those skilled in the art.
In the description of the present invention, each unit and module referred to may correspond to only one physical structure, or may be composed of multiple physical structures, or multiple units and modules may be integrated into one physical structure; the units and modules may be implemented by software or hardware, for example, the units and modules may be located in a processor.
In the description of the present invention, functions, steps, etc., which are noted in the flowcharts and block diagrams of the present invention may occur in different orders from those noted in the drawings without conflict.
Example 1:
As shown in FIG. 1, Embodiment 1 of the present invention provides a nuclear power plant speech recognition training method, including:
S01, performing feature classification and extraction on a pre-established text file containing a plurality of nuclear power plant control instructions, so as to establish a special dictionary for the nuclear power plant and form a regular expression for constraining the features of the words in the special dictionary.
Specifically, existing speech recognition models are obtained by training on a large number of audio files in a corpus, and training is usually performed with an Internet connection to obtain better recognition results. Nuclear power plant equipment, however, is forbidden to connect to external networks, so the speech recognition model can only be trained offline. Moreover, the equipment point numbers of a nuclear power plant reactor type are unique to that reactor type, and training material for nuclear power plant speech recognition is relatively scarce. As a result, when voice is used to control and operate a nuclear power plant, the accuracy achieved by a traditional speech recognition training method cannot meet the safety requirement. In this embodiment, therefore, a text file containing a plurality of nuclear power plant control instructions is established in advance for the reactor type of the nuclear power plant, and the keywords commonly used in the control instructions are extracted from the text file and stored as words in a special dictionary. Because the frequently used keywords of a nuclear power plant have fixed features, a regular expression is also established to constrain the features of the words in the special dictionary; the regular expression is likewise derived from the features of the words as they appear in the text file. The content of the text file, such as the range of text it covers, can be determined by personnel working at the nuclear power plant.
Optionally, the text file includes:
a plurality of control instruction actions expressed in Chinese, and the point numbers, expressed in English letters and digits, of the controlled equipment of the nuclear power plant reactor type to which the control instructions refer.
Optionally, the method includes performing feature classification and extraction on a pre-established text file including multiple control instructions of the nuclear power plant to establish a special dictionary for the nuclear power plant, and forming a regular expression for performing feature limitation on words in the special dictionary, and specifically includes:
extracting Chinese, english and numbers which repeatedly appear for many times in the text file as Chinese keywords, english keywords and number keywords so as to establish a special dictionary for the nuclear power plant;
and forming a regular expression for performing characteristic limitation on the context of the Chinese keywords and performing characteristic limitation on the digit number and the attribute of each digit of the English keywords and the digit keywords.
Specifically, in this embodiment, a nuclear power plant control instruction mainly directs a controlled device to perform a certain operation, and the controlled devices of a nuclear power plant are named by point numbers, a point number generally being a specific combination of English letters and digits. The equipment point numbers of different reactor types are unique to each type, while those within the same reactor type are highly similar. The system coding of the reactor type of the target nuclear power plant is therefore examined and analysed, and a text file is established for that specific reactor type, so that the resulting special dictionary and regular expression are also specific to that reactor type. For example, for the Hualong reactor type, the equipment point number coding format required by the reactor type is a combination of 3 letters, 3 digits and 2 letters.
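The coding format just described (3 letters, 3 digits, 2 letters) can be checked with a single regular expression. The following is a minimal Python sketch, assuming uppercase ASCII letters; the names `POINT_NUMBER`, `is_point_number` and `split_point_number` are illustrative and do not come from the disclosure.

```python
import re

# Assumed format: 3 uppercase letters, 3 digits, 2 uppercase letters, e.g. "RCV001VP".
POINT_NUMBER = re.compile(r"[A-Z]{3}[0-9]{3}[A-Z]{2}")


def is_point_number(token: str) -> bool:
    """Return True only if the whole token matches the assumed format."""
    return POINT_NUMBER.fullmatch(token) is not None


def split_point_number(token: str) -> tuple[str, str, str]:
    """Split a valid point number into (leading letters, digits, trailing letters)."""
    if not is_point_number(token):
        raise ValueError(f"not a point number: {token!r}")
    return token[:3], token[3:6], token[6:]
```

Using `fullmatch` rather than `search` matters here: a token with an extra or missing character is rejected outright instead of matching on a substring.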
In this embodiment, therefore, a text file containing a large number of control instructions for the Hualong reactor type is established first; its content is the text of the control instructions. Because the point number of a device to be operated in the tested reactor type combines three English letters, three digits and two English letters, a regular expression is set to constrain the number of characters in the English keywords and digit keywords and the attribute of each character: for example, [A-Z]{3} defines 3 English letters, [0-9]{3} defines 3 Arabic digits, and [A-Z]{2} defines 2 English letters, as in "RCV001VP". In a control instruction, Chinese action words and digits may be added before and after the device point number, as in "open RCV082VP" and "adjust RCV001VP opening 100". When the dictionary content is designed, the Chinese keywords that appear repeatedly in the text file (such as adjust, open and close), the English keywords (letter combinations used in point number codes, such as RCV and VP, which need not correspond to English words) and the digit keywords (Arabic digit strings used in point number codes, such as 001 and 082) are added to the dictionary. The value 100 in "opening 100" may be any other value required by the actual operation, so its repetition can be deliberately avoided when the text file is established. The words added to the dictionary are thus generally keywords from the equipment point numbers of the reactor type or commonly used operation instruction actions. For the Chinese keywords that represent operation instruction actions, the regular expression constrains the context to which they connect: for example, the object of actions such as adjust, open and close is restricted to an equipment point number, which is a combination of 3 letters, 3 digits and 2 letters. Adding the Chinese words, English letter combinations and digit strings related to nuclear power equipment forms the special dictionary for the nuclear power plant, and constraining these special words with the regular expression allows a targeted search during recognition training. Compared with a traditional speech recognition system that relies on G2P (grapheme-to-phoneme) conversion, this approach is better suited to the mixed Chinese and English content required in a nuclear power plant main control room and achieves higher sentence recognition accuracy for sentences that combine Chinese, English letters and digits.
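The keyword extraction described above can be illustrated with a short Python sketch that splits each instruction into Chinese, letter and digit runs and keeps the tokens that repeat. Everything here is an assumption made for illustration: the sample instructions, the Chinese strings 打开 ("open"), 调节 ("adjust") and 开度 ("opening") as renderings of the translated examples, the repetition threshold `min_count`, and the function name `build_dictionary`.

```python
import re
from collections import Counter

# Hypothetical sample instructions: 打开 = "open", 调节 = "adjust", 开度 = "opening".
COMMANDS = [
    "打开RCV082VP",
    "打开RCV001VP",
    "调节RCV001VP开度100",
    "调节RCV082VP开度50",
]

# One alternative per keyword class: a run of Chinese characters,
# a run of uppercase letters, or a run of digits.
TOKEN = re.compile(r"[\u4e00-\u9fff]+|[A-Z]+|[0-9]+")


def build_dictionary(commands, min_count=2):
    """Keep tokens that occur at least min_count times, grouped by class."""
    counts = Counter(tok for cmd in commands for tok in TOKEN.findall(cmd))
    dictionary = {"chinese": set(), "english": set(), "digits": set()}
    for tok, n in counts.items():
        if n < min_count:
            continue  # one-off values such as "100" are left out
        if "0" <= tok[0] <= "9":
            dictionary["digits"].add(tok)
        elif "A" <= tok[0] <= "Z":
            dictionary["english"].add(tok)
        else:
            dictionary["chinese"].add(tok)
    return dictionary
```

With these sample instructions, the opening values 100 and 50 each appear once and are therefore not stored, which mirrors the remark that such values can be kept from repeating when the text file is written.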
S02, performing voice recognition training on a plurality of pre-collected audio files including each control instruction, and limiting a voice recognition result to be composed of words in the special dictionary with the regular expression in the training process so as to establish a control instruction voice recognition model for the nuclear power plant.
Specifically, in this embodiment, a control instruction speech recognition model is obtained by training a plurality of pre-collected audio files including each control instruction, and since the speech recognition results have a plurality of combination possibilities in the training process, the speech recognition results are limited by using words in a special dictionary with regular expressions, so that the accuracy of the speech recognition model is improved, the training speed can be increased, and the control instruction speech recognition model more specific to the nuclear power plant can be obtained.
Optionally, the pre-collected multiple audio files including each control instruction specifically include:
for each control instruction in the text file, a plurality of audio files with different timbres are collected in advance.
Optionally, performing speech recognition training on a plurality of pre-collected audio files including each control instruction, and limiting a speech recognition result to be composed of words in the special dictionary with the regular expression in the training process, so as to establish a speech recognition model of the control instruction for the nuclear power plant, specifically including:
extracting voice features from the audio file based on a Mel Frequency Cepstrum Coefficient (MFCC) algorithm;
performing voice recognition training on the extracted voice features based on a bidirectional long-short term memory network Bi-LSTM;
limiting a voice recognition result to be composed of words in the special dictionary with the regular expression in the training process, and verifying the voice recognition result of each audio file by utilizing the text file content corresponding to the control instruction corresponding to each audio file so as to adjust the weight configuration in the Bi-LSTM;
and after training is finished, obtaining a Bi-LSTM model with weight configuration meeting a preset convergence condition, and combining the MFCC algorithm, the Bi-LSTM model with weight configuration meeting the preset convergence condition, the special dictionary and the regular expression to serve as a control instruction voice recognition model for the nuclear power plant.
Optionally, extracting a speech feature from the audio file based on a mel-frequency cepstrum coefficient MFCC algorithm specifically includes:
pre-emphasis, framing, windowing and fast Fourier transform processing are carried out on the audio file;
filtering the processed audio file with a Mel-scale triangular filter bank consisting of 23 filters;
and carrying out logarithm operation on the filtered audio file to finish the voice feature extraction of the audio file.
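The three steps above (pre-processing, 23-filter Mel filtering, logarithm) can be sketched in pure Python as follows. This is an illustrative sketch, not the implementation of the disclosure: the pre-emphasis coefficient 0.97, the Hamming window, the frame size and step, and the Hz-to-Mel formula are common choices assumed here, and a naive DFT stands in for the FFT to keep the code short. A full MFCC pipeline usually applies a discrete cosine transform after the logarithm; that step is omitted because the text stops at the logarithm.

```python
import cmath
import math


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def frames(signal, size, step):
    """Overlapping frames; the last frame is zero-padded to full length."""
    out = []
    for start in range(0, max(len(signal) - size, 0) + step, step):
        frame = signal[start:start + size]
        out.append(frame + [0.0] * (size - len(frame)))
    return out


def hamming(frame):
    """Multiply each sample by the corresponding Hamming window value."""
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]


def power_spectrum(frame):
    """Naive DFT power for bins 0..N/2 (an FFT yields the same values faster)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append(abs(acc) ** 2 / n)
    return spec


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * hz / sample_rate) for hz in edges_hz]
    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising edge of the triangle
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge of the triangle
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def log_mel_features(signal, sample_rate, frame_size=256, step=128, n_filters=23):
    """Return one list of n_filters log filter-bank energies per frame."""
    bank = mel_filterbank(n_filters, frame_size, sample_rate)
    feats = []
    for frame in frames(pre_emphasis(signal), frame_size, step):
        spec = power_spectrum(hamming(frame))
        energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
        feats.append([math.log(max(e, 1e-10)) for e in energies])  # floor avoids log(0)
    return feats
```

The input is expected to be a Python list of float samples. The small floor inside the logarithm is a practical safeguard added in this sketch for filters that receive no energy.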
Specifically, in this embodiment, a corpus dedicated to the nuclear power plant is formed by recording a large number of audio files. The format of all audio files, including bit rate and number of channels, must be strictly consistent to ensure the training effect, and each recording must match the content of its corresponding control instruction in the text file. For each control instruction, a plurality of audio files with different timbres are recorded, covering male and female voices and young and middle-aged speakers. For example, different audio files contain "open RCV082VP" spoken by several different speakers, and the trained model maps the utterances of all these speakers to that content.
In this embodiment, the special dictionary configured with the regular expression is added during speech model training to constrain the recognition result. If a result does not satisfy the requirements of the dictionary, the machine automatically discards it, so the machine has more constraints when configuring the model matching weights and the model converges faster. In existing speech recognition training, the search space of recognition results is very large. Although the result can be constrained by an existing dictionary of everyday vocabulary, obvious misjudgments still occur when there is background noise or the speech features are indistinct. Adding the special dictionary configured with the regular expression effectively reduces the number of candidates and eliminates some common erroneous options. This embodiment may also combine a dictionary of everyday vocabulary with the special dictionary configured with the regular expression to constrain the recognition result. For example, after three digits have been recognized, the next recognition step favours letters: a sound recognized as "yi" (first tone) is then taken as the English letter E rather than the digit 1. Similarly, without the regular expression, the selectable options after "open" is recognized include 3; once the regular expression specifies that "open" can only be followed by an equipment point number composed of digits and letters, the selectable options include only RCV082VP and the like.
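The way the regular expression prunes candidates can be sketched as a filter over scored hypotheses. This is a sketch under assumptions: the recognizer is assumed to produce `(text, score)` pairs, the grammar (action, point number, optional opening value of up to three digits) is inferred from the examples in the text, the Chinese strings 打开 ("open"), 关闭 ("close"), 调节 ("adjust") and 开度 ("opening") are assumed renderings, and `constrain` is a hypothetical name.

```python
import re

# Assumed grammar: action word, then a point number, then an optional opening value.
# 打开 = "open", 关闭 = "close", 调节 = "adjust", 开度 = "opening".
COMMAND = re.compile(
    r"(?P<action>打开|关闭|调节)"
    r"(?P<point>[A-Z]{3}[0-9]{3}[A-Z]{2})"
    r"(?:开度(?P<value>[0-9]{1,3}))?"
)


def constrain(hypotheses):
    """Drop hypotheses that violate the grammar; return the best survivor or None.

    hypotheses is a list of (text, score) pairs, higher score meaning more likely.
    """
    valid = [(text, score) for text, score in hypotheses if COMMAND.fullmatch(text)]
    if not valid:
        return None
    return max(valid, key=lambda pair: pair[1])[0]
```

For instance, if the acoustic scores slightly favour the digit 1 over the letter E in the seventh position of a point number, `constrain([("打开RCV0821P", 0.55), ("打开RCV082EP", 0.45)])` still returns `"打开RCV082EP"`, because a digit is not allowed after the three-digit group.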
The embodiment designs a voice training scheme based on Mel-scale Frequency Cepstral Coefficients (MFCC) and a Bidirectional Long-Short Term Memory network (Bi-LSTM), and a complete nuclear power plant voice recognition training system can be established based on the scheme.
In this embodiment, on the basis of a traditional MFCC speech training system, speech signals are processed by modules for pre-emphasis, framing, windowing, and fast Fourier transform (FFT) processing. Pre-emphasis uses a high-pass filter. Framing uses overlapping segmentation so that adjacent frames transition smoothly and continuity is preserved; if an unstable spectrum occurs, zero padding is applied. Windowing operates on each short segment, replacing each element of a frame with the product of that element and the corresponding element of the window sequence. The fast Fourier transform decomposes the signal into two sub-signals, an odd signal and an even signal, and sums the two sub-signals. A bank of 23 Mel-scale triangular filters is then used to filter the processed speech signal. Triangular filtering simulates the high resolution of the human ear at low frequencies, and integrating within each triangle removes fine structure and retains only tone information, which also reduces the amount of data; testing showed that 23 filters suit this system best. Finally, the speech features of the audio file are obtained through a logarithm operation, which consists of taking the absolute value and then the log. Taking the absolute value uses only the amplitude and ignores the phase, because phase information contributes little to speech recognition. The log is used because human perception is proportional to the logarithm of the frequency; in addition, convolution becomes multiplication after the FFT, and multiplication becomes addition after taking the logarithm, so convolved signals are converted into additive signals.
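The processing chain described above can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation: the sample rate, frame size, frame step, pre-emphasis coefficient, Hamming window, and mel-scale formula are assumptions chosen for the example, and only the count of 23 filters comes from the description. The sketch takes the absolute value before the filter bank and stops at the log step, as the description does; a conventional MFCC front end would usually add a discrete cosine transform afterwards.

```python
import cmath
import math


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def frames(signal, size, step):
    """Overlapping frames; a short final frame is zero-padded."""
    out = []
    for start in range(0, max(len(signal) - size, 0) + step, step):
        chunk = signal[start:start + size]
        out.append(chunk + [0.0] * (size - len(chunk)))
    return out


def fft(x):
    """Radix-2 FFT: split into even- and odd-indexed halves, then combine.

    The input length must be a power of two.
    """
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    points = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * p / sample_rate) for p in points]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):      # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):      # falling edge of the triangle
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def log_mel_features(signal, sample_rate=8000, size=256, step=128, n_filters=23):
    """Return one list of n_filters log filter-bank energies per frame."""
    bank = mel_filterbank(n_filters, size, sample_rate)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (size - 1))
              for n in range(size)]   # Hamming window (assumed choice)
    feats = []
    for frame in frames(pre_emphasis(signal), size, step):
        spectrum = fft([s * w for s, w in zip(frame, window)])
        mag = [abs(c) for c in spectrum[:size // 2 + 1]]  # amplitude only
        feats.append([math.log(sum(f * a for f, a in zip(row, mag)) + 1e-10)
                      for row in bank])
    return feats


# Usage: 0.1 s of a 440 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
features = log_mel_features(tone)
print(len(features), len(features[0]))
```

The small constant added before the log only guards against taking the logarithm of zero for an empty filter output; it is a numerical convenience and not part of the described method.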
The Bi-LSTM, that is, a pair of LSTMs running in the forward and reverse directions, captures the contextual information within a sentence better. This is very important for instructions used in the main control room, which have a fixed sentence format but very little logical connection in their content. The Bi-LSTM scheme is used to perform speech recognition training on the extracted speech features. Compared with traditional artificial intelligence learning, MFCC and dictionary modules are added to the training process: speech recognition training is performed after the MFCC parameters are extracted, and the special dictionary configured with regular expressions is introduced as an aid for each training result. As a result, the speech recognition training has better sparsity and efficiency, redundancy is effectively reduced, and both training efficiency and the final result are greatly improved.
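To make the "forward and reverse" idea concrete, the following is a minimal forward-pass sketch of a bidirectional LSTM in plain Python. It assumes the standard LSTM gate equations, random untrained weights, and small hypothetical dimensions; it shows only how the two directions are combined per time step and does not reproduce the training procedure or network size of the embodiment.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_cell(n_in, n_hid, rng):
    """Random weights: 4 stacked gates, each row covers [input, hidden, bias]."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hid + 1)]
            for _ in range(4 * n_hid)]


def lstm_pass(cell, seq, n_hid):
    """Run one LSTM direction over seq; return the hidden state at each step."""
    h, c, outputs = [0.0] * n_hid, [0.0] * n_hid, []
    for x in seq:
        v = list(x) + h + [1.0]
        z = [sum(w * a for w, a in zip(row, v)) for row in cell]
        i = [sigmoid(a) for a in z[0:n_hid]]               # input gate
        f = [sigmoid(a) for a in z[n_hid:2 * n_hid]]       # forget gate
        o = [sigmoid(a) for a in z[2 * n_hid:3 * n_hid]]   # output gate
        g = [math.tanh(a) for a in z[3 * n_hid:]]          # candidate state
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        outputs.append(h)
    return outputs


def bilstm(fwd, bwd, seq, n_hid):
    """Concatenate forward and backward hidden states for every time step."""
    forward = lstm_pass(fwd, seq, n_hid)
    backward = lstm_pass(bwd, seq[::-1], n_hid)[::-1]
    return [a + b for a, b in zip(forward, backward)]


# Usage with hypothetical sizes: 3 frames of 3 features, 4 hidden units.
rng = random.Random(0)
fwd_cell, bwd_cell = make_cell(3, 4, rng), make_cell(3, 4, rng)
frames_in = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.5], [0.4, 0.4, 0.4]]
states = bilstm(fwd_cell, bwd_cell, frames_in, 4)
print(len(states), len(states[0]))
```

Because the backward pass reads the sequence from the end, the output at the first frame already depends on the last frame, which is the context-capturing property the description relies on.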
Over the whole training process of this embodiment, compared with conventional speech recognition training methods, the recognition accuracy is improved by 25%, the time consumed by training is reduced by 15%, and the number of recorders and the size of the corpus required are reduced by about 30%. For example, fig. 2 compares the time required for recognition in this embodiment with that of the prior art. In fig. 2, "no boundary set" means that no special dictionary is used, "normal setting" means that a special dictionary is used, and "targeted setting" means that a special dictionary configured with regular expressions is used; "no features", "a small number of features", and the larger feature counts mean that different amounts of vocabulary are recorded in the special dictionary through the text file.
When the control instruction voice recognition model for the nuclear power plant established in this embodiment was verified, an operator consulted the system operating procedure and issued 20 consecutive instructions from the procedure as voice control instructions. After each instruction was issued, the machine completed its analysis in less than 3 s, and all 20 instructions were run to completion. The recognition speed is 20% higher than that of traditional G2P speech recognition, and the recognition success rate during the simulation was 100%. The recognition accuracy of this embodiment and of the prior art is shown in fig. 3.
Example 2:
As shown in fig. 4, embodiment 2 of the present invention provides an intelligent voice control method for a nuclear power plant, including:
S1, receiving a control instruction voice issued by an operator;
S2, recognizing the control instruction voice by using the control instruction voice recognition model for the nuclear power plant established by the nuclear power plant voice recognition training method of embodiment 1, so as to obtain a recognition result of the control instruction voice;
S3, outputting the recognition result to the operator, and receiving a secondary instruction formed by the operator confirming or modifying the recognition result;
and S4, controlling the nuclear power plant according to the secondary instruction.
Optionally, outputting the recognition result to an operator, specifically including:
displaying the recognition result in characters on an interactive interface operated by an operator;
and broadcasting the recognition result to an operator through machine voice.
Specifically, in this embodiment, a nuclear power plant voice recognition training system is first built by the method of embodiment 1, and a control instruction voice recognition model for the nuclear power plant is trained in that system. An intelligent voice control system for the nuclear power plant is then built, and the method of this embodiment is applied in it. Illustratively, an operator operates nuclear power plant equipment according to the procedure by voice instruction (the control instruction voice). After receiving the voice instruction, the system analyzes and processes it with the control instruction voice recognition model trained in embodiment 1 to obtain a recognition result. This specifically includes: extracting speech features from the received voice instruction based on MFCC, recognizing the extracted speech features using the Bi-LSTM model whose weight configuration satisfies the preset convergence condition, and restricting the recognition result to words in the special dictionary with the regular expression. Once the recognition result is obtained, the recognized text is shown on the interactive screen operated by the operator, and the system asks by voice whether the instruction should be executed (that is, the machine reads out the recognized text using Text to Speech technology).
If the operator confirms, the system executes the instruction, which ensures that the correct instruction is executed. If the operator modifies the instruction, the system executes the modified instruction, records the modification, and feeds it into the nuclear power plant voice recognition training system. Specifically, the text of the secondary instruction obtained after the operator's modification is added to the text file of the training system, and the operator's corresponding voice instruction is added as one of its audio files. If the operator modifies recognition results many times, this indicates that the recognition accuracy of the previously trained model does not meet the requirement. In that case, after review and reconfiguration by the developers, a new control instruction voice recognition model is obtained by retraining according to the method of embodiment 1, based on the text file and audio files expanded during actual operation.
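The confirm-or-modify flow of steps S3 and S4, together with the feedback into the training data, can be sketched as below. The class names, the in-memory lists standing in for the text file and audio files, and the retraining threshold of three corrections are all hypothetical; the description only says that retraining follows when the number of modifications is large and after developer review.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingStore:
    """Stand-in for the training system's text file and audio file set."""
    texts: list = field(default_factory=list)
    audio: list = field(default_factory=list)


@dataclass
class VoiceControlSession:
    store: TrainingStore
    retrain_threshold: int = 3   # assumed value; the source only says "large"
    corrections: int = 0
    executed: list = field(default_factory=list)

    def handle(self, audio, recognized, operator_reply):
        """Form and execute the secondary instruction.

        operator_reply is None when the operator confirms the recognition
        result, or the corrected instruction text when the operator modifies it.
        """
        if operator_reply is None:
            secondary = recognized
        else:
            secondary = operator_reply
            # Feed the correction back as new training material.
            self.store.texts.append(secondary)
            self.store.audio.append(audio)
            self.corrections += 1
        self.executed.append(secondary)   # S4: act on the secondary instruction
        return secondary

    def needs_retraining(self):
        """Flag the model for developer review once corrections accumulate."""
        return self.corrections >= self.retrain_threshold


# Usage: one confirmed instruction, then one corrected instruction.
session = VoiceControlSession(TrainingStore())
first = session.handle(b"audio-1", "open RCV082VP", None)
second = session.handle(b"audio-2", "open RCV0821P", "open RCV082VP")
print(first, second, session.needs_retraining())
```

Keeping the corrected text and the original audio together is what lets the expanded files be reused directly as a labelled pair in a later training run.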
Example 3:
As shown in fig. 5, embodiment 3 of the present invention provides a nuclear power plant speech recognition training system, including:
the text module 01 is used for carrying out feature classification extraction on pre-established text files comprising various control instructions of the nuclear power plant so as to establish a special dictionary for the nuclear power plant and form a regular expression for carrying out feature limitation on words in the special dictionary;
and the audio module 02 is connected with the text module 01 and is used for performing voice recognition training on various pre-collected audio files comprising each control instruction, and limiting a voice recognition result to be composed of words in the special dictionary with the regular expression in the training process so as to establish a control instruction voice recognition model for the nuclear power plant.
Optionally, the text file includes:
a plurality of control instruction actions expressed in Chinese, and point numbers, expressed in English letters and digits, of the controlled equipment of a plurality of nuclear power plant reactor types.
Optionally, the text module 01 specifically includes:
the dictionary unit is used for extracting Chinese, english and numbers which repeatedly appear for many times in the text file to serve as Chinese keywords, english keywords and number keywords so as to establish a special dictionary for the nuclear power plant;
and the regular expression unit is connected with the dictionary unit and is used for forming a regular expression that constrains the context of the Chinese keywords and a regular expression that constrains the number of positions, and the attribute of each position, of the English keywords and number keywords.
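The two units above can be pictured with a short Python sketch: one function collects tokens that recur in the instruction text as Chinese, English, and number keywords, and another derives a pattern that fixes how many positions a point number has and whether each is a letter or a digit. The sample instruction lines, the repetition threshold of two, and the helper names are assumptions for illustration only.

```python
import re
from collections import Counter

# Runs of Chinese characters, ASCII letters, or ASCII digits.
TOKEN = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z]+|[0-9]+")


def build_dictionary(lines, min_count=2):
    """Keep tokens that appear at least min_count times, grouped by type."""
    counts = Counter(tok for line in lines for tok in TOKEN.findall(line))
    dictionary = {"chinese": set(), "english": set(), "number": set()}
    for tok, n in counts.items():
        if n < min_count:
            continue
        if tok.isdigit():
            dictionary["number"].add(tok)
        elif tok.isascii():
            dictionary["english"].add(tok)
        else:
            dictionary["chinese"].add(tok)
    return dictionary


def shape_regex(point):
    """Derive a pattern fixing the length and type of each run in a point number."""
    parts = []
    for run in re.findall(r"[A-Za-z]+|[0-9]+", point):
        cls = "[A-Z]" if run.isalpha() else "[0-9]"
        parts.append(f"{cls}{{{len(run)}}}")
    return "".join(parts)


# Hypothetical instruction text file; 打开 means "open" and 关闭 means "close".
instructions = ["打开 RCV082VP", "关闭 RCV082VP", "打开 RCV083VP"]
special = build_dictionary(instructions)
pattern = shape_regex("RCV082VP")
print(special, pattern)
```

A pattern derived this way accepts RCV083VP as well as RCV082VP, so it constrains the shape of a point number while the dictionary itself constrains which words may appear.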
Optionally, the pre-collected multiple audio files including each control instruction specifically include:
and aiming at each control instruction corresponding to the text file, acquiring multiple audio files with different timbres in advance.
Optionally, the audio module 02 specifically includes:
an extraction unit, configured to extract a voice feature from the audio file based on a mel-frequency cepstrum coefficient MFCC algorithm;
the training unit is connected with the extraction unit and is used for carrying out voice recognition training on the extracted voice features based on a bidirectional long-short term memory network Bi-LSTM;
the limiting unit is connected with the training unit and used for limiting the voice recognition result to be composed of words in the special dictionary with the regular expression in the training process, and verifying the voice recognition result of each audio file by using the text file content corresponding to the control instruction corresponding to each audio file so as to adjust the weight configuration in the Bi-LSTM;
and the combination unit is connected with the limiting unit and is used for obtaining the Bi-LSTM model with the weight configuration meeting the preset convergence condition after the training is finished, and combining the MFCC algorithm, the Bi-LSTM model with the weight configuration meeting the preset convergence condition, the special dictionary and the regular expression to serve as a control instruction voice recognition model for the nuclear power plant.
Optionally, the extraction unit specifically includes:
the processing subunit is used for carrying out pre-emphasis, framing, windowing and fast Fourier transform processing on the audio file;
the filtering subunit is connected with the processing subunit and is used for filtering the processed audio file by adopting a group of Mel-scale triangular filter groups consisting of 23 filters;
and the operation subunit is connected with the filtering subunit and is used for carrying out logarithm operation on the filtered audio file so as to complete the extraction of the voice characteristics of the audio file.
Example 4:
As shown in fig. 6, embodiment 4 of the present invention provides an intelligent voice control system for a nuclear power plant, including:
the receiving module 1 is used for receiving control instruction voice sent by an operator;
the recognition module 2 is connected to the receiving module 1, and is configured to recognize the control instruction voice by using the control instruction voice recognition model for the nuclear power plant, which is established by using the nuclear power plant voice recognition training method according to embodiment 1, so as to obtain a recognition result of the control instruction voice;
the output module 3 is connected with the identification module 2 and used for outputting the identification result to an operator and receiving a secondary instruction formed by confirming or modifying the identification result by the operator;
and the control module 4 is connected with the output module 3 and is used for executing the control of the nuclear power plant according to the secondary instruction.
Optionally, the output module 3 specifically includes:
the display unit is used for displaying the identification result in characters on an interactive interface operated by an operator;
and the broadcasting unit is used for broadcasting the recognition result to an operator through machine voice.
According to the nuclear power plant voice recognition training method, the intelligent voice control method, and the systems provided by embodiments 1-4 of the invention, a special dictionary for the nuclear power plant and a regular expression for the words in the special dictionary are established, and the regular expression and special dictionary are used to constrain the recognition result when speech recognition training is performed on the audio files of nuclear power plant control instructions. This improves the efficiency of speech recognition training for the nuclear power plant and increases the accuracy with which nuclear power plant control instructions are recognized, so that the resulting speech recognition model can meet the usage requirements of a nuclear power plant. When applied to intelligent control of a nuclear power plant, the method raises the level of digital operation, reduces the workload of the operators, and improves the operational safety of the plant.
It will be understood that the above embodiments are merely exemplary embodiments adopted to illustrate the principles of the present invention, and the present invention is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims (10)

1. A nuclear power plant voice recognition training method is characterized by comprising the following steps:
performing characteristic classification extraction on a pre-established text file comprising a plurality of control instructions of a nuclear power plant to establish a special dictionary for the nuclear power plant and form a regular expression for performing characteristic limitation on words in the special dictionary;
and performing voice recognition training on a plurality of pre-collected audio files comprising each control instruction, and limiting a voice recognition result to be composed of words in the special dictionary with the regular expression in the training process so as to establish a control instruction voice recognition model for the nuclear power plant.
2. The method of claim 1, wherein the text file comprises:
a plurality of control instruction actions expressed in Chinese, and point numbers, expressed in English letters and digits, of the controlled equipment of a plurality of nuclear power plant reactor types.
3. The method according to claim 2, wherein the step of performing feature classification extraction on a pre-established text file including a plurality of control instructions of the nuclear power plant to establish a special dictionary for the nuclear power plant, and the step of forming a regular expression for performing feature limitation on words in the special dictionary specifically comprises the steps of:
extracting Chinese, english and numbers which repeatedly appear for many times in the text file as Chinese keywords, english keywords and number keywords so as to establish a special dictionary for the nuclear power plant;
and forming a regular expression that constrains the context of the Chinese keywords and constrains the number of positions, and the attribute of each position, of the English keywords and number keywords.
4. A method according to any one of claims 1 to 3, wherein the pre-captured plurality of audio files comprising each control instruction are in particular:
and aiming at each control instruction corresponding to the text file, acquiring multiple audio files with different timbres in advance.
5. The method according to claim 4, wherein a plurality of pre-collected audio files including each control instruction are subjected to speech recognition training, and a speech recognition result is limited to be composed of words in the special dictionary with the regular expression in the training process so as to establish a control instruction speech recognition model for the nuclear power plant, and the method specifically comprises the following steps:
extracting voice features from the audio file based on a Mel Frequency Cepstrum Coefficient (MFCC) algorithm;
performing voice recognition training on the extracted voice features based on a bidirectional long-short term memory network Bi-LSTM;
limiting a voice recognition result to be composed of words in the special dictionary with the regular expression in the training process, and verifying the voice recognition result of each audio file by utilizing the text file content corresponding to the control instruction corresponding to each audio file so as to adjust the weight configuration in the Bi-LSTM;
and after training is finished, obtaining a Bi-LSTM model with weight configuration meeting a preset convergence condition, and combining the MFCC algorithm, the Bi-LSTM model with weight configuration meeting the preset convergence condition, the special dictionary and the regular expression to serve as a control instruction voice recognition model for the nuclear power plant.
6. The method as claimed in claim 5, wherein extracting speech features from the audio file based on Mel Frequency Cepstral Coefficients (MFCC) algorithm comprises:
carrying out pre-emphasis, framing, windowing and fast Fourier transform processing on the audio file;
filtering the processed audio file by adopting a group of Mel-scale triangular filter groups consisting of 23 filters;
and carrying out logarithm operation on the filtered audio file to finish the voice feature extraction of the audio file.
7. An intelligent voice control method for a nuclear power plant is characterized by comprising the following steps:
receiving control instruction voice sent by an operator;
recognizing the control instruction voice by using a control instruction voice recognition model for the nuclear power plant, which is established by using the nuclear power plant voice recognition training method according to any one of claims 1 to 6, so as to obtain a recognition result of the control instruction voice;
outputting the recognition result to an operator, and receiving a secondary instruction formed by confirming or modifying the recognition result by the operator;
and executing the control of the nuclear power plant according to the secondary instruction.
8. The method according to claim 7, wherein outputting the recognition result to an operator specifically comprises:
displaying the recognition result in characters on an interactive interface operated by an operator;
and broadcasting the recognition result to an operator through machine voice.
9. A nuclear power plant speech recognition training system, comprising:
a text module, configured to perform feature classification extraction on a pre-established text file comprising a plurality of control instructions of a nuclear power plant, so as to establish a special dictionary for the nuclear power plant and form a regular expression for performing feature limitation on words in the special dictionary;
and the audio module is connected with the text module and used for carrying out voice recognition training on various pre-collected audio files comprising each control instruction, and limiting a voice recognition result to be composed of words in the special dictionary with the regular expression in the training process so as to establish a control instruction voice recognition model aiming at the nuclear power plant.
10. An intelligent voice control system of a nuclear power plant, comprising:
the receiving module is used for receiving control instruction voice sent by an operator;
a recognition module, connected to the receiving module, for recognizing the control instruction voice by using a control instruction voice recognition model for a nuclear power plant, which is established by the nuclear power plant voice recognition training method according to any one of claims 1 to 6, so as to obtain a recognition result of the control instruction voice;
the output module is connected with the recognition module and used for outputting the recognition result of the control instruction voice to an operator and receiving a secondary instruction formed by confirming or modifying the recognition result by the operator;
and the control module is connected with the output module and used for executing the control of the nuclear power plant according to the secondary instruction.
CN202210936148.0A 2022-08-05 2022-08-05 Nuclear power plant voice recognition training method, intelligent voice control method and system Pending CN115294966A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210936148.0A CN115294966A (en) 2022-08-05 2022-08-05 Nuclear power plant voice recognition training method, intelligent voice control method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210936148.0A CN115294966A (en) 2022-08-05 2022-08-05 Nuclear power plant voice recognition training method, intelligent voice control method and system

Publications (1)

Publication Number Publication Date
CN115294966A true CN115294966A (en) 2022-11-04

Family

ID=83828641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210936148.0A Pending CN115294966A (en) 2022-08-05 2022-08-05 Nuclear power plant voice recognition training method, intelligent voice control method and system

Country Status (1)

Country Link
CN (1) CN115294966A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116095377A (en) * 2022-12-30 2023-05-09 无锡威达智能电子股份有限公司 Remote controller control method and device based on voice recognition and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination