CN113808587B - Voice instruction autonomous recognition algorithm - Google Patents

Voice instruction autonomous recognition algorithm

Info

Publication number
CN113808587B
CN113808587B
Authority
CN
China
Prior art keywords
instruction
fluctuation
audio
voice
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111364061.2A
Other languages
Chinese (zh)
Other versions
CN113808587A (en)
Inventor
付俊生
陶阳
靳凯丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Lonrec Electric Technology Co ltd
Original Assignee
Nanjing Lonrec Electric Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Lonrec Electric Technology Co ltd filed Critical Nanjing Lonrec Electric Technology Co ltd
Priority to CN202111364061.2A
Publication of CN113808587A
Application granted
Publication of CN113808587B
Current legal status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00, specially adapted for particular use
    • G10L 25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00, specially adapted for particular use, for comparison or discrimination
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

The invention discloses a voice instruction autonomous recognition algorithm, belonging to the technical field of voice instruction recognition, which solves the problem that existing voice instructions require complex operation. The algorithm comprises the following steps: S1: set up a voice instruction library, in which the user adds instructions and their corresponding operations; S2: collect audio through a microphone and filter out noise; S3: perform voice recognition on the audio, match the recognized result against the voice instruction library, and form an instruction set; S4: after a complete instruction set has been collected, encode it according to each instruction; S5: after encoding is finished, execute the corresponding program through the encoding and display the corresponding response to the user. With a single instruction the user can complete, in one pass, all the user-defined operations under that instruction, which effectively simplifies operation and improves the convenience of the device.

Description

Voice instruction autonomous recognition algorithm
Technical Field
The invention relates to the technical field of voice instruction recognition, in particular to a voice instruction autonomous recognition algorithm.
Background
Getting a machine to understand spoken language has long been a research goal. Voice recognition technology enables a machine to convert voice signals into corresponding text or commands through a process of recognition and understanding, and it has by now been widely applied in many fields.
The patent with application number CN201410470891.7 discloses a speech recognition intelligent LED bulb, characterized in that: a microphone for receiving sound is integrated in the lamp and continuously receives external sound signals; an indicator lamp turns on when a sound signal is received and turns off when there is none. An intelligent voice recognition system integrated in the lamp processes the external sound signals received by the microphone in real time, and an internal recognition algorithm recognizes the lamp on/off control instructions spoken by the user: when the user speaks the "light on" voice command, the LED lamp panel lights up; when the user speaks the "light off" voice command, the LED lamp panel turns off. There is no need to press a switch by hand to start or stop the lamp, and no remote control equipment is required; the lamp is controlled simply by speaking a voice command. It also adds a voice-controlled brightness function that traditional lamps lack.
However, conventional speech recognition has the following problems: relatively simple instructions are realized through complex user operations, with one instruction corresponding to one conventional operation, so operation is cumbersome; the response speed for the high-frequency instructions a user commonly issues is low; and existing voice recognition devices stay in an always-on state, which increases risk in dangerous situations such as water leakage and fire.
Disclosure of Invention
The invention aims to provide a voice instruction autonomous recognition algorithm that is programmed through user-defined voice instructions: with a single instruction the user can complete, in one pass, all the user-defined operations under that instruction, which effectively simplifies operation, improves the convenience of the device, and solves the problems described in the background art.
To achieve this aim, the invention provides the following technical scheme: a voice instruction autonomous recognition algorithm, comprising the following steps:
s1: setting a voice instruction library, and adding instructions and corresponding operations thereof in the voice instruction library by a user;
s2: collecting audio through a microphone, converting the audio into an audio waveform diagram, correcting the fluctuation amplitude between adjacent fluctuation points, and converting the audio waveform diagram back into audio to realize noise filtering;
s3: performing voice recognition on the audio, matching the recognized result against the voice instruction library to form an instruction set; if an instruction does not exist, repeating the operation of S1 to add the instruction and its corresponding operation to the instruction library;
s4: after a complete instruction set is collected, coding is carried out according to each instruction;
s5: after the coding is finished, executing a corresponding program through the coding, and further displaying a corresponding response to the user;
in S2, let I(t) denote the audio waveform diagram received at time t, and let I(x_t, y_t) denote the fluctuation point at coordinates (x_t, y_t) in the audio waveform diagram. Let L(x_t, y_t) denote the local neighborhood of the fluctuation point I(x_t, y_t), where L(x_t, y_t) is a local region of size 2n ± 1 centered on the fluctuation point I(x_t, y_t) and n is a given positive integer. Let S(x_t, y_t) denote the set of fluctuation points in the local neighborhood L(x_t, y_t) that are similar to the fluctuation point I(x_t, y_t), set a smoothness threshold H(t), and let I(a_t, b_t) denote the fluctuation point at coordinates (a_t, b_t) in the local neighborhood L(x_t, y_t);
when the fluctuation point I(a_t, b_t) satisfies [h(a_t, b_t) − h(x_t, y_t)] ≤ H(t), the fluctuation point I(a_t, b_t) is added to the set S(x_t, y_t);
when the fluctuation point I(a_t, b_t) satisfies [h(a_t, b_t) − h(x_t, y_t)] > H(t), the fluctuation point I(a_t, b_t) is not added to the set S(x_t, y_t);
where h(a_t, b_t) denotes the smoothed value of the fluctuation point I(a_t, b_t), and h(x_t, y_t) denotes the smoothed value of the fluctuation point I(x_t, y_t);
let s(x_t, y_t) denote the similarity detection coefficient of the fluctuation point I(x_t, y_t) in the local neighborhood L(x_t, y_t), where s(x_t, y_t) equals N_S(x_t, y_t) divided by N_L(x_t, y_t); here N_S(x_t, y_t) denotes the number of fluctuation points in the set S(x_t, y_t), and N_L(x_t, y_t) denotes the number of fluctuation points in the local neighborhood L(x_t, y_t);
let s1(x_t, y_t) denote the median of the similarity detection coefficients of the points in the local neighborhood L(x_t, y_t). When the fluctuation point I(x_t, y_t) satisfies s(x_t, y_t) ≥ s1(x_t, y_t), the fluctuation point I(x_t, y_t) is judged to have normal fluctuation amplitude and its amplitude value h(x_t, y_t) is left unchanged; when the fluctuation point I(x_t, y_t) satisfies s(x_t, y_t) < s1(x_t, y_t), it is judged to be a noise fluctuation point, and its amplitude value h(x_t, y_t) is corrected as follows:
s(x_t, y_t) = I(x_t, y_t) / I(a_t, b_t)    (1)
wherein
I(x_t, y_t) ∈ S(x_t, y_t)    (2)
I(a_t, b_t) ∈ S(x_t, y_t)    (3)
In formula (1), s(x_t, y_t) denotes the similarity detection coefficient of the fluctuation point I(x_t, y_t) in the local neighborhood L(x_t, y_t), I(x_t, y_t) denotes the fluctuation point at coordinates (x_t, y_t) in the local neighborhood L(x_t, y_t), and I(a_t, b_t) denotes the fluctuation point at coordinates (a_t, b_t) in the local neighborhood L(x_t, y_t); in formulas (2) and (3), S(x_t, y_t) denotes the set of fluctuation points in the local neighborhood L(x_t, y_t);
I(x_t, y_t) / I(a_t, b_t) = (x_t − a_t)(y_t − b_t)    (4)
then
s(x_t, y_t) = (x_t − a_t)(y_t − b_t)    (5)
Let the fluctuation point before correction be I(x_t, y_t), the corrected fluctuation point be I(x̄_t, ȳ_t), and let I(a_t, b_t) denote a fluctuation point in the neighborhood of I(x_t, y_t) that does not need correction;
I(x̄_t, ȳ_t) = I(a_t, b_t) / s(x_t, y_t)    (6)
x̄_t = a_t / s(x_t, y_t)    (7)
ȳ_t = b_t / s(x_t, y_t)    (8)
I(a_t, b_t) ∈ S(x_t, y_t)    (9)
where, in formulas (7) and (8), s(x_t, y_t) denotes the similarity detection coefficient of the fluctuation point I(x_t, y_t) in the local neighborhood L(x_t, y_t), x̄_t is the X-axis coordinate value of the corrected fluctuation point, and ȳ_t is the Y-axis coordinate value of the corrected fluctuation point.
Further, the voice instruction library in S1 contains multiple groups of instruction data packets. Each group of instruction data packets holds either a text field, an instruction and an operation, or an audio segment, an instruction and an operation, wherein the text fields are words and phrases the user commonly uses, obtained through big-data collection, the operations are operations the user commonly performs, and the audio segments are water-leakage sounds and fire alarm sounds obtained through big-data collection.
Furthermore, the text field, instruction and operation in each group of instruction data packets correspond one to one, as do the audio segment, instruction and operation, and one audio clip can trigger multiple groups of instruction data packets simultaneously.
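By way of example, the instruction data packets described above could be held in a structure like the following; the field names and the `match` helper are hypothetical, chosen only to show the one-to-one packet layout and the one-audio-to-many-packets triggering.

```python
from dataclasses import dataclass

@dataclass
class InstructionPacket:
    """One group in the voice instruction library; field names are illustrative."""
    instruction: str
    operation: str                     # one instruction <-> one operation per packet
    trigger_text: str | None = None    # common word/phrase from big data, or
    trigger_audio: str | None = None   # id of an audio segment (e.g. an alarm sound)

library = [
    InstructionPacket("LIGHT_ON", "light.on", trigger_text="turn on the light"),
    InstructionPacket("FIRE_WARN", "warn.user", trigger_audio="fire_alarm"),
    InstructionPacket("POWER_OFF", "power.off", trigger_audio="fire_alarm"),
]

def match(text=None, audio=None):
    """Return every packet the input triggers; one audio clip may hit several."""
    return [p for p in library
            if (text is not None and p.trigger_text == text)
            or (audio is not None and p.trigger_audio == audio)]

print([p.instruction for p in match(audio="fire_alarm")])  # two packets fire at once
```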
Furthermore, the microphone in S2 collects audio in two modes: the first sets the microphone to record audio at a fixed frequency; the second sets a wake-up instruction, after which the microphone records voice once woken.
Further, the step S3 includes the following steps:
s301: converting the denoised audio data into an audio signal;
s302: performing voice feature recognition on the audio signal; if no voice feature exists, comparing the audio signal with the audio segments in the voice instruction library to obtain the matched instruction and operation; if a voice feature exists, performing S303;
s303: extracting voice features and performing voice recognition on them according to a voice recognition algorithm to obtain text data;
s304: screening and analyzing the text data, segmenting it into words, and matching it against the voice instruction library to obtain the matched instruction and operation.
Further, the voice recognition algorithm is a DTW (dynamic time warping) algorithm.
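The description names DTW without giving an implementation, so the following is a generic dynamic time warping distance plus a nearest-template lookup, one plausible way to compare an utterance's features against library templates; `recognize` and the template dictionary are illustrative assumptions.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping between two feature sequences."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]                  # treat a scalar sequence as (n, 1)
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])   # local frame distance
            D[i, j] = cost + min(D[i - 1, j],            # insertion
                                 D[i, j - 1],            # deletion
                                 D[i - 1, j - 1])        # match
    return D[n, m]

def recognize(features, templates):
    """Return the library template name with the smallest DTW distance."""
    return min(templates, key=lambda name: dtw_distance(features, templates[name]))
```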
Further, the instruction encoding method in S4 is as follows (a code sketch of this flow is given after the steps):
s401: acquiring audio and entering a programming mode;
s402: starting to collect the voice commands issued by the user, and proceeding to the next step after the user has stopped speaking for a specified pause time;
s403: prompting the user whether there are more instructions; if so, repeating S402, otherwise executing S404;
s404: converting the collected voice into text, displaying the text on a screen, and giving a prompt; if the text is correct, proceeding to the next step;
s405: analyzing the grammar structure and language entities of the segmented text instruction with the trained grammar-structure and language-entity deep neural network models respectively, then searching the analyzed text instruction for specific instruction elements according to the instruction library, extracting them, and collecting them into an instruction set;
s406: playing a prompt asking for the shortcut voice that will trigger the instruction set;
s407: collecting the user's voice and playing a confirmation prompt; if confirmed, storing it in the user-defined instruction set library, otherwise repeating S405;
s408: prompting the user that the programming mode is finished.
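As a sketch of the S401 to S408 dialogue flow under stated assumptions: the four callbacks below stand in for the recording, confirmation, NLP-parsing and storage machinery the patent leaves unspecified, and all names are invented for illustration.

```python
def programming_mode(listen, confirm, parse, store):
    """Sketch of the S401-S408 programming-mode dialogue.

    Hypothetical callbacks:
      listen()   -> transcribed text of one spoken command, or None on silence
      confirm(q) -> bool answer to a yes/no prompt
      parse(ts)  -> instruction set extracted from the texts (the S405 models)
      store(k,v) -> save shortcut -> instruction set in the custom library
    """
    texts = []
    while True:
        text = listen()                        # S402: one command, ended by a pause
        if text is None:
            break
        print(text)                            # S404: show the transcription
        while not confirm("Is this instruction correct?"):
            text = input("Enter the corrected field: ") or text   # S404 fix-up
        texts.append(text)
        if not confirm("Are there more instructions?"):           # S403
            break
    instruction_set = parse(texts)             # S405: build the instruction set
    shortcut = listen()                        # S406/S407: shortcut trigger phrase
    if shortcut and confirm("Confirm this shortcut?"):
        store(shortcut, instruction_set)       # save to the user-defined library
    print("Programming mode finished")         # S408
```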
Further, if the instruction is wrong in S404, the user is prompted as to which character should be replaced; the user manually inputs the changed field or re-records the audio until the instruction is correct, and after the user confirms it, the next step proceeds.
Further, the instruction set and its corresponding audio are stored after the instruction set has been executed in S405, and when the microphone later collects the same audio it directly triggers the instruction set to run again.
Compared with the prior art, the invention has the beneficial effects that:
1. the voice instruction autonomous recognition algorithm provided by the invention is programmed through user-defined voice instructions: with a single instruction the user can complete, in one pass, all the user-defined operations under that instruction, which effectively simplifies operation and improves the convenience of the device;
2. the voice instruction autonomous recognition algorithm collects, through big data, the high-frequency words and phrases a user employs, together with water-leakage sounds, fire alarm sounds and the like, uses them as comparison data, and arranges them into an instruction library, which improves the efficiency of manual entry;
3. the voice instruction autonomous recognition algorithm provided by the invention collects indoor ambient sound at a fixed frequency, analyzes and processes it to judge whether the audio contains water-leakage sounds, fire alarm sounds or other alarm sounds, sends a warning to the user upon recognition, and then reduces the harm electrical equipment could cause in a dangerous situation by operations such as turning off the electrical equipment, switching off the power supply and shutting down automatically.
Drawings
FIG. 1 is an overall flow chart of the voice command autonomous recognition algorithm of the present invention;
FIG. 2 is a schematic diagram of the operation of the voice command autonomous recognition algorithm of the present invention;
FIG. 3 is a diagram of a voice command library structure of the voice command autonomous recognition algorithm of the present invention;
FIG. 4 is a flow chart of a speech recognition method of the voice command autonomous recognition algorithm of the present invention;
FIG. 5 is a flowchart of an instruction encoding method of the speech instruction autonomous recognition algorithm of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1 to 3, an autonomous voice command recognition algorithm includes the following steps:
s1: setting a voice instruction library, and adding instructions and corresponding operations thereof in the voice instruction library by a user;
the voice instruction library contains multiple groups of instruction data packets, each holding either a text field, an instruction and an operation, or an audio segment, an instruction and an operation, wherein the text fields are words and phrases the user commonly uses, obtained through big-data collection, the operations are the user's common operations, and the audio segments are water-leakage sounds and fire alarm sounds obtained through big-data collection, which improves the efficiency of manual entry; the text field, instruction and operation in each group of instruction data packets correspond one to one, as do the audio segment, instruction and operation, and one audio clip can trigger multiple groups of instruction data packets simultaneously;
s2: collecting audio through a microphone and converting the audio into an audio waveform diagram. Let I(t) denote the audio waveform diagram received at time t and let I(x_t, y_t) denote the fluctuation point at coordinates (x_t, y_t) in the audio waveform diagram; the fluctuation amplitude between adjacent fluctuation points is corrected and the audio waveform diagram is converted back into audio to realize noise filtering. The microphone collects audio in two modes. The first sets the microphone to record audio at a fixed frequency: indoor ambient sound is collected at a fixed frequency, analyzed and processed to judge whether the audio contains water-leakage sounds, fire alarm sounds or other alarm sounds; upon recognition a warning is sent to the user, and the harm electrical equipment could cause in a dangerous situation is then reduced by operations such as turning off the electrical equipment, switching off the power supply and shutting down automatically. The second sets a wake-up program instruction, after which the microphone records voice once woken;
let L(x_t, y_t) denote the local neighborhood of the fluctuation point I(x_t, y_t), where L(x_t, y_t) is a local region of size 2n ± 1 centered on the fluctuation point I(x_t, y_t) and n is a given positive integer. Let S(x_t, y_t) denote the set of fluctuation points in the local neighborhood L(x_t, y_t) that are similar to the fluctuation point I(x_t, y_t), set a smoothness threshold H(t), and let I(a_t, b_t) denote the fluctuation point at coordinates (a_t, b_t) in the local neighborhood L(x_t, y_t);
when the fluctuation point I(a_t, b_t) satisfies [h(a_t, b_t) − h(x_t, y_t)] ≤ H(t), the fluctuation point I(a_t, b_t) is added to the set S(x_t, y_t);
when the fluctuation point I(a_t, b_t) satisfies [h(a_t, b_t) − h(x_t, y_t)] > H(t), the fluctuation point I(a_t, b_t) is not added to the set S(x_t, y_t);
where h(a_t, b_t) denotes the smoothed value of the fluctuation point I(a_t, b_t), and h(x_t, y_t) denotes the smoothed value of the fluctuation point I(x_t, y_t);
let s(x_t, y_t) denote the similarity detection coefficient of the fluctuation point I(x_t, y_t) in the local neighborhood L(x_t, y_t), where s(x_t, y_t) equals N_S(x_t, y_t) divided by N_L(x_t, y_t); here N_S(x_t, y_t) denotes the number of fluctuation points in the set S(x_t, y_t), and N_L(x_t, y_t) denotes the number of fluctuation points in the local neighborhood L(x_t, y_t);
let s1(x_t, y_t) denote the median of the similarity detection coefficients of the points in the local neighborhood L(x_t, y_t). When the fluctuation point I(x_t, y_t) satisfies s(x_t, y_t) ≥ s1(x_t, y_t), the fluctuation point I(x_t, y_t) is judged to have normal fluctuation amplitude and its amplitude value h(x_t, y_t) is left unchanged; when the fluctuation point I(x_t, y_t) satisfies s(x_t, y_t) < s1(x_t, y_t), it is judged to be a noise fluctuation point, and its amplitude value h(x_t, y_t) is corrected as follows:
s(x_t, y_t) = I(x_t, y_t) / I(a_t, b_t)    (1)
wherein
I(x_t, y_t) ∈ S(x_t, y_t)    (2)
I(a_t, b_t) ∈ S(x_t, y_t)    (3)
In formula (1), s(x_t, y_t) denotes the similarity detection coefficient of the fluctuation point I(x_t, y_t) in the local neighborhood L(x_t, y_t), I(x_t, y_t) denotes the fluctuation point at coordinates (x_t, y_t) in the local neighborhood L(x_t, y_t), and I(a_t, b_t) denotes the fluctuation point at coordinates (a_t, b_t) in the local neighborhood L(x_t, y_t); in formulas (2) and (3), S(x_t, y_t) denotes the set of fluctuation points in the local neighborhood L(x_t, y_t);
I(x_t, y_t) / I(a_t, b_t) = (x_t − a_t)(y_t − b_t)    (4)
then
s(x_t, y_t) = (x_t − a_t)(y_t − b_t)    (5)
Let the fluctuation point before correction be I(x_t, y_t), the corrected fluctuation point be I(x̄_t, ȳ_t), and let I(a_t, b_t) denote a fluctuation point in the neighborhood of I(x_t, y_t) that does not need correction;
I(x̄_t, ȳ_t) = I(a_t, b_t) / s(x_t, y_t)    (6)
x̄_t = a_t / s(x_t, y_t)    (7)
ȳ_t = b_t / s(x_t, y_t)    (8)
I(a_t, b_t) ∈ S(x_t, y_t)    (9)
where, in formulas (7) and (8), s(x_t, y_t) denotes the similarity detection coefficient of the fluctuation point I(x_t, y_t) in the local neighborhood L(x_t, y_t), x̄_t is the X-axis coordinate value of the corrected fluctuation point, and ȳ_t is the Y-axis coordinate value of the corrected fluctuation point;
with noise thus eliminated, noise interference in the voice recognition process is small, and errors are reduced when recognizing water-leakage sounds and fire alarm sounds;
s3: performing voice recognition on the audio, matching the recognized result against the voice instruction library to form an instruction set; if an instruction does not exist, repeating the operation of S1 to add the instruction and its corresponding operation to the instruction library;
s4: after a complete instruction set is collected, coding is carried out according to each instruction;
s5: after encoding is completed, the corresponding program is executed through the encoding and the corresponding response is displayed to the user, making operation more concise. The arrangement of encoded programs mainly solves the problem of complex operation in intelligent devices, so that most users can use intelligent devices more conveniently in daily life.
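Tying the steps together, here is a minimal end-to-end sketch of S1 to S5; every callback name is a hypothetical stand-in for a component described above, not an interface the patent defines.

```python
def autonomous_recognition(record, denoise, recognize, library, execute):
    """End-to-end sketch of S1-S5; all five callbacks are hypothetical.

    record()     -> raw audio from the microphone                    (S2)
    denoise(a)   -> audio after the waveform correction shown above  (S2)
    recognize(a) -> transcribed text, e.g. by DTW template matching  (S3)
    library      -> dict mapping recognized text -> encoded program  (S1/S4)
    execute(p)   -> run the encoded program and show a response      (S5)
    """
    text = recognize(denoise(record()))
    program = library.get(text)
    if program is None:
        # instruction not in the library: fall back to S1 so the user
        # can add the instruction and its corresponding operation
        print(f"'{text}' is not in the instruction library yet")
    else:
        execute(program)
```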
Referring to fig. 4, the speech recognition in S3 includes the following steps:
s301: converting the denoised audio data into an audio signal;
s302: performing voice feature recognition on the audio signal; if no voice feature exists, comparing the audio signal with the audio segments in the voice instruction library to obtain the matched instruction and operation; if a voice feature exists, performing S303;
s303: extracting voice features and performing voice recognition on them according to a voice recognition algorithm to obtain text data, the voice recognition algorithm here being a DTW (dynamic time warping) algorithm;
s304: screening and analyzing the text data, segmenting it into words, and matching it against the voice instruction library to obtain the matched instruction and operation.
Referring to fig. 5, the instruction encoding method in S4 is as follows:
s401: acquiring audio and entering the programming mode. The acquired audio can be recorded actively, or recorded after the microphone is started by a wake-up program instruction. For example, applied to an intelligent floor-sweeping robot whose default name is "wisdom": when the sweeping robot receives the "wisdom" instruction, the operation corresponding to the instruction is a spoken reply (e.g. "I'm here"). One instruction can also perform multiple operations; for example, "I want to sleep" corresponds to first turning off the lamp, then running the air conditioner in sleep mode, drawing the curtains, and so on. A single instruction thus realizes multiple operations, reducing complex user operations, raising the happiness index and making life more efficient (a sketch of this one-instruction-to-many-operations mapping is given after these steps);
s402: starting to collect the voice commands issued by the user, and proceeding to the next step after the user has stopped speaking for a specified pause time;
s403: prompting the user whether there are more instructions; if so, repeating S402, otherwise executing S404;
s404: converting the collected voice into text, displaying the text on a screen, and giving a prompt; if the text is correct, proceeding to the next step; if the instruction is wrong, prompting the user as to which character should be replaced, where the user can manually input the changed field or re-record the audio until the instruction is correct, and proceeding after the user confirms;
s405: analyzing the grammar structure and language entities of the segmented text instruction with the trained grammar-structure and language-entity deep neural network models respectively, then searching the analyzed text instruction for specific instruction elements according to the instruction library, extracting them, and collecting them into an instruction set; the instruction set and its corresponding audio are stored after the instruction set has been executed, and when the microphone later collects the same audio it directly triggers the instruction set to run again;
s406: playing a prompt asking for the shortcut voice that will trigger the instruction set;
s407: collecting the user's voice and playing a confirmation prompt; if confirmed, storing it in the user-defined instruction set library, otherwise repeating S405;
s408: prompting the user that the programming mode is finished.
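To make the one-instruction-to-many-operations mapping concrete, here is a minimal sketch; the "I want to sleep" phrase and its three operations come from the example above, while the function names are invented for illustration.

```python
# Hypothetical device actions; a real system would call actual device APIs.
def lamp_off():        print("lamp off")
def ac_sleep_mode():   print("air conditioner -> sleep mode")
def close_curtains():  print("curtains closed")

# One user-defined instruction expands into an ordered list of operations.
custom_instructions = {
    "I want to sleep": [lamp_off, ac_sleep_mode, close_curtains],
}

def execute(phrase):
    for operation in custom_instructions.get(phrase, []):
        operation()                    # operations run in the stored order

execute("I want to sleep")             # one instruction, three operations
```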
In summary: the voice instruction autonomous recognition algorithm provided by the invention is programmed through user-defined voice instructions, and with a single instruction the user can complete, in one pass, all the user-defined operations under that instruction, which effectively simplifies operation and improves the convenience of the device; high-frequency words and phrases used by the user, together with water-leakage sounds, fire alarm sounds and the like, are collected through big data, used as comparison data and arranged into an instruction library, which improves the efficiency of manual entry; indoor ambient sound is collected at a fixed frequency, analyzed and processed to judge whether the audio contains water-leakage sounds, fire alarm sounds or other alarm sounds, a warning is sent to the user upon recognition, and the harm electrical equipment could cause in a dangerous situation is reduced by operations such as turning off the electrical equipment, switching off the power supply and shutting down automatically.
The above description is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any equivalent replacement or change that a person skilled in the art could readily conceive from the technical solutions and inventive concept of the present invention, within the technical scope disclosed herein, shall fall within the protection scope of the present invention.

Claims (9)

1. A voice instruction autonomous recognition algorithm, characterized by comprising the following steps:
s1: setting a voice instruction library, and adding instructions and corresponding operations thereof in the voice instruction library by a user;
s2: collecting audio through a microphone, converting the audio into an audio waveform diagram, correcting the fluctuation amplitude between adjacent fluctuation points, and converting the audio waveform diagram back into audio to realize noise filtering;
s3: performing voice recognition on the audio, matching the recognized result against the voice instruction library to form an instruction set; if an instruction does not exist, repeating the operation of S1 to add the instruction and its corresponding operation to the instruction library;
s4: after a complete instruction set is collected, coding is carried out according to each instruction;
s5: after the coding is finished, executing a corresponding program through the coding, and further displaying a corresponding response to the user;
in S2, let I(t) denote the audio waveform diagram received at time t, and let I(x_t, y_t) denote the fluctuation point at coordinates (x_t, y_t) in the audio waveform diagram. Let L(x_t, y_t) denote the local neighborhood of the fluctuation point I(x_t, y_t), where L(x_t, y_t) is a local region of size 2n ± 1 centered on the fluctuation point I(x_t, y_t) and n is a given positive integer. Let S(x_t, y_t) denote the set of fluctuation points in the local neighborhood L(x_t, y_t) that are similar to the fluctuation point I(x_t, y_t), set a smoothness threshold H(t), and let I(a_t, b_t) denote the fluctuation point at coordinates (a_t, b_t) in the local neighborhood L(x_t, y_t);
when the fluctuation point I(a_t, b_t) satisfies [h(a_t, b_t) − h(x_t, y_t)] ≤ H(t), the fluctuation point I(a_t, b_t) is added to the set S(x_t, y_t);
when the fluctuation point I(a_t, b_t) satisfies [h(a_t, b_t) − h(x_t, y_t)] > H(t), the fluctuation point I(a_t, b_t) is not added to the set S(x_t, y_t);
where h(a_t, b_t) denotes the smoothed value of the fluctuation point I(a_t, b_t), and h(x_t, y_t) denotes the smoothed value of the fluctuation point I(x_t, y_t);
let s(x_t, y_t) denote the similarity detection coefficient of the fluctuation point I(x_t, y_t) in the local neighborhood L(x_t, y_t), where s(x_t, y_t) equals N_S(x_t, y_t) divided by N_L(x_t, y_t); here N_S(x_t, y_t) denotes the number of fluctuation points in the set S(x_t, y_t), and N_L(x_t, y_t) denotes the number of fluctuation points in the local neighborhood L(x_t, y_t);
let s1(x_t, y_t) denote the median of the similarity detection coefficients of the points in the local neighborhood L(x_t, y_t). When the fluctuation point I(x_t, y_t) satisfies s(x_t, y_t) ≥ s1(x_t, y_t), the fluctuation point I(x_t, y_t) is judged to have normal fluctuation amplitude and its amplitude value h(x_t, y_t) is left unchanged; when the fluctuation point I(x_t, y_t) satisfies s(x_t, y_t) < s1(x_t, y_t), it is judged to be a noise fluctuation point, and its amplitude value h(x_t, y_t) is corrected as follows:
s(x_t, y_t) = I(x_t, y_t) / I(a_t, b_t)    (1)
wherein
I(x_t, y_t) ∈ S(x_t, y_t)    (2)
I(a_t, b_t) ∈ S(x_t, y_t)    (3)
In formula (1), s(x_t, y_t) denotes the similarity detection coefficient of the fluctuation point I(x_t, y_t) in the local neighborhood L(x_t, y_t), I(x_t, y_t) denotes the fluctuation point at coordinates (x_t, y_t) in the local neighborhood L(x_t, y_t), and I(a_t, b_t) denotes the fluctuation point at coordinates (a_t, b_t) in the local neighborhood L(x_t, y_t); in formulas (2) and (3), S(x_t, y_t) denotes the set of fluctuation points in the local neighborhood L(x_t, y_t);
I(x_t, y_t) / I(a_t, b_t) = (x_t − a_t)(y_t − b_t)    (4)
then
s(x_t, y_t) = (x_t − a_t)(y_t − b_t)    (5)
Let the fluctuation point before correction be I(x_t, y_t), the corrected fluctuation point be I(x̄_t, ȳ_t), and let I(a_t, b_t) denote a fluctuation point in the neighborhood of I(x_t, y_t) that does not need correction;
I(x̄_t, ȳ_t) = I(a_t, b_t) / s(x_t, y_t)    (6)
x̄_t = a_t / s(x_t, y_t)    (7)
ȳ_t = b_t / s(x_t, y_t)    (8)
I(a_t, b_t) ∈ S(x_t, y_t)    (9)
where, in formulas (7) and (8), s(x_t, y_t) denotes the similarity detection coefficient of the fluctuation point I(x_t, y_t) in the local neighborhood L(x_t, y_t), x̄_t is the X-axis coordinate value of the corrected fluctuation point, and ȳ_t is the Y-axis coordinate value of the corrected fluctuation point.
2. The voice instruction autonomous recognition algorithm according to claim 1, wherein the voice instruction library in S1 contains multiple groups of instruction data packets, each group holding either a text field, an instruction and an operation, or an audio segment, an instruction and an operation, wherein the text fields are words and phrases the user commonly uses, obtained through big-data collection, the operations are operations the user commonly performs, and the audio segments are water-leakage sounds and fire alarm sounds obtained through big-data collection.
3. The voice instruction autonomous recognition algorithm according to claim 2, wherein the text field, instruction and operation in each group of instruction data packets correspond one to one, as do the audio segment, instruction and operation, and one audio clip can simultaneously trigger multiple groups of instruction data packets.
4. The voice instruction autonomous recognition algorithm according to claim 1, wherein the microphone in S2 collects audio in two modes: the first sets the microphone to record audio at a fixed frequency, and the second sets a wake-up instruction, after which the microphone records audio once woken.
5. The voice instruction autonomous recognition algorithm according to claim 2, wherein said S3 comprises the following steps:
s301: converting the denoised audio data into an audio signal;
s302: performing voice feature recognition on the audio signal; if no voice feature exists, comparing the audio signal with the audio segments in the voice instruction library to obtain the matched instruction and operation; if a voice feature exists, performing S303;
s303: extracting voice features and performing voice recognition on them according to a voice recognition algorithm to obtain text data;
s304: screening and analyzing the text data, segmenting it into words, and matching it against the voice instruction library to obtain the matched instruction and operation.
6. The voice instruction autonomous recognition algorithm according to claim 1, wherein the voice recognition algorithm is a DTW (dynamic time warping) algorithm.
7. The voice instruction autonomous recognition algorithm according to claim 1, wherein the instruction encoding method in S4 is as follows:
s401: acquiring audio and entering a programming mode;
s402: starting to collect the voice commands issued by the user, and proceeding to the next step after the user has stopped speaking for a specified pause time;
s403: prompting the user whether there are more instructions; if so, repeating S402, otherwise executing S404;
s404: converting the collected voice into text, displaying the text on a screen, and giving a prompt; if the text is correct, proceeding to the next step;
s405: analyzing the grammar structure and language entities of the segmented text instruction with the trained grammar-structure and language-entity deep neural network models respectively, then searching the analyzed text instruction for specific instruction elements according to the instruction library, extracting them, and collecting them into an instruction set;
s406: playing a prompt asking for the shortcut voice that will trigger the instruction set;
s407: collecting the user's voice and playing a confirmation prompt; if confirmed, storing it in the user-defined instruction set library, otherwise repeating S405;
s408: prompting the user that the programming mode is finished.
8. The voice instruction autonomous recognition algorithm according to claim 7, wherein if the instruction is wrong in S404, the user is prompted as to which character should be replaced; the user manually inputs the changed field or re-records the audio until the instruction is correct, and the next step proceeds after the user confirms.
9. The voice instruction autonomous recognition algorithm according to claim 7, wherein the instruction set and its corresponding audio are stored after the instruction set has been executed in S405, and when the microphone later collects the same audio it directly triggers the instruction set to run again.
CN202111364061.2A 2021-11-17 2021-11-17 Voice instruction autonomous recognition algorithm Active CN113808587B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111364061.2A CN113808587B (en) 2021-11-17 2021-11-17 Voice instruction autonomous recognition algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111364061.2A CN113808587B (en) 2021-11-17 2021-11-17 Voice instruction autonomous recognition algorithm

Publications (2)

Publication Number Publication Date
CN113808587A (en) 2021-12-17
CN113808587B (en) 2022-04-12

Family

ID=78898659

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111364061.2A Active CN113808587B (en) 2021-11-17 2021-11-17 Voice instruction autonomous recognition algorithm

Country Status (1)

Country Link
CN (1) CN113808587B (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902193A (en) * 2012-12-30 2014-07-02 青岛海尔软件有限公司 System and method for operating computers to change slides by aid of voice
CN105407588A (en) * 2014-09-16 2016-03-16 何庆沐 Speech recognition intelligent LED bulb
CN106887227A (en) * 2015-12-16 2017-06-23 芋头科技(杭州)有限公司 A kind of voice awakening method and system
CN105744326A (en) * 2016-02-03 2016-07-06 广东长虹电子有限公司 Editable voice intelligent control method and system for television
CN108074565A (en) * 2016-11-11 2018-05-25 上海诺悦智能科技有限公司 Phonetic order redirects the method and system performed with detailed instructions
CN107734193A (en) * 2017-11-22 2018-02-23 深圳悉罗机器人有限公司 Smart machine system and smart machine control method
CN109758716B (en) * 2019-03-26 2020-12-01 林叶蓁 Rope skipping counting method based on sound information
CN113128228A (en) * 2021-04-07 2021-07-16 北京大学深圳研究院 Voice instruction recognition method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113808587A (en) 2021-12-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: An Autonomous Speech Instruction Recognition Algorithm

Granted publication date: 20220412

Pledgee: Bank of Nanjing Co.,Ltd. Nanjing Chengnan sub branch

Pledgor: NANJING LONREC ELECTRIC TECHNOLOGY CO.,LTD.

Registration number: Y2024980008006