CN111968628A - Signal accuracy adjusting system and method for voice instruction capture - Google Patents


Info

Publication number
CN111968628A
CN111968628A (application number CN202010852699.XA)
Authority
CN
China
Prior art keywords
voice
instruction
voice instruction
matching
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010852699.XA
Other languages
Chinese (zh)
Other versions
CN111968628B (en)
Inventor
彭玲玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NANJING GUIJI INTELLIGENT TECHNOLOGY Co.,Ltd.
Original Assignee
彭玲玲
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 彭玲玲
Priority to CN202010852699.XA (granted as CN111968628B)
Priority to CN202110561900.3A (published as CN113436618A)
Publication of CN111968628A
Application granted
Publication of CN111968628B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/04: Segmentation; Word boundary detection
    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Training
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/24: Speech recognition using non-acoustical features
    • G10L15/25: Speech recognition using position of the lips, movement of the lips or face analysis
    • G10L2015/0631: Creating reference templates; Clustering
    • G10L2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a signal accuracy adjusting system and method for voice instruction capture. The system comprises a voice instruction sample library real-time updating module, an instruction segmentation acquisition unit, an acquisition unit identification instruction analysis matching module, a sample library instruction intelligent matching module, and a non-limited voice instruction signal manual capture training module. The real-time updating module uploads newly updated voice instructions to the voice instruction sample library in real time for storage; the instruction segmentation acquisition unit acquires the voice instruction input by the user in segments and intelligently recognizes the instruction issued by the current user; the analysis matching module performs recognition and matching analysis on the voice instructions acquired by the different acquisition units; the intelligent matching module matches the acquired and screened voice instruction against the sample library; and the manual capture training module performs capture training on acquired voice instructions that are not in the sample library.

Description

Signal accuracy adjusting system and method for voice instruction capture
Technical Field
The invention relates to the technical field of voice recognition, in particular to a signal accuracy adjusting system and method for voice instruction capture.
Background
Speech recognition technology is a high technology that enables machines to convert speech signals into corresponding text or commands through recognition and understanding. It mainly comprises three aspects: feature extraction techniques, pattern matching criteria, and model training techniques. Speech recognition is also widely applied in the Internet of Vehicles; in the Yika ("wing card") in-car service, for example, a driver only needs to press a talk key and speak to a service agent to have a destination set for direct navigation, which is both safe and convenient.
According to the object to be recognized, speech recognition tasks can be broadly divided into three types: isolated word recognition, keyword detection, and continuous speech recognition. Isolated word recognition identifies isolated words known in advance, such as "power on" or "power off"; continuous speech recognition identifies arbitrary continuous speech, such as a sentence or a passage; keyword detection targets continuous speech but, rather than recognizing every word, only detects where known keywords occur in the stream.
According to the intended speaker, speech recognition can be divided into speaker-dependent recognition, which recognizes only the speech of one or a few specific persons, and speaker-independent recognition, which anyone can use. A speaker-independent system is clearly more practical, but its recognition task is much harder than for a specific speaker. Systems can also be classified by device and channel into desktop (PC) speech recognition, telephone speech recognition, and embedded-device (mobile phone, PDA, etc.) speech recognition. Different acquisition channels distort the acoustic properties of the human voice, so a separate recognition system must be built for each.
Speech recognition has a very wide range of applications; common application systems include the following. A voice input system is more in line with people's everyday habits than keyboard input, and is more natural and efficient. A voice control system, in which equipment is operated by voice, is quicker and more convenient than manual control and can be used in many fields such as industrial control, voice dialing, smart home appliances, and voice-controlled intelligent toys. An intelligent dialogue query system operates according to the customer's voice and provides the user with natural, friendly database retrieval services.
Speech recognition faces five major problems. First, recognition and understanding of natural language: continuous speech must be decomposed into units such as words and phonemes, and rules for understanding semantics must be established. Second, the volume of speech information is large: speech patterns differ not only between speakers but also for the same speaker, whose speech differs between casual and careful speaking and changes over time. Third, speech is ambiguous: when a speaker talks, different words may sound similar, which is common in both English and Chinese. Fourth, the phonetic characteristics of individual letters or words are influenced by context, changing accent, pitch, volume, and speed of articulation. Fifth, environmental noise and interference seriously affect speech recognition and lower the recognition rate.
At present, voice instructions are easily misrecognized, yet devices usually rely on audio alone at input time. The present invention recognizes and analyzes both the user's speech and lip movements through segmented audio and video recording, thereby improving the accuracy of the voice instruction signal.
Disclosure of Invention
The present invention is directed to a system and method for adjusting signal accuracy for capturing voice commands, so as to solve the above-mentioned problems.
In order to solve the above technical problems, the invention provides the following technical scheme: a signal accuracy adjusting system and method for voice instruction capture. The system includes a voice instruction sample library real-time updating module, an instruction segmentation acquisition unit, an acquisition unit identification instruction analysis matching module, a sample library instruction intelligent matching module, and a non-limited voice instruction signal manual capture training module. These five modules are connected in sequence through an intranet, and the sample library instruction intelligent matching module and the non-limited voice instruction signal manual capture training module are each additionally connected to the voice instruction sample library real-time updating module through the intranet;
the voice instruction sample library real-time updating module uploads newly updated voice instructions to the voice instruction sample library in real time for storage, and the stored instructions are fed back to a system platform where the user can conveniently review them. The instruction segmentation acquisition unit acquires the voice instruction input by the user in segments and intelligently recognizes the instruction issued by the current user. The acquisition unit identification instruction analysis matching module performs recognition and matching on the voice instructions acquired by the different acquisition units and analyzes the matching rate of the acquired instructions. The sample library instruction intelligent matching module matches the acquired and screened voice instruction against the sample library to determine whether it already exists there, and the non-limited voice instruction signal manual capture training module performs capture training on acquired voice instructions that are not in the sample library.
By adopting the technical scheme: the voice instruction sample library real-time updating module comprises an updating instruction sample key word input submodule and an instruction sample key word collecting feedback submodule, the updating instruction sample key word input submodule is used for inputting a voice instruction output by training into the sample library in real time to update and expand a voice instruction template in the sample library, the instruction sample key word collecting feedback submodule is used for collecting voice instructions in the voice instruction sample library and feeding the collected voice instructions back to the system platform, and a user sends the voice instructions to corresponding equipment according to the collected voice instruction set to control the corresponding equipment.
By adopting the technical scheme: the voice recognition and lip language identification device is characterized in that the instruction segmentation acquisition unit comprises an instruction first voice acquisition unit and an instruction second video acquisition unit, the instruction first voice acquisition unit is used for recording voice instructions sent by a user, segmenting and cutting a recording file, performing voice recognition on each segment, the instruction second video acquisition unit is used for recording when the voice instructions are sent by the user, segmenting and cutting a video, performing lip language recognition on each segmented video, summarizing instruction information obtained through voice recognition and lip language recognition according to different segments, wherein the recording file and the video file are cut according to the same time segment, marking segmented data obtained through voice recognition and lip language recognition respectively, and sending the marked data to the acquisition unit recognition instruction analysis and matching module.
By adopting the technical scheme: the acquisition unit identification instruction analysis matching module comprises a fragmentation identification instruction matching rate analysis submodule and a secondary identification adjustment matching submodule, the fragmentation identification instruction matching rate analysis submodule identifies sectional type recording and video files acquired by an instruction first voice acquisition unit and an instruction second video acquisition unit and matches voice identification data and lip language identification data segmented at the same time, the matching rate analysis of the voice and lip language identification data of each segment, the secondary identification adjustment matching submodule is used for adjusting the matching rate of the voice and lip language identification data of each segment when the primary matching rate does not meet the requirement, segmenting the voice file and the video file collected by the first voice collecting unit and the second video collecting unit according to time again, and respectively carrying out voice recognition and lip language recognition on the segmented audio and video files again, and carrying out matching analysis on the re-recognized data.
By adopting the technical scheme: the segmentation recognition instruction matching rate analysis submodule is used for respectively performing segmentation voice recognition and lip language recognition on the collected voice file and video file, matching the voice recognition data and the lip language recognition data of the same time segment according to key words and segmentation explanatory contents in the segment, setting the keyword matching rate of the voice recognition data and the lip language recognition data of different current segments to be F1%, the segmentation explanatory content matching rate to be F2%, setting the keyword matching rate to be Pm%, setting the segmentation explanatory content matching rate to be Pn%, setting the comprehensive matching rate of the voice recognition data and the lip language recognition data in a certain time segment to be F0, and satisfying the formula:
F0=F1%*Pm%+F2%*Pn%
The composite matching degree of the speech recognition data and lip-reading recognition data in the current time segment is calculated in this way, and the composite matching degrees of the collected recording and video files are computed segment by segment as F0₁, F0₂, F0₃, …, F0ₙ₋₁, F0ₙ. The total matching degree of the collected voice instruction must then satisfy the following formula:
[Formula rendered only as an image in the original; from context, the total matching degree aggregates F0₁ through F0ₙ, e.g. as their mean, and must reach a set threshold.]
When the total matching degree over the different segments of the collected voice instruction satisfies the formula, the matching degree of the voice instruction is judged qualified and the instruction is sent to the sample library for matching. When it does not, the matching degree is judged unqualified and the instruction is sent to the secondary identification adjustment matching submodule, which re-segments the recording file and video file collected by the first voice acquisition unit and the second video acquisition unit by time, performs speech recognition and lip-reading recognition again on the re-segmented files, and performs matching analysis on the newly recognized data. If the formula is satisfied after this second recognition, the voice instruction is sent to the sample library; if it is still not satisfied, the voice instruction is judged not to meet the voice input standard and is fed back to the user for re-entry.
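The per-segment formula F0 = F1% × Pm% + F2% × Pn% and the accept/retry decision can be sketched as below. The aggregation of the segment scores into a total matching degree is an image in the original, so the mean-above-threshold rule, the 0.8 threshold, and the example weights here are all assumptions.

```python
# Composite matching degree of one segment: keyword match rate f1 and
# explanatory-content match rate f2 (percentages), weighted by pm and pn.
def segment_match(f1: float, f2: float, pm: float, pn: float) -> float:
    return (f1 / 100) * (pm / 100) + (f2 / 100) * (pn / 100)

# Assumed aggregation: the instruction qualifies if the mean of the
# per-segment composite scores reaches a set threshold; an unqualified
# instruction is re-segmented and re-recognized once before re-entry.
def total_match_ok(scores: list[float], threshold: float = 0.8) -> bool:
    return sum(scores) / len(scores) >= threshold

# Keyword match 90%, content match 80%, weights 60%/40% -> F0 ≈ 0.86.
print(segment_match(90, 80, 60, 40))
print(total_match_ok([0.86, 0.9, 0.7]))  # True (mean ≈ 0.82)
```

Note that F0 is a weighted sum of two agreement measures between the two independent recognizers, so a segment where the lip-reading contradicts the audio drags the whole instruction toward the retry path.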
By adopting the technical scheme: the intelligent matching module of the sample library instruction comprises a limited voice instruction signal matching and marking sub-module and an undefined voice instruction signal manual feedback sub-module, wherein the limited voice instruction signal matching and marking sub-module is used for matching the voice instruction of the user with the instruction signal stored in the sample library, when the voice instruction input by the user exists in the sample library, the instruction is screened out after being marked, the voice instruction is marked as a limited voice instruction signal, the equipment is controlled according to the originally set equipment processing method of the instruction signal, the manual feedback sub-module of the non-limited voice instruction signal is used for matching the voice instruction of the user with the instruction signal stored in the sample library, and when the voice instruction input by the user does not exist in the sample library, judging that the current voice instruction is an undefined voice instruction signal, and sending the instruction to the undefined voice instruction signal artificial capturing training module for artificial training.
By adopting the technical scheme: the non-limited voice instruction signal manual capturing training module comprises a non-limited voice instruction signal simulation device training submodule and a training detection output probability analysis submodule, the non-limited voice instruction signal simulation device training submodule is used for performing simulation device training on non-limited voice instruction signals which do not exist in a sample library, the simulation device is trained for a plurality of times through voice training, a training result is sent to the training detection output probability analysis submodule, and the training detection output probability analysis submodule is used for monitoring and analyzing the results of the voice training of the simulation devices for the plurality of times and judging whether the current voice instruction can be successfully recorded.
By adopting the technical scheme: the training detection output probability analysis submodule trains the non-limited voice command signal for a plurality of times through the simulation equipment, and the operational coefficient of the non-limited voice command signal under the training of the simulation equipment for a plurality of times is set to be Y1、Y2、Y3、…、Yn-1、YnWherein, the operational coefficient is 1-100, the standard coefficient of the current simulation equipment training is set as Yj, the maximum value of the operational data of the simulation equipment training is set as CO, the minimum value of the operational data of the simulation equipment training is set as C1, the training pass rate H of the simulation equipment is monitored, and the formula is satisfied:
[Formula rendered only as an image in the original; from context, H is computed from the coefficients Y₁…Yₙ, the standard Yj, and the bounds C0 and C1, e.g. as the fraction of runs that qualify.]
The qualification rate of the current simulation device training for the non-limited voice instruction signal is thus calculated. When the qualification rate is greater than a set threshold, the current non-limited voice instruction signal is judged implementable on the equipment and is sent to the sample library for storage; when it is less than or equal to the threshold, the signal is judged not implementable and is not processed.
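The enrollment decision above can be sketched as follows. The exact formula for H is an image in the original, so the fraction-of-runs-reaching-the-standard rule and the 0.9 threshold below are assumptions.

```python
# Operational coefficients Y_i (range 1-100) from repeated simulation-device
# training of one non-limited voice instruction; Yj is the standard coefficient.
def training_pass_rate(coeffs, standard):
    """Assumed H: fraction of training runs whose coefficient reaches Yj."""
    return sum(1 for y in coeffs if y >= standard) / len(coeffs)

def can_enroll(coeffs, standard, threshold=0.9):
    # Send the instruction to the sample library only if H exceeds the threshold;
    # otherwise the signal is judged not implementable and is not processed.
    return training_pass_rate(coeffs, standard) > threshold

runs = [95, 92, 97, 91, 88]          # five simulated training runs
print(training_pass_rate(runs, 90))  # 0.8
print(can_enroll(runs, 90))          # False -> instruction not processed
```

Requiring a strict majority of runs to clear the standard keeps one lucky recognition from enrolling an instruction the device cannot reliably reproduce.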
A signal accuracy adjustment method for voice instruction capture comprises the following steps:
S1: the voice instruction sample library real-time updating module uploads the newly updated voice instruction to the voice instruction sample library in real time for storage, and the stored instruction is fed back to the system platform for the user to review;
S2: the instruction segmentation acquisition unit acquires the voice instruction input by the user in segments and intelligently recognizes the instruction issued by the current user;
S3: the acquisition unit identification instruction analysis matching module recognizes and matches the voice instructions acquired by the different acquisition units and analyzes their matching rate;
S4: the sample library instruction intelligent matching module matches the acquired and screened voice instruction against the sample library to determine whether it already exists there;
S5: the non-limited voice instruction signal manual capture training module performs capture training on acquired voice instructions that are not in the sample library.
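The steps S1 to S5 can be sketched as one control flow. The recognizers below are mocked with fixed strings and the per-segment agreement test is simplified to exact equality; only the routing between the modules reflects the method above.

```python
def asr(segment):      return segment["speech"]  # stand-in speech recognizer
def lipread(segment):  return segment["lips"]    # stand-in lip-reading model

def process(segments, library, threshold=1.0):
    # S3: per-segment matching, simplified to "do the two recognizers agree?"
    agree = [asr(s) == lipread(s) for s in segments]
    if sum(agree) / len(agree) < threshold:
        return "re-entry required"               # matching degree unqualified
    command = " ".join(asr(s) for s in segments)
    if command in library:                       # S4: limited signal found
        return f"execute:{library[command]}"
    return "capture-training"                    # S5: non-limited signal

segs = [{"speech": "power", "lips": "power"}, {"speech": "on", "lips": "on"}]
print(process(segs, {"power on": "relay_on"}))  # execute:relay_on
```

A real implementation would also retry recognition once with new segment boundaries before demanding re-entry, as described in S3-1.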
By adopting the technical scheme: the adjustment method further comprises the following steps:
S1-1: the updating instruction sample keyword input submodule enters the voice instructions produced by training into the sample library in real time, updating and expanding the voice instruction templates; the instruction sample keyword summarizing feedback submodule collects the voice instructions in the sample library and feeds the collected set back to the system platform, from which the user issues voice instructions to the corresponding equipment for control;
S2-1: the instruction first voice acquisition unit records the voice instruction issued by the user, cuts the recording into segments, and performs speech recognition on each segment; the instruction second video acquisition unit records video while the instruction is issued, cuts the video into segments, and performs lip-reading recognition on each; the instruction information obtained from the two recognitions is summarized segment by segment, the recording and video files being cut along the same time segments; the segmented speech and lip-reading recognition data are marked and sent to the acquisition unit identification instruction analysis matching module;
S3-1: the segmentation identification instruction matching rate analysis submodule recognizes the segmented recording and video files acquired by the two acquisition units, matches the speech and lip-reading recognition data of the same time segment, and analyzes the matching rate of each segment; when the first-pass matching rate does not meet the requirement, the secondary identification adjustment matching submodule re-segments the recording and video files by time, performs speech recognition and lip-reading recognition again, and performs matching analysis on the newly recognized data;
S4-1: the limited voice instruction signal matching and marking submodule matches the user's voice instruction against the instruction signals stored in the sample library; when the input instruction exists in the library, it is marked and screened out as a limited voice instruction signal and the equipment is controlled according to the processing method originally set for that signal; the non-limited voice instruction signal manual feedback submodule judges an instruction that does not exist in the library to be a non-limited voice instruction signal and sends it to the non-limited voice instruction signal manual capture training module for manual training;
S5-1: the non-limited voice instruction signal simulation device training submodule performs simulation device training on non-limited voice instruction signals absent from the sample library, training the simulation device several times by voice and sending the results to the training detection output probability analysis submodule, which monitors and analyzes the repeated training results and judges whether the current voice instruction can be successfully enrolled.
Compared with the prior art, the invention has the following beneficial effects: by segmenting the recording and the video, the invention performs recognition analysis on both the user's voice and lip movements, thereby improving the accuracy of the voice instruction signal;
the voice instruction sample library real-time updating module uploads newly updated voice instructions to the voice instruction sample library in real time for storage, and the stored instructions are fed back to a system platform where the user can conveniently review them. The instruction segmentation acquisition unit acquires the voice instruction input by the user in segments and intelligently recognizes the instruction issued by the current user. The acquisition unit identification instruction analysis matching module performs recognition and matching on the voice instructions acquired by the different acquisition units and analyzes the matching rate of the acquired instructions. The sample library instruction intelligent matching module matches the acquired and screened voice instruction against the sample library to determine whether it already exists there, and the non-limited voice instruction signal manual capture training module performs capture training on acquired voice instructions that are not in the sample library.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a block diagram of a signal accuracy adjustment system for voice command capture;
FIG. 2 is a schematic diagram of the steps of a signal accuracy adjustment method for voice command capture;
FIG. 3 is a diagram illustrating the steps of a signal accuracy adjustment method for voice command capture;
FIG. 4 is a schematic diagram of an implementation of a signal accuracy adjustment system for voice command capture.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-4, the present invention provides the following technical solutions:
the working principle of the invention is as follows:
a signal accuracy adjusting system and method for voice instruction capture, the system includes voice instruction sample bank real-time updating module, instruction sectional acquisition unit, acquisition unit identification instruction analysis matching module, sample bank instruction intelligent matching module and non-limited voice instruction signal manual capture training module, wherein, the voice instruction sample bank real-time updating module, the instruction sectional acquisition unit, the acquisition unit identification instruction analysis matching module, the sample bank instruction intelligent matching module and the non-limited voice instruction signal manual capture training module are connected through the intranet in sequence, the sample bank instruction intelligent matching module and the non-limited voice instruction signal manual capture training module are respectively connected with the voice instruction sample bank real-time updating module through the intranet;
the voice instruction real-time updating module is used for uploading a newly updated voice instruction to the voice instruction sample library in real time for storage, the stored voice instruction is fed back to a system platform and is convenient for a user to check, the instruction segmentation acquisition unit is used for performing segmentation acquisition on the voice instruction input by the user, the voice instruction identified by the current user is intelligently identified, the acquisition unit identification instruction analysis matching module is used for performing identification matching on the voice instructions acquired by different acquisition modules, the matching rate of the acquired different instructions is analyzed, the sample library instruction intelligent matching module is used for matching the acquired and screened voice instruction with the sample library to determine whether the voice instruction exists in the current sample library, and the non-limited voice instruction signal manual capturing training module is used for capturing and training the acquired voice instruction which is not in the sample library.
By adopting the technical scheme: the voice instruction sample library real-time updating module comprises an updating instruction sample key vocabulary input submodule and an instruction sample key vocabulary summarizing feedback submodule; the updating instruction sample key vocabulary input submodule is used for inputting the voice instructions output by training into the sample library in real time for updating, expanding the voice instruction templates in the sample library; the instruction sample key vocabulary summarizing feedback submodule is used for summarizing the voice instructions in the voice instruction sample library and feeding the summarized voice instructions back to the system platform, and the user sends voice instructions to the corresponding equipment for control according to the summarized voice instruction set.
By adopting the technical scheme: the instruction segmentation acquisition unit comprises an instruction first voice acquisition unit and an instruction second video acquisition unit; the instruction first voice acquisition unit is used for recording the voice instruction issued by the user, cutting the recording file into segments and performing voice recognition on each segment; the instruction second video acquisition unit is used for recording video while the user issues the voice instruction, cutting the video into segments and performing lip language recognition on each video segment; the instruction information obtained through voice recognition and lip language recognition is summarized by segment, wherein the recording file and the video file are cut according to the same time segments; the segmented data obtained through voice recognition and lip language recognition are marked respectively, and the marked data are sent to the acquisition unit identification instruction analysis matching module.
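The shared time-segmentation described above (cutting the recording file and the video file at the same time boundaries) can be sketched as follows; this is a minimal illustration rather than the patent's implementation, and the 2.0 s segment length is an assumed value.

```python
def cut_into_segments(duration, segment_len):
    """Return (start, end) pairs covering [0, duration] in equal-length
    time segments; the final segment may be shorter."""
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + segment_len, duration)
        bounds.append((start, end))
        start = end
    return bounds

# Both files are cut with the same boundaries so that voice recognition and
# lip language recognition results can be matched segment by segment.
audio_segments = cut_into_segments(7.5, 2.0)
video_segments = cut_into_segments(7.5, 2.0)
assert audio_segments == video_segments
```

Because the two channels share one boundary list, every voice-recognition segment has exactly one lip-language counterpart to match against.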
By adopting the technical scheme: the acquisition unit identification instruction analysis matching module comprises a segmentation identification instruction matching rate analysis submodule and a secondary identification adjustment matching submodule; the segmentation identification instruction matching rate analysis submodule identifies the segmented recording and video files acquired by the instruction first voice acquisition unit and the instruction second video acquisition unit, matches the voice recognition data and the lip language recognition data of the same time segments, and analyzes the matching rate of the voice and lip language recognition data of each segment; when the first matching rate does not meet the requirement, the secondary identification adjustment matching submodule re-segments the recording file and the video file acquired by the first voice acquisition unit and the second video acquisition unit according to time, performs voice recognition and lip language recognition again on the re-segmented audio and video files respectively, and performs matching analysis on the newly recognized data.
By adopting the technical scheme: the segmentation identification instruction matching rate analysis submodule is used for performing segmented voice recognition and lip language recognition on the acquired recording file and video file respectively, and matching the voice recognition data and the lip language recognition data of the same time segment according to the key words and the segment explanatory content in the segment; the keyword matching rate of the voice recognition data and the lip language recognition data of the current segment is set as F1%, the segment explanatory content matching rate as F2%, the keyword weight as Pm% and the segment explanatory content weight as Pn%; the comprehensive matching degree of the voice recognition data and the lip language recognition data in a given time segment is set as F0 and satisfies the formula:
F0=F1%*Pm%+F2%*Pn%
calculating the comprehensive matching degree of the voice recognition data and the lip language recognition data in the current time segment, and calculating the comprehensive matching degrees of the acquired recording file and video file in the different time segments one by one as F0_1, F0_2, F0_3, …, F0_(n-1), F0_n; the total matching degree of the acquired voice instruction is set to satisfy the following formula:
[formula image BDA0002645275700000111: total matching degree criterion over the segment matching degrees F0_1, …, F0_n; not recoverable from the text]
when the total matching degree of the different segment sets of the acquired voice instruction satisfies the formula, the matching degree of the voice instruction is judged to be qualified and the voice instruction is sent to the sample library for matching; when it does not satisfy the formula, the matching degree of the voice instruction is judged to be unqualified and the voice instruction is sent to the secondary identification adjustment matching submodule, which re-segments the recording file and the video file acquired by the first voice acquisition unit and the second video acquisition unit according to time, performs voice recognition and lip language recognition again on the re-segmented files respectively, and performs matching analysis on the newly recognized data; when the formula is satisfied after the secondary recognition, the voice instruction is sent to the sample library; when the formula is still not satisfied after the secondary recognition, the voice instruction is judged not to meet the voice input standard and is fed back to the user for re-entry.
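The per-segment computation F0 = F1% * Pm% + F2% * Pn% can be sketched as below. The patent gives the aggregation of F0_1 … F0_n into the total matching degree only as a formula image, so the arithmetic mean used here is an assumption; the 40%/60% weights and the segment values are taken from Example 1.

```python
def comprehensive_match(f1, f2, pm=0.40, pn=0.60):
    """Per-segment comprehensive matching degree F0: keyword matching rate f1
    weighted by pm, segment explanatory content rate f2 weighted by pn.
    The default weights are the example values, not fixed by the patent."""
    return f1 * pm + f2 * pn

def total_match(per_segment):
    """Assumed aggregation of the per-segment degrees (the patent's total
    matching formula survives only as an image): arithmetic mean."""
    return sum(per_segment) / len(per_segment)

f0 = comprehensive_match(0.96, 0.98)            # Example 1 segment: 97.2%
segments = [0.972, 0.994, 0.989, 1.0, 0.998]    # Example 1 segment values
overall = total_match(segments)
```

The qualified/unqualified decision then reduces to comparing `overall` with whatever threshold the unrecoverable formula encodes.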
By adopting the technical scheme: the sample library instruction intelligent matching module comprises a limited voice instruction signal matching and marking submodule and a non-limited voice instruction signal manual feedback submodule; the limited voice instruction signal matching and marking submodule is used for matching the user voice instruction against the instruction signals stored in the sample library; when the voice instruction input by the user exists in the sample library, the instruction is marked and screened out, the voice instruction is marked as a limited voice instruction signal, and the equipment is controlled according to the equipment processing method originally set for the instruction signal; the non-limited voice instruction signal manual feedback submodule is used for matching the user voice instruction against the instruction signals stored in the sample library; when the voice instruction input by the user does not exist in the sample library, the current voice instruction is judged to be a non-limited voice instruction signal, and the instruction is sent to the non-limited voice instruction signal manual capture training module for manual training.
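The limited / non-limited routing described above amounts to a library lookup; a minimal sketch follows, in which the library contents and device action names are invented for illustration.

```python
# Hypothetical sample library: recognized instruction text -> device action.
sample_library = {
    "turn on the light": "light.on",
    "lower the volume": "speaker.volume_down",
}

def match_instruction(text, library):
    """Instructions present in the library are limited (defined) signals and
    are dispatched to their preset device action; unknown instructions are
    routed to the manual capture training module."""
    if text in library:
        return ("limited", library[text])
    return ("non-limited", None)
```

A "limited" result triggers the preset device control; a "non-limited" result hands the instruction over for simulation-equipment training.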
By adopting the technical scheme: the non-limited voice instruction signal manual capture training module comprises a non-limited voice instruction signal simulation equipment training submodule and a training detection output probability analysis submodule; the simulation equipment training submodule is used for performing simulation equipment training on non-limited voice instruction signals that do not exist in the sample library, training the simulation equipment a plurality of times through voice training and sending the training results to the training detection output probability analysis submodule; the training detection output probability analysis submodule is used for monitoring and analyzing the results of the plurality of simulation equipment voice training runs and judging whether the current voice instruction can be successfully recorded.
By adopting the technical scheme: the training detection output probability analysis submodule trains the non-limited voice instruction signal a plurality of times through the simulation equipment; the operational coefficients of the non-limited voice instruction signal under the plurality of simulation equipment training runs are set as Y_1, Y_2, Y_3, …, Y_(n-1), Y_n, wherein each operational coefficient lies between 1 and 100; the standard coefficient of the current simulation equipment training is set as Yj, the maximum value of the operational data of the simulation equipment training is set as C0 and the minimum value as C1; the monitored training qualification rate H of the simulation equipment satisfies the formula:
[formula image BDA0002645275700000121: training qualification rate H computed from the coefficients Y_1, …, Y_n, the standard coefficient Yj and the extrema C0 and C1; not recoverable from the text]
calculating the qualification rate of the current simulation equipment training on the non-limited voice instruction signal; when the qualification rate is greater than the set threshold value, the current non-limited voice instruction signal is judged to be implementable on the equipment and is sent to the sample library for storage; when the qualification rate is less than or equal to the set threshold value, the current non-limited voice instruction signal is judged to be not implementable on the equipment, and the voice instruction signal is not processed.
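The threshold decision above can be sketched as follows. Because the qualification rate formula itself survives only as an image, H is taken here as a precomputed input; the 15% threshold is the value used in Example 2.

```python
def decide(h, threshold=0.15):
    """Return the action for a trained non-limited instruction signal given
    its qualification rate h: store it in the sample library when h exceeds
    the threshold, otherwise leave the signal unprocessed."""
    if h > threshold:
        return "store in sample library"
    return "do not process"

result = decide(0.128)   # Example 2: H = 12.8% <= 15% threshold
```

With the Example 2 values the instruction is rejected, matching the outcome stated in the description.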
A signal accuracy adjustment method for voice instruction capture:
S1: utilizing the voice instruction sample library real-time updating module to upload the newly updated voice instruction to the voice instruction sample library in real time for storage, the stored voice instruction being fed back to the system platform for the user to check;
S2: utilizing the instruction segmentation acquisition unit to acquire the voice instruction input by the user in segments and intelligently identify the voice instruction issued by the current user;
S3: utilizing the acquisition unit identification instruction analysis matching module to identify and match the voice instructions acquired by the different acquisition units and analyze the matching rate of the acquired instructions;
S4: utilizing the sample library instruction intelligent matching module to match the acquired and screened voice instruction against the sample library and determine whether the voice instruction exists in the current sample library;
S5: utilizing the non-limited voice instruction signal manual capture training module to perform capture training on acquired voice instructions that are not in the sample library.
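The steps above can be sketched as one simplified pipeline. Every helper below, and the exact-equality matching criterion, is an assumption made for illustration; the patent does not specify these interfaces.

```python
def recognize_voice(segment):
    # Stand-in for segment-level voice recognition (steps S2/S3).
    return segment["audio_text"]

def recognize_lips(segment):
    # Stand-in for segment-level lip language recognition (steps S2/S3).
    return segment["lip_text"]

def process_instruction(segments, sample_library):
    texts = []
    for seg in segments:
        voice, lips = recognize_voice(seg), recognize_lips(seg)
        if voice != lips:                 # assumed per-segment match criterion
            return ("re-entry required", None)
        texts.append(voice)
    instruction = " ".join(texts)
    if instruction in sample_library:     # S4: sample library matching
        return ("execute", sample_library[instruction])
    return ("manual training", instruction)   # S5: capture training

segs = [{"audio_text": "turn on", "lip_text": "turn on"},
        {"audio_text": "the light", "lip_text": "the light"}]
result = process_instruction(segs, {"turn on the light": "light.on"})
```

A real implementation would replace the exact-equality check with the weighted matching-degree computation and its threshold.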
By adopting the technical scheme: the adjustment method further comprises the following steps:
S1-1: utilizing the updating instruction sample key vocabulary input submodule to input the voice instruction output by training into the sample library in real time for updating, expanding the voice instruction templates in the sample library; the instruction sample key vocabulary summarizing feedback submodule summarizes the voice instructions in the voice instruction sample library and feeds the summarized voice instructions back to the system platform, and the user sends voice instructions to the corresponding equipment for control according to the summarized voice instruction set;
S2-1: utilizing the instruction first voice acquisition unit to record the voice instruction issued by the user, cut the recording file into segments and perform voice recognition on each segment; the instruction second video acquisition unit records video while the user issues the voice instruction, cuts the video into segments and performs lip language recognition on each video segment; the instruction information obtained by voice recognition and lip language recognition is summarized by segment, the recording file and the video file being cut according to the same time segments; the segmented data obtained by voice recognition and lip language recognition are marked respectively and sent to the acquisition unit identification instruction analysis matching module;
S3-1: utilizing the segmentation identification instruction matching rate analysis submodule to identify the segmented recording and video files acquired by the first voice acquisition unit and the second video acquisition unit, match the voice recognition data and the lip language recognition data of the same time segments, and analyze the matching rate of the voice and lip language recognition data of each segment; when the first matching rate does not meet the requirement, the secondary identification adjustment matching submodule re-segments the recording file and the video file according to time, performs voice recognition and lip language recognition again on the re-segmented files respectively, and performs matching analysis on the newly recognized data;
S4-1: utilizing the limited voice instruction signal matching and marking submodule to match the user voice instruction against the instruction signals stored in the sample library; when the voice instruction input by the user exists in the sample library, the instruction is marked and screened out, the voice instruction is marked as a limited voice instruction signal, and the equipment is controlled according to the equipment processing method originally set for the instruction signal; the non-limited voice instruction signal manual feedback submodule matches the user voice instruction against the instruction signals stored in the sample library; when the voice instruction input by the user does not exist in the sample library, the current voice instruction is judged to be a non-limited voice instruction signal and is sent to the non-limited voice instruction signal manual capture training module for manual training;
S5-1: utilizing the non-limited voice instruction signal simulation equipment training submodule to perform simulation equipment training on non-limited voice instruction signals that do not exist in the sample library, training the simulation equipment a plurality of times through voice training and sending the training results to the training detection output probability analysis submodule; the training detection output probability analysis submodule monitors and analyzes the results of the plurality of simulation equipment voice training runs and judges whether the current voice instruction can be successfully recorded.
Example 1: the method comprises the steps of limiting conditions, matching voice recognition data and lip language recognition data segmented at the same time according to key words and segmented interpretation contents in the segments, setting the keyword matching rate of the voice recognition data and the lip language recognition data of different current segments to be 96%, the segmented interpretation content matching rate to be 98%, the keyword matching rate to be 40%, the segmented interpretation content matching rate to be 60%, and setting the comprehensive matching degree of the voice recognition data and the lip language recognition data in a certain time segment to be F0, so that the formula is met:
F0=96%*40%+98%*60%=97.2%
calculating the comprehensive matching degree of the voice recognition data and the lip language recognition data in the current time segment, and calculating the comprehensive matching degrees of the acquired recording file and video file in the different time segments one by one as 97.2%, 99.4%, 98.9%, 100% and 99.8%; the total matching degree of the acquired voice instruction is set to satisfy the following formula:
[formula image BDA0002645275700000141: total matching degree computed from the segment matching degrees 97.2%, 99.4%, 98.9%, 100% and 99.8%; not recoverable from the text]
when the total matching degree of the different segment sets of the acquired voice instruction satisfies the formula, the matching degree of the voice instruction is judged to be qualified, and the voice instruction is sent to the sample library for matching.
Example 2: the method comprises the following steps that under the condition of limiting, a training detection output probability analysis submodule trains a non-limited voice command signal for a plurality of times through a simulation device, operational coefficients of the non-limited voice command signal under the training of the simulation device for the plurality of times are set to be 78, 84, 77, 88 and 92, a standard coefficient of current simulation device training is set to be 80, the maximum value of operational data of the simulation device training is set to be 99, the minimum value of the operational data of the simulation device training is set to be 60, the training qualification rate H of the simulation device is monitored, and the formula is met:
[formula image BDA0002645275700000151: training qualification rate H evaluating to 12.8% from the coefficients 78, 84, 77, 88 and 92 with standard coefficient 80, maximum 99 and minimum 60; not recoverable from the text]
calculating the qualification rate of the current simulation equipment training on the non-limited voice instruction signal as 12.8%; since the qualification rate of 12.8% is less than the set threshold value of 15%, it is judged that the current non-limited voice instruction signal cannot be implemented on the equipment, and the voice instruction signal is not processed.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A signal accuracy adjustment system for voice instruction capture, characterized in that: the system comprises a voice instruction sample library real-time updating module, an instruction segmentation acquisition unit, an acquisition unit identification instruction analysis matching module, a sample library instruction intelligent matching module and a non-limited voice instruction signal manual capture training module, wherein the voice instruction sample library real-time updating module, the instruction segmentation acquisition unit, the acquisition unit identification instruction analysis matching module, the sample library instruction intelligent matching module and the non-limited voice instruction signal manual capture training module are sequentially connected through an intranet;
the voice instruction sample library real-time updating module is used for uploading newly updated voice instructions to the voice instruction sample library in real time for storage, and the stored voice instructions are fed back to a system platform for the user to check; the instruction segmentation acquisition unit is used for acquiring the voice instruction input by the user in segments and intelligently identifying the voice instruction issued by the current user; the acquisition unit identification instruction analysis matching module is used for identifying and matching the voice instructions acquired by the different acquisition units and analyzing the matching rate of the acquired instructions; the sample library instruction intelligent matching module is used for matching the acquired and screened voice instruction against the sample library to determine whether the voice instruction exists in the current sample library; and the non-limited voice instruction signal manual capture training module is used for performing capture training on acquired voice instructions that are not in the sample library.
2. The signal accuracy adjustment system for voice instruction capture as claimed in claim 1, wherein: the voice instruction sample library real-time updating module comprises an updating instruction sample key vocabulary input submodule and an instruction sample key vocabulary summarizing feedback submodule; the updating instruction sample key vocabulary input submodule is used for inputting the voice instructions output by training into the sample library in real time for updating, expanding the voice instruction templates in the sample library; the instruction sample key vocabulary summarizing feedback submodule is used for summarizing the voice instructions in the voice instruction sample library and feeding the summarized voice instructions back to the system platform, and the user sends voice instructions to the corresponding equipment for control according to the summarized voice instruction set.
3. The signal accuracy adjustment system for voice instruction capture as claimed in claim 1, wherein: the instruction segmentation acquisition unit comprises an instruction first voice acquisition unit and an instruction second video acquisition unit; the instruction first voice acquisition unit is used for recording the voice instruction issued by the user, cutting the recording file into segments and performing voice recognition on each segment; the instruction second video acquisition unit is used for recording video while the user issues the voice instruction, cutting the video into segments and performing lip language recognition on each video segment; the instruction information obtained through voice recognition and lip language recognition is summarized by segment, wherein the recording file and the video file are cut according to the same time segments; the segmented data obtained through voice recognition and lip language recognition are marked respectively, and the marked data are sent to the acquisition unit identification instruction analysis matching module.
4. The signal accuracy adjustment system for voice instruction capture as claimed in claim 1, wherein: the acquisition unit identification instruction analysis matching module comprises a segmentation identification instruction matching rate analysis submodule and a secondary identification adjustment matching submodule; the segmentation identification instruction matching rate analysis submodule identifies the segmented recording and video files acquired by the instruction first voice acquisition unit and the instruction second video acquisition unit, matches the voice recognition data and the lip language recognition data of the same time segments, and analyzes the matching rate of the voice and lip language recognition data of each segment; when the first matching rate does not meet the requirement, the secondary identification adjustment matching submodule re-segments the recording file and the video file acquired by the first voice acquisition unit and the second video acquisition unit according to time, performs voice recognition and lip language recognition again on the re-segmented audio and video files respectively, and performs matching analysis on the newly recognized data.
5. The signal accuracy adjustment system for voice instruction capture as claimed in claim 4, wherein: the segmentation identification instruction matching rate analysis submodule is used for performing segmented voice recognition and lip language recognition on the acquired recording file and video file respectively, and matching the voice recognition data and the lip language recognition data of the same time segment according to the key words and the segment explanatory content in the segment; the keyword matching rate of the voice recognition data and the lip language recognition data of the current segment is set as F1%, the segment explanatory content matching rate as F2%, the keyword weight as Pm% and the segment explanatory content weight as Pn%; the comprehensive matching degree of the voice recognition data and the lip language recognition data in a given time segment is set as F0 and satisfies the formula:
F0=F1%*Pm%+F2%*Pn%
calculating the comprehensive matching degree of the voice recognition data and the lip language recognition data in the current time segment, and calculating the comprehensive matching degrees of the acquired recording file and video file in the different time segments one by one as F0_1, F0_2, F0_3, …, F0_(n-1), F0_n; the total matching degree of the acquired voice instruction is set to satisfy the following formula:
[formula image FDA0002645275690000031: total matching degree criterion over the segment matching degrees F0_1, …, F0_n; not recoverable from the text]
when the total matching degree of the different segment sets of the acquired voice instruction satisfies the formula, the matching degree of the voice instruction is judged to be qualified and the voice instruction is sent to the sample library for matching; when it does not satisfy the formula, the matching degree of the voice instruction is judged to be unqualified and the voice instruction is sent to the secondary identification adjustment matching submodule, which re-segments the recording file and the video file acquired by the first voice acquisition unit and the second video acquisition unit according to time, performs voice recognition and lip language recognition again on the re-segmented files respectively, and performs matching analysis on the newly recognized data; when the formula is satisfied after the secondary recognition, the voice instruction is sent to the sample library; when the formula is still not satisfied after the secondary recognition, the voice instruction is judged not to meet the voice input standard and is fed back to the user for re-entry.
6. The signal accuracy adjustment system for voice instruction capture as claimed in claim 1, wherein: the sample library instruction intelligent matching module comprises a limited voice instruction signal matching and marking submodule and a non-limited voice instruction signal manual feedback submodule; the limited voice instruction signal matching and marking submodule is used for matching the user voice instruction against the instruction signals stored in the sample library; when the voice instruction input by the user exists in the sample library, the instruction is marked and screened out, the voice instruction is marked as a limited voice instruction signal, and the equipment is controlled according to the equipment processing method originally set for the instruction signal; the non-limited voice instruction signal manual feedback submodule is used for matching the user voice instruction against the instruction signals stored in the sample library; when the voice instruction input by the user does not exist in the sample library, the current voice instruction is judged to be a non-limited voice instruction signal, and the instruction is sent to the non-limited voice instruction signal manual capture training module for manual training.
7. The signal accuracy adjustment system for voice instruction capture as claimed in claim 1, wherein: the non-limited voice instruction signal manual capture training module comprises a non-limited voice instruction signal simulation equipment training submodule and a training detection output probability analysis submodule; the simulation equipment training submodule is used for performing simulation equipment training on non-limited voice instruction signals that do not exist in the sample library, training the simulation equipment a plurality of times through voice training and sending the training results to the training detection output probability analysis submodule; the training detection output probability analysis submodule is used for monitoring and analyzing the results of the plurality of simulation equipment voice training runs and judging whether the current voice instruction can be successfully recorded.
8. The signal accuracy adjustment system for voice instruction capture as claimed in claim 7, wherein: the training detection output probability analysis submodule trains the non-limited voice instruction signal a plurality of times through the simulation equipment; the operational coefficients of the non-limited voice instruction signal under the plurality of simulation equipment training runs are set as Y_1, Y_2, Y_3, …, Y_(n-1), Y_n, wherein each operational coefficient lies between 1 and 100; the standard coefficient of the current simulation equipment training is set as Yj, the maximum value of the operational data of the simulation equipment training is set as C0 and the minimum value as C1; the monitored training qualification rate H of the simulation equipment satisfies the formula:
[formula image FDA0002645275690000041: training qualification rate H computed from the coefficients Y_1, …, Y_n, the standard coefficient Yj and the extrema C0 and C1; not recoverable from the text]
calculating the qualification rate of the current simulation equipment training on the non-limited voice instruction signal; when the qualification rate is greater than the set threshold value, the current non-limited voice instruction signal is judged to be implementable on the equipment and is sent to the sample library for storage; when the qualification rate is less than or equal to the set threshold value, the current non-limited voice instruction signal is judged to be not implementable on the equipment, and the voice instruction signal is not processed.
9. A signal accuracy adjustment method for voice instruction capture, characterized by:
S1: utilizing the voice instruction sample library real-time updating module to upload the newly updated voice instruction to the voice instruction sample library in real time for storage, the stored voice instruction being fed back to the system platform for the user to check;
S2: utilizing the instruction segmentation acquisition unit to acquire the voice instruction input by the user in segments and intelligently identify the voice instruction issued by the current user;
S3: utilizing the acquisition unit identification instruction analysis matching module to identify and match the voice instructions acquired by the different acquisition units and analyze the matching rate of the acquired instructions;
S4: utilizing the sample library instruction intelligent matching module to match the acquired and screened voice instruction against the sample library and determine whether the voice instruction exists in the current sample library;
S5: utilizing the non-limited voice instruction signal manual capture training module to perform capture training on acquired voice instructions that are not in the sample library.
10. The signal accuracy adjustment method for voice instruction capture as claimed in claim 9, wherein the adjustment method further comprises the following steps:
S1-1: utilizing the updating instruction sample key vocabulary input submodule to input the voice instruction output by training into the sample library in real time for updating, expanding the voice instruction templates in the sample library; the instruction sample key vocabulary summarizing feedback submodule summarizes the voice instructions in the voice instruction sample library and feeds the summarized voice instructions back to the system platform, and the user sends voice instructions to the corresponding equipment for control according to the summarized voice instruction set;
S2-1: utilizing the instruction first voice acquisition unit to record the voice instruction issued by the user, cut the recording file into segments and perform voice recognition on each segment; the instruction second video acquisition unit records video while the user issues the voice instruction, cuts the video into segments and performs lip language recognition on each video segment; the instruction information obtained by voice recognition and lip language recognition is summarized by segment, the recording file and the video file being cut according to the same time segments; the segmented data obtained by voice recognition and lip language recognition are marked respectively and sent to the acquisition unit identification instruction analysis matching module;
S3-1: utilizing the segmentation identification instruction matching rate analysis submodule to identify the segmented recording and video files acquired by the first voice acquisition unit and the second video acquisition unit, match the voice recognition data and the lip language recognition data of the same time segments, and analyze the matching rate of the voice and lip language recognition data of each segment; when the first matching rate does not meet the requirement, the secondary identification adjustment matching submodule re-segments the recording file and the video file according to time, performs voice recognition and lip language recognition again on the re-segmented files respectively, and performs matching analysis on the newly recognized data;
S4-1: utilizing the limited voice instruction signal matching and marking submodule to match the user voice instruction against the instruction signals stored in the sample library; when the voice instruction input by the user exists in the sample library, the instruction is marked and screened out, the voice instruction is marked as a limited voice instruction signal, and the equipment is controlled according to the equipment processing method originally set for the instruction signal; the non-limited voice instruction signal manual feedback submodule matches the user voice instruction against the instruction signals stored in the sample library; when the voice instruction input by the user does not exist in the sample library, the current voice instruction is judged to be a non-limited voice instruction signal and is sent to the non-limited voice instruction signal manual capture training module for manual training;
s5-1: the method comprises the steps of utilizing a non-limited voice instruction signal simulation device training submodule to carry out simulation device training on non-limited voice instruction signals which do not exist in a sample library, carrying out a plurality of times of training on the simulation device through voice training, sending a training result to a training detection output probability analysis submodule, monitoring and analyzing the result of voice training of a plurality of times of simulation devices through the training detection output probability analysis submodule, and judging whether a current voice instruction can be successfully input or not.
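The per-segment speech/lip-reading comparison in steps S2-1 and S3-1 can also be sketched in code. The following is an assumption-laden outline, not the patented implementation: the recognizers are replaced by pre-tokenized word lists, and the segment length and matching-rate threshold are invented for illustration.

```python
# Illustrative sketch of S2-1/S3-1; the 0.8 threshold, segment lengths,
# and token-list stand-ins for the recognizers are assumptions.

def segment(tokens, seg_len):
    """S2-1: cut a recognized token stream into time segments; the audio
    and video streams are cut on the same boundaries."""
    return [tokens[i:i + seg_len] for i in range(0, len(tokens), seg_len)]

def match_rate(speech_segs, lip_segs):
    """S3-1: fraction of time-aligned segments where speech recognition
    and lip-language recognition agree."""
    pairs = list(zip(speech_segs, lip_segs))
    agree = sum(1 for s, l in pairs if s == l)
    return agree / len(pairs) if pairs else 0.0

def analyze(speech_tokens, lip_tokens, seg_len=2, threshold=0.8):
    """If the first matching rate falls short of the threshold, re-segment
    both streams (secondary recognition adjustment) and re-compare."""
    rate = match_rate(segment(speech_tokens, seg_len),
                      segment(lip_tokens, seg_len))
    if rate < threshold:
        finer = max(1, seg_len - 1)
        rate = match_rate(segment(speech_tokens, finer),
                          segment(lip_tokens, finer))
    return rate

speech = ["turn", "on", "the", "light"]   # stand-in speech recognition output
lips   = ["turn", "on", "the", "night"]   # stand-in lip-reading output
r = analyze(speech, lips)
```

Re-segmenting at a finer granularity localizes the disagreement to a single segment, which is the motivation the claim gives for the secondary recognition adjustment step.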
CN202010852699.XA 2020-08-22 2020-08-22 Signal accuracy adjusting system and method for voice instruction capture Active CN111968628B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010852699.XA CN111968628B (en) 2020-08-22 2020-08-22 Signal accuracy adjusting system and method for voice instruction capture
CN202110561900.3A CN113436618A (en) 2020-08-22 2020-08-22 Signal accuracy adjusting system for voice instruction capture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010852699.XA CN111968628B (en) 2020-08-22 2020-08-22 Signal accuracy adjusting system and method for voice instruction capture

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202110561900.3A Division CN113436618A (en) 2020-08-22 2020-08-22 Signal accuracy adjusting system for voice instruction capture

Publications (2)

Publication Number Publication Date
CN111968628A (en) 2020-11-20
CN111968628B (en) 2021-06-25

Family

ID=73390149

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010852699.XA Active CN111968628B (en) 2020-08-22 2020-08-22 Signal accuracy adjusting system and method for voice instruction capture
CN202110561900.3A Withdrawn CN113436618A (en) 2020-08-22 2020-08-22 Signal accuracy adjusting system for voice instruction capture

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202110561900.3A Withdrawn CN113436618A (en) 2020-08-22 2020-08-22 Signal accuracy adjusting system for voice instruction capture

Country Status (1)

Country Link
CN (2) CN111968628B (en)


Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS62239231A (en) * 1986-04-10 1987-10-20 Kiyarii Rabo:Kk Speech recognition method by inputting lip picture
CN102945074A (en) * 2011-10-12 2013-02-27 Microsoft Corp. Population of lists and tasks from captured voice and audio content
CN104166724A (en) * 2014-08-26 2014-11-26 四川亿信信用评估有限公司 Method for Chinese speech capable of capturing key words to be applied to browser
CN104834900A (en) * 2015-04-15 2015-08-12 常州飞寻视讯信息科技有限公司 Method and system for vivo detection in combination with acoustic image signal
CN108292500A (en) * 2015-12-22 2018-07-17 英特尔公司 Technology for using the sentence tail of syntactic consistency to detect
US20180204568A1 (en) * 2017-01-13 2018-07-19 Alicia J. Ginsberg System for filtering potential immigration threats through speech analysis
CN108304072A (en) * 2018-02-09 2018-07-20 北京北行科技有限公司 A kind of VR virtual worlds role's expression implanted device and method for implantation
CN109271915A (en) * 2018-09-07 2019-01-25 北京市商汤科技开发有限公司 False-proof detection method and device, electronic equipment, storage medium
CN109410924A (en) * 2017-08-14 2019-03-01 三星电子株式会社 Recognition methods and identification equipment
CN109599105A (en) * 2018-11-30 2019-04-09 广州富港万嘉智能科技有限公司 Dish method, system and storage medium are taken based on image and the automatic of speech recognition
US20190244623A1 (en) * 2018-02-02 2019-08-08 Max T. Hall Method of translating and synthesizing a foreign language
CN110221693A (en) * 2019-05-23 2019-09-10 南京双路智能科技有限公司 A kind of intelligent retail terminal operating system based on human-computer interaction
CN110570862A (en) * 2019-10-09 2019-12-13 三星电子(中国)研发中心 voice recognition method and intelligent voice engine device
CN111191544A (en) * 2019-12-20 2020-05-22 恒银金融科技股份有限公司 Active mobile service method and system for somatosensory motion recognition equipment
CN111326152A (en) * 2018-12-17 2020-06-23 南京人工智能高等研究院有限公司 Voice control method and device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PAN ZHOU ET AL.: "MODALITY ATTENTION FOR END-TO-END AUDIO-VISUAL SPEECH RECOGNITION", ICASSP 2019 *
YUAN Changhai et al.: "A Chinese Voice Web Browser Based on Keyword Capture", Computer Engineering and Applications *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113742687A (en) * 2021-08-31 2021-12-03 深圳时空数字科技有限公司 Internet of things control method and system based on artificial intelligence
CN116347134A (en) * 2023-03-29 2023-06-27 深圳市联合信息技术有限公司 Set top box audio processing system and method based on artificial intelligence teaching classroom
CN116347134B (en) * 2023-03-29 2024-01-30 深圳市联合信息技术有限公司 Set top box audio processing system and method based on artificial intelligence teaching classroom

Also Published As

Publication number Publication date
CN113436618A (en) 2021-09-24
CN111968628B (en) 2021-06-25

Similar Documents

Publication Publication Date Title
CN103700370B (en) A kind of radio and television speech recognition system method and system
US8793127B2 (en) Method and apparatus for automatically determining speaker characteristics for speech-directed advertising or other enhancement of speech-controlled devices or services
CN105957531B (en) Speech content extraction method and device based on cloud platform
CN110211594B (en) Speaker identification method based on twin network model and KNN algorithm
CN112735383A (en) Voice signal processing method, device, equipment and storage medium
Akbacak et al. Environmental sniffing: noise knowledge estimation for robust speech systems
CN111968628B (en) Signal accuracy adjusting system and method for voice instruction capture
CN112397054B (en) Power dispatching voice recognition method
Ghai et al. Emotion recognition on speech signals using machine learning
Kaushik et al. Automatic audio sentiment extraction using keyword spotting.
US11776532B2 (en) Audio processing apparatus and method for audio scene classification
Baranwal et al. A speaker invariant speech recognition technique using HFCC features in isolated Hindi words
Nyodu et al. Automatic identification of Arunachal language using K-nearest neighbor algorithm
Kamble et al. Emotion recognition for instantaneous Marathi spoken words
Raghib et al. Emotion analysis and speech signal processing
Singh et al. Speaker Recognition Assessment in a Continuous System for Speaker Identification
CN108520740B (en) Audio content consistency analysis method and analysis system based on multiple characteristics
CN110807370B (en) Conference speaker identity noninductive confirmation method based on multiple modes
EP0177854A1 (en) Keyword recognition system using template-concatenation model
Zhou et al. Environmental sound classification of western black-crowned gibbon habitat based on spectral subtraction and VGG16
Iswarya et al. Speech query recognition for Tamil language using wavelet and wavelet packets
Olteanu et al. Fusion of speech techniques for automatic environmental sound recognition
Mansoor et al. Keyword identification framework for speech communication on construction sites
Sardar Compensation of variability using median and i-vector+ PLDA for speaker identification of whispering sound
Gubka et al. Universal approach for sequential audio pattern search

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210603

Address after: 210000 4th floor, building C, Wanbo Science Park, 20 Fengxin Road, Yuhuatai District, Nanjing City, Jiangsu Province

Applicant after: NANJING GUIJI INTELLIGENT TECHNOLOGY Co.,Ltd.

Address before: No. 556, Changjiang Road, high tech Zone, Suzhou City, Jiangsu Province

Applicant before: Peng Lingling

GR01 Patent grant
GR01 Patent grant