CN109616111A - A scene-interaction control method based on speech recognition - Google Patents

A scene-interaction control method based on speech recognition

Info

Publication number
CN109616111A
CN109616111A
Authority
CN
China
Prior art keywords
snapshot
speech
speech recognition
option
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811581756.4A
Other languages
Chinese (zh)
Other versions
CN109616111B (en)
Inventor
钱苏晋
门涛
刘鹏
董杰
周金涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Jibei Electric Power Co Ltd Intelligent Distribution Network Center
BEIJING TECHSTAR TECHNOLOGY Co Ltd
Original Assignee
State Grid Jibei Electric Power Co Ltd Intelligent Distribution Network Center
BEIJING TECHSTAR TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Jibei Electric Power Co Ltd Intelligent Distribution Network Center, BEIJING TECHSTAR TECHNOLOGY Co Ltd filed Critical State Grid Jibei Electric Power Co Ltd Intelligent Distribution Network Center
Priority to CN201811581756.4A priority Critical patent/CN109616111B/en
Publication of CN109616111A publication Critical patent/CN109616111A/en
Application granted granted Critical
Publication of CN109616111B publication Critical patent/CN109616111B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G10L2015/225 Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present invention provides a scene-interaction control method based on speech recognition, comprising: a central control system pre-establishes a snapshot library; after the speech-recognition control program of the central control system enters the activated state, it starts a voice recording module that records voice commands from the user; a speech recognition module then performs intent recognition on each voice command, classifying it as one of four types (snapshot, option, confirm/cancel, or other) and handling each type accordingly. Advantages: the method combines the central control system with speech recognition technology, realizing control of the central control system by spoken language instead of conventional input devices, and offers a good user experience.

Description

A scene-interaction control method based on speech recognition
Technical field
The invention belongs to the field of scene-interaction control technology, and in particular relates to a scene-interaction control method based on speech recognition.
Background art
In recent years, as China's economy has grown rapidly, the application demands of government and enterprise venues have gradually shifted from single-purpose to diversified. Venue applications involve meetings, dispatch control, emergency command, daily operation, centralized monitoring, and other functions, and venue equipment is varied, including lamps, loudspeakers, video walls, televisions, cameras, projectors, lifting displays, disc players, matrix switchers, video-wall processors, and similar devices.
At present, venues are controlled mainly by hand: under each venue mode, every class of device is adjusted manually. For example, in one venue mode the operator separately adjusts the lamp brightness, turns on the loudspeakers, turns on the cameras, and raises the display to a certain height to meet meeting requirements; in another venue mode, the operator again separately adjusts the lamp brightness, turns on the disc player, and raises the display to a different height.
This manual approach, in which each controlled device is adjusted individually, suffers from low control efficiency and a heavy operator workload.
Summary of the invention
In view of the defects of the prior art, the present invention provides a scene-interaction control method based on speech recognition that can effectively solve the above problems.
The technical solution adopted by the invention is as follows:
The present invention provides a scene-interaction control method based on speech recognition, comprising the following steps:
Step 1: the central control system pre-establishes a snapshot library. The snapshot library stores the correspondence between a number of snapshot names and snapshot-scene execution commands; by executing a snapshot-scene command, the central control system controls the venue equipment.
To avoid accidental operation, the speech-recognition control program normally remains in an unawakened dormant state; in this state the wake-word listener stays open continuously while the main speech listener stays closed.
Step 2: the wake-word listener monitors in real time and judges whether a wake word is heard. If no wake word is heard, it keeps listening; if a wake word is heard, step 3 is executed.
Step 3: the central control system closes the wake-word listener and opens the main speech listener, thereby waking the speech-recognition control program, which enters the activated state.
Step 4: the speech-recognition control program starts the voice recording module, which records the voice command from the user and stores the recorded command; meanwhile, while the voice recording module records the command, the display module shows the volume waveform of the speech.
Step 5: the voice recording module transfers the recorded voice command to the speech recognition module.
Step 6: the speech recognition module performs a preliminary validity check on the voice command. If recognition succeeds, step 7 is executed; if not, a recognition-failure prompt is fed back to the user.
Step 7: the speech recognition module performs intent recognition on the voice command and classifies it as one of four types: snapshot, option, confirm/cancel, or other.
If the voice command is of snapshot type, step 8 is executed by the snapshot-type submodule; if of option type, step 9 is executed by the option-type submodule; if of confirm/cancel type, step 10 is executed by the confirm/cancel-type submodule; if of other type, step 11 is executed by the other-type submodule.
Step 8: the snapshot-type submodule executes a voice command of snapshot type, comprising the following steps:
Step 8.1: if the voice command is of snapshot type, obtain its resolution score and judge whether the score exceeds a threshold. If it does not, the command is not clear enough and step 8.2 is executed; if it does, the command is clear and step 8.3 is executed.
Step 8.2: perform a further uniqueness judgment on the recognition result, i.e. judge whether it is a homophone case. If it is not, the voice command corresponds to a uniquely pronounced snapshot name in the snapshot library; the display module outputs a prompt asking the user to confirm whether to execute the snapshot, the confirm/cancel object information for this round is recorded in the confirm/cancel context table, and subsequent steps are executed by the confirm/cancel-type submodule. If it is, the voice command corresponds to several homophonous snapshot names in the snapshot library; these names form a homophone snapshot result set, which is shown by the display module, the option information for this round is recorded in the option context table, and subsequent steps are executed by the option-type submodule.
Step 8.3: perform the same uniqueness judgment on the recognition result. If it is not a homophone case, the voice command corresponds to a uniquely pronounced snapshot name, and the snapshot-scene command corresponding to that name is executed directly. If it is, the homophonous snapshot names form a homophone snapshot result set, which is shown by the display module, the option information for this round is recorded in the option context table, and subsequent steps are executed by the option-type submodule.
After the option context table saves option information, the table is cleared as soon as any next voice command from the user has been executed.
Likewise, after the confirm/cancel context table stores confirm/cancel object information, the table is cleared as soon as any next voice command from the user has been executed.
Step 9: the option-type submodule executes a voice command of option type, comprising the following steps:
If the voice command is of option type, first look up the option context table and judge whether it is empty. If it is not empty, context information corresponding to the voice command exists, and the corresponding snapshot-scene command is executed directly according to the option information stored in the table and the voice command. If the table is empty, no context information exists, and the display module outputs a speech-recognition-failure prompt.
Step 10: the confirm/cancel-type submodule executes a voice command of confirm/cancel type, comprising the following steps:
If the voice command is of confirm/cancel type, first look up the confirm/cancel context table and judge whether it is empty. If it is not empty, context information corresponding to the voice command exists, and the corresponding snapshot-scene command is executed directly according to the confirm/cancel object information stored in the table and the voice command. If the table is empty, no context information exists, and the display module outputs a speech-recognition-failure prompt.
Step 11: the other-type submodule executes a voice command of other type, comprising the following step: the display module outputs a speech-recognition-failure prompt.
Preferably, a wake button is configured; when the wake button is clicked, the speech-recognition control program of the central control system is manually woken from the dormant state into the activated state.
Preferably, outputting the speech-recognition-failure prompt through the display module specifically comprises: playing a spoken apology while outputting a suggested replacement voice command.
Preferably, in step 1, the snapshot library established by the central control system is dynamically updated in real time.
Preferably, the ways in which the central control system controls the venue equipment include: touch-screen taps, remote-pen key presses, and speech-recognition control.
The scene-interaction control method based on speech recognition provided by the invention has the following advantage:
It combines the central control system with speech recognition technology, realizing control of the central control system by spoken language instead of conventional input devices, and offers a good user experience.
Detailed description of the invention
Fig. 1 is a flow diagram of the scene-interaction control method based on speech recognition provided by the invention.
Specific embodiment
To make the technical problems, technical solutions, and beneficial effects addressed by the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention, not to limit it.
Key terms:
Controlled device: an infrastructure device supporting the functions of a live business hall, such as a large-screen system, central air conditioning, workstations, the sound-reinforcement system, or the lighting management module.
Scene mode (hereinafter "snapshot"): a business-layer name defined for a combination of control settings applied to the controlled devices in a scene, such as "emergency mode" or "daily monitoring mode".
Central control system: a system that centrally controls sound, light, power, and other equipment. It is applied in multimedia classrooms, multifunction conference rooms, command-and-control centers, smart homes, and so on. Through devices such as a button control panel, computer display, touch screen, or wireless remote, the user controls projectors, presentation stands, disc players, video recorders, and similar equipment via a computer and the central-control software.
Over the past twenty years, speech recognition technology has made marked progress and begun to move from the laboratory to the market. Within the next ten years, speech recognition technology is expected to enter industry, home appliances, communications, automotive electronics, healthcare, home services, consumer electronics, and many other fields.
In practical applications, the central control system continually derives new application scenes from business demands, such as demonstration briefings, daily monitoring, and production dispatch, while the terminals through which users interact with the system are gradually evolving from desktop computers toward smaller, more mobile devices.
Therefore, the present invention combines speech recognition technology with the central control system and relates to a scene-interaction control method that uses voice as the instruction medium. The method is applied to halls for business-monitoring and dispatch-control scenes, such as grid dispatch centers, public-security monitoring halls, and military combat command halls.
Using speech recognition technology, the invention recognizes and analyzes the speaker's utterance; when an effective snapshot name is recognized, it is handed to the central control system to execute the concrete operation, achieving hands-free control of the equipment.
The realization of the method consists of a hardware part and a software part.
1. Hardware deployment
An audio capture device must be connected to the central-control terminal. A mobile terminal with Bluetooth uses a Bluetooth headset, which handles both speech capture and voice feedback. A control terminal without Bluetooth uses a wired microphone, and voice feedback then requires an additional loudspeaker (a microphone/speaker all-in-one unit may be used, or the venue's sound-reinforcement system may be brought in on demand).
2. Software implementation
The system is divided into the speech recognition module and the central control system; the speech recognition module is the key technical point this application seeks to protect. After receiving an instruction from the speech recognition module, the central control system issues control instructions to the target devices according to the device control protocols and physical-link configuration recorded in the snapshot, quickly reproducing the required scene mode and improving control efficiency. The emphasis of the invention is how to recognize the user's speech and match it to a snapshot name in the snapshot library; once a snapshot name has been matched, the corresponding control instructions merely need to be issued to the target devices.
In the invention, a wake-up step is added to avoid accidental operation, so the business scenario is divided into two parts, wake-up and speech recognition, and the corresponding processing logic is designed around the user's intent; the specific business-judgment flow is shown in Fig. 1. From the system's point of view, the user's only operation path is to speak, and the system accordingly provides a single service entrance for listening to what the user says.
With reference to Fig. 1, the scene-interaction control method based on speech recognition comprises the following steps:
Step 1: the central control system pre-establishes a snapshot library. The snapshot library stores the correspondence between a number of snapshot names and snapshot-scene execution commands; by executing a snapshot-scene command, the central control system controls the venue equipment. The ways in which the central control system controls the venue equipment include, but are not limited to: touch-screen taps, remote-pen key presses, and speech-recognition control.
For example, the snapshot names may include a conference mode, a visual dispatch scene, a petrochemical dispatch scene, a lights-fully-on mode, and so on; each mode corresponds to one group of instructions executed on the controlled devices. In practice, a snapshot name may contain wrong characters or polyphonic characters: one name may contain a typo, and two names whose pinyin is identical except for the tones of some characters (such as the conference mode and a near-homophone of it) are treated as homophonous snapshot names.
The scene-interaction control method based on speech recognition described below can recognize and execute snapshot names that contain wrong characters or homophones.
In addition, the snapshot library established by the central control system can be dynamically updated in real time. That is, the central control system allows the user to create different snapshots according to business needs (internally, each snapshot is actually a set of preset control messages for one or more controlled devices) and to give each snapshot a custom name. The set of these snapshot names is the effective range of the voice control function.
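As a minimal sketch of the snapshot library just described (all names, message formats, and method names here are hypothetical, not from the patent), a mapping from user-defined snapshot names to preset control messages, with the name set serving as the effective range of voice control, could look like:

```python
# Illustrative sketch of the snapshot library; names and message
# formats are assumptions made for this example only.

class SnapshotLibrary:
    def __init__(self):
        self._snapshots = {}  # snapshot name -> list of preset control messages

    def create(self, name, control_messages):
        """User-defined snapshot: a custom name bound to preset device commands."""
        self._snapshots[name] = list(control_messages)

    def remove(self, name):
        # Dynamic real-time update: snapshots can also be deleted.
        self._snapshots.pop(name, None)

    def valid_names(self):
        """The set of snapshot names is the effective range of voice control."""
        return set(self._snapshots)

    def commands_for(self, name):
        return self._snapshots.get(name)

library = SnapshotLibrary()
library.create("conference mode", [("lamps", "dim", 40), ("camera", "on", None)])
library.create("daily monitoring mode", [("video_wall", "layout", "grid")])
```

Executing a snapshot-scene command would then amount to iterating over `commands_for(name)` and issuing each control message to its target device.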
To avoid accidental operation, the speech-recognition control program normally remains in an unawakened dormant state; in this state the wake-word listener stays open continuously while the main speech listener stays closed.
In practice, a wake button can also be configured; when the wake button is clicked, the speech-recognition control program of the central control system is manually woken from the dormant state into the activated state.
Design principle of the wake-up process:
The wake-up mechanism is added to avoid accidental operation (for example, several users chatting might utter a sentence the system can recognize as a command, causing a snapshot to execute by mistake). It resembles the screen lock/unlock mechanism of a mobile phone: when not awakened, the system is in a "locked" state; when activated, it is "unlocked".
At any moment, exactly one of the wake-word listener and the main speech listener is open while the other is closed.
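The mutually exclusive listener states can be sketched as a tiny state machine (a hedged illustration; the class and method names are assumptions, not the patent's implementation):

```python
# Hypothetical sketch of the two listeners: exactly one of the
# wake-word listener and the main speech listener is open at any moment.

class ListenerManager:
    def __init__(self):
        self.wake_listener_open = True   # dormant state: wake-word listener on
        self.main_listener_open = False  # main speech listener off

    def wake(self):
        """On hearing the wake word: close the wake listener, open the main one."""
        self.wake_listener_open = False
        self.main_listener_open = True

    def sleep(self):
        """Return to the unawakened dormant state."""
        self.wake_listener_open = True
        self.main_listener_open = False

    def invariant_holds(self):
        # The mutual-exclusion invariant stated in the text.
        return self.wake_listener_open != self.main_listener_open

mgr = ListenerManager()
mgr.wake()
```

The `invariant_holds` check mirrors the sentence above: the two open/closed flags must always differ.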
Step 2: the wake-word listener monitors in real time and judges whether a wake word is heard. If no wake word is heard, it keeps listening; if a wake word is heard, step 3 is executed.
Step 3: the central control system closes the wake-word listener and opens the main speech listener, thereby waking the speech-recognition control program, which enters the activated state.
For example, while the central control system is in the unawakened dormant state, the user says some word, which the wake-word listener hears; the listener then judges whether the word is the wake word. The wake word is a phrase pre-customized by the system according to demand, for example a doubled pet name for the assistant. If it is the wake word, the main speech listener is opened; at the same time, the display module shows speech-waveform feedback to indicate that sound is being captured, and a voice prompt "the voice assistant is on" is played. If no wake word is recognized, the system gives no feedback.
Step 4: the speech-recognition control program starts the voice recording module, which records the voice command from the user and stores the recorded command; meanwhile, while the voice recording module records the command, the display module shows the volume waveform of the speech.
Specifically, with the central control system in the activated state, the user speaks a voice command in Mandarin, and the central control system performs both "voice recording" and "waveform feedback". "Voice recording" waits for the user to finish expressing the current speech, stores the speech information in memory, and hands it to the speech recognition module for analysis. "Waveform feedback" gives the user feedback on recording quality while speaking: if the waveform is flat, the speech quality is low, and the user is prompted to raise the volume or move closer to the audio capture device.
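The "waveform is flat, so the speech quality is low" check above could be sketched, for instance, as an RMS-level test on the recorded samples (the function names and the 0.05 threshold are assumptions for illustration; the patent does not specify a metric):

```python
# Hypothetical sketch of the waveform-feedback quality check:
# a flat (low-RMS) waveform triggers a prompt to the user.
import math

def rms(samples):
    """Root-mean-square level of a list of normalized samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def waveform_feedback(samples, threshold=0.05):
    """Return a prompt when speech quality looks too low; None when it is fine."""
    if rms(samples) < threshold:
        return "Please raise your volume or move closer to the microphone."
    return None

quiet = [0.01, -0.012, 0.008, -0.009]  # nearly flat waveform
loud = [0.3, -0.28, 0.31, -0.27]       # clearly audible speech
```

A real implementation would compute this over short frames of the microphone stream and drive both the on-screen waveform and the prompt.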
Step 5: the voice recording module transfers the recorded voice command to the speech recognition module.
Step 6: the speech recognition module performs a preliminary validity check on the voice command. If recognition succeeds, step 7 is executed; if not, a recognition-failure prompt is fed back to the user.
Specifically, the voice recording module transfers the recorded voice command to the speech recognition module, which judges whether any information was recognized. If so, subsequent processing is carried out; if not, a spoken apology is played while the display module outputs a suggested phrasing, such as "You can ask me ...".
Step 7: the speech recognition module performs intent recognition on the voice command and classifies it as one of four types: snapshot, option, confirm/cancel, or other.
If the voice command is of snapshot type, step 8 is executed by the snapshot-type submodule; if of option type, step 9 is executed by the option-type submodule; if of confirm/cancel type, step 10 is executed by the confirm/cancel-type submodule; if of other type, step 11 is executed by the other-type submodule.
In the invention, effective speech-recognition results are thus classified and handled separately. The classes are: snapshot, option (a choice from a homophone set), confirm/cancel (confirmation of a command whose score fell below the threshold), and other (beyond the system's processing capacity).
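The four-way routing of steps 8 through 11 can be sketched as a dispatch table (a hedged illustration: the toy classifier stands in for the real speech/intent model, and all names are assumptions):

```python
# Hypothetical sketch of the four-type dispatch; only the routing
# mirrors steps 8-11, the classifier itself is a stub.

def classify(command, snapshot_names):
    """Toy classifier; real intent recognition would be a speech/NLU model."""
    if command in snapshot_names:
        return "snapshot"
    if command.startswith("option"):
        return "option"
    if command in ("confirm", "cancel"):
        return "confirm_cancel"
    return "other"

def dispatch(command, snapshot_names):
    handlers = {
        "snapshot": lambda: "step 8: snapshot submodule",
        "option": lambda: "step 9: option submodule",
        "confirm_cancel": lambda: "step 10: confirm/cancel submodule",
        "other": lambda: "step 11: other submodule",
    }
    return handlers[classify(command, snapshot_names)]()

names = {"conference mode", "daily monitoring mode"}
```

Each handler label stands for the corresponding submodule; in a real system the lambdas would invoke the submodules described in the following steps.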
Step 8: the snapshot-type submodule executes a voice command of snapshot type, comprising the following steps:
Step 8.1: if the voice command is of snapshot type, obtain its resolution score and judge whether the score exceeds a threshold. If it does not, the command is not clear enough and step 8.2 is executed; if it does, the command is clear and step 8.3 is executed.
Step 8.2: perform a further uniqueness judgment on the recognition result, i.e. judge whether it is a homophone case. If it is not, the voice command corresponds to a uniquely pronounced snapshot name in the snapshot library; the display module outputs a prompt asking the user to confirm whether to execute the snapshot, the confirm/cancel object information for this round is recorded in the confirm/cancel context table, and subsequent steps are executed by the confirm/cancel-type submodule. If it is, the voice command corresponds to several homophonous snapshot names in the snapshot library; these names form a homophone snapshot result set, which is shown by the display module, the option information for this round is recorded in the option context table, and subsequent steps are executed by the option-type submodule.
Step 8.3: perform the same uniqueness judgment on the recognition result. If it is not a homophone case, the voice command corresponds to a uniquely pronounced snapshot name, and the snapshot-scene command corresponding to that name is executed directly. If it is, the homophonous snapshot names form a homophone snapshot result set, which is shown by the display module, the option information for this round is recorded in the option context table, and subsequent steps are executed by the option-type submodule.
After the option context table saves option information, the table is cleared as soon as any next voice command from the user has been executed.
Likewise, after the confirm/cancel context table stores confirm/cancel object information, the table is cleared as soon as any next voice command from the user has been executed.
Specifically, the snapshot-type submodule first judges whether the recognition score exceeds the threshold (the threshold measures how unambiguous the speech-recognition result is). If it does, the system considers the instruction clear and proceeds to the uniqueness judgment: if the recognition result is unique, the snapshot is executed directly without asking the user for confirmation; if it is not unique, it is a homophonous snapshot name, so the corresponding list of snapshot names is pushed to the user, and after the user selects a snapshot name from the list, the snapshot is executed directly without further confirmation. If the score falls below the threshold, the system considers the instruction not clear enough and again performs the uniqueness judgment first: if the result is unique, a prompt is still output on the display module asking the user to confirm whether to execute the snapshot, and the system plays a voice prompt asking whether to execute the named snapshot; if it is a homophone case, it is handed to the homophone handling (which is itself a form of further confirmation).
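The step-8 decision tree (score vs. threshold first, then the uniqueness judgment) can be sketched as follows; the 0.8 threshold and the action labels are assumptions for illustration only:

```python
# Hypothetical sketch of the step-8 decision tree. `matches` holds the
# snapshot names in the library matching the recognized pronunciation;
# the returned label says which flow (execute / option / confirm) follows.

def handle_snapshot(score, matches, threshold=0.8):
    clear = score >= threshold        # step 8.1: resolution score vs. threshold
    unique = len(matches) == 1        # steps 8.2/8.3: uniqueness judgment
    if clear and unique:
        return ("execute", matches[0])           # clear and unique: run directly
    if not unique:
        return ("choose_from", sorted(matches))  # homophone set -> option flow
    return ("confirm", matches[0])               # unclear but unique -> confirm flow
```

Note the asymmetry the text describes: a homophone set always goes to the option flow regardless of score, while a unique but low-score result asks the user for confirmation.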
Step 9: The option type submodule executes a voice command of the option type, comprising the following steps:
If said voice command is of the option type, the option context configuration table is looked up first, and it is judged whether the option context configuration table is empty. If it is not empty, context information corresponding to said voice command exists, and the corresponding snapshot scene command is executed directly according to the option information stored in the option context configuration table and said voice command. If the option context configuration table is empty, no context information exists, and a speech recognition failure prompt is output through the display module.
Specifically, the option type submodule needs contextual memory. When a homophone snapshot name is encountered, the system performs "homophone processing" and offers the user a choice from the homophone snapshot name set. After the system identifies a voice command as the option type, it first judges, based on the option context configuration table, whether context information exists. If it does, the command is matched against the options from the previous round of dialogue, the selected option's snapshot is executed, the system announces by voice that the snapshot was executed successfully, and the display device outputs the same success message. If no context information exists, the system design takes an anthropomorphic view: a topic with no preceding context is treated as an invalid operation, so the system plays a spoken apology while the display device outputs a suggested prompt sentence, such as "You can ask me ...".
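The contextual memory described above can be sketched as a small context object plus a handler; the class and function names are assumptions for illustration, not the patent's implementation:

```python
# Sketch of option-type handling with contextual memory.
# Names (OptionContext, handle_option_command) are illustrative.

class OptionContext:
    """Holds the homophone option list pushed to the user in the
    previous round; cleared after any subsequent command executes."""

    def __init__(self):
        self.options = None  # None means "no context above"

    def save(self, options):
        self.options = list(options)

    def clear(self):
        self.options = None

def handle_option_command(choice_index, ctx):
    """Execute the snapshot the user picked, or report failure when
    there is no stored context for the option to refer to."""
    if ctx.options is None:
        return ("fail", "no context above: prompt speech recognition failure")
    if not 0 <= choice_index < len(ctx.options):
        ctx.clear()
        return ("fail", "option out of range")
    snapshot = ctx.options[choice_index]
    ctx.clear()  # any next command empties the context table
    return ("execute", snapshot)
```

Clearing inside the handler mirrors the rule that the option context configuration table is emptied once the next user command has been executed.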
Step 10: The confirmation/cancellation type submodule executes a voice command of the confirmation/cancellation type, comprising the following steps:
If said voice command is of the confirmation/cancellation type, the confirmation/cancellation context configuration table is looked up first, and it is judged whether the confirmation/cancellation context configuration table is empty. If it is not empty, context information corresponding to said voice command exists, and the corresponding snapshot scene command is executed directly according to the confirmation/cancellation object information stored in the confirmation/cancellation context configuration table and said voice command. If the confirmation/cancellation context configuration table is empty, no context information exists, and a speech recognition failure prompt is output through the display module.
Specifically, the confirmation/cancellation type submodule likewise handles context; it performs secondary confirmation of results whose recognition score falls below the set threshold. After the system identifies the confirmation/cancellation type, it first judges, based on the confirmation/cancellation context configuration table, whether context information exists. If it does, the snapshot is executed, the system announces by voice that the snapshot was executed successfully, and the display device outputs the same success message. If no context information exists, the system design takes an anthropomorphic view: a topic with no preceding context is treated as an invalid operation, so the system plays a spoken apology while the display device outputs a suggested prompt sentence, such as "You can ask me ...".
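The confirm/cancel branch above can be sketched in a few lines; the function name and the string command values are assumptions for illustration only:

```python
# Sketch of confirmation/cancellation handling for a low-confidence
# recognition result awaiting secondary confirmation.

def handle_confirm_cancel(command, pending_snapshot):
    """pending_snapshot is the snapshot recorded in the confirm/cancel
    context table in the previous round, or None if no context exists."""
    if pending_snapshot is None:
        # No topic above: invalid operation, apologize and suggest prompts.
        return ("fail", "apology prompt")
    if command == "confirm":
        return ("execute", pending_snapshot)
    return ("cancel", pending_snapshot)
```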
Step 11: The other-types submodule executes a voice command of any other type, comprising the following step: a speech recognition failure prompt is output through the display module.
Specifically, when a voice command is of another type, the system design regards it as beyond the system's processing capability and handles it uniformly: the system plays a spoken apology while the display device outputs a suggested prompt sentence, such as "You can ask me ...".
In each step of the present invention, outputting a speech recognition failure prompt through the display module specifically means: playing a spoken apology while outputting suggested replacement voice commands.
In the present invention, during intention recognition, once the current user intention is judged to be the snapshot type, a "recognition result uniqueness" judgment is performed. If the result is "yes", the result is a snapshot name with a unique pronunciation in the snapshot library: the snapshot is executed directly, the system announces by voice that the snapshot was executed successfully, and the display device outputs the same success message. If the result is "no", the result is not a snapshot name with a unique pronunciation in the snapshot library: the system lists the set of snapshots sharing that pronunciation as options on the display device for the user to choose from, with a voice prompt such as "Please say or tap an item number shown on the screen." For example, when the voice command recognized from the user is "Huishi mode" and the snapshot set found by the system is "Huishi mode, conference mode 1, conference mode 2", the system displays "Huishi mode, conference mode 1, conference mode 2" as a list; if the user chooses conference mode 1 and taps "conference mode 1", the system executes the snapshot scene execution command corresponding to "conference mode 1".
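The homophone lookup in the example above amounts to grouping snapshot names by pronunciation. The sketch below uses a toy pronunciation function (ignoring digits) so that numbered variants collide; a real system would key Chinese snapshot names by their pinyin. All names here are illustrative assumptions:

```python
from collections import defaultdict

def build_homophone_index(snapshot_names, pronounce):
    """Map each pronunciation to every snapshot name that shares it,
    so a recognized pronunciation yields the full homophone set."""
    index = defaultdict(list)
    for name in snapshot_names:
        index[pronounce(name)].append(name)
    return dict(index)

# Toy stand-in for pinyin: keep only letters, drop digits and spaces.
toy_pronounce = lambda s: "".join(c for c in s.lower() if c.isalpha())

index = build_homophone_index(
    ["conference mode 1", "conference mode 2", "monitoring mode"],
    toy_pronounce)
```

With this toy key, "conference mode 1" and "conference mode 2" land in one homophone entry, which is exactly the set the display module would list for the user to choose from.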
The scene interaction control method based on speech recognition provided by the present invention has the following features:
1. The system can close the audio monitoring program promptly after completing certain tasks, preventing the sounds of a complex site environment from causing erroneous operations; the user can efficiently wake the intelligent speech recognition system by voice or by a manual click.
With the speech control system in the activated mode, the user can control the field devices simply by saying a voice command containing a valid snapshot name, such as "start monitoring mode", "execute conference mode", or "open mode one". This has the advantage of simple operation and improves the user experience.
2. Voice feedback mechanism: the method emphasizes the user interaction experience, providing effective feedback for every situation the user may utter and guiding the user to operate the speech control system correctly.
Specifically:
1) High-quality speech: when the system receives a clear, standard voice command, the system regards the command as safe and certain, and directly executes the task the user intends.
2) Ordinary speech: if the recognized voice quality scores below the high-quality threshold, then, to prevent safety and other problems with the field devices, the system first presents the recognized task to the user for confirmation; the user can confirm or cancel the task by clicking manually or by saying the prompted command word.
3) Homophone commands: when the system identifies a homophone case, it feeds the options back to the user, who clicks manually or says an option.
4) Beyond the range of valid scene keywords: for utterances outside the system's knowledge (not understood), a prompt page guides the user with example sentences to refer to.
3. Context capability: to handle the case of multiple recognition results (homophones), the method can temporarily store the recognition results in computer memory and continue processing after the user answers the confirmation message, simulating the human ability to converse with context.
4. The system supports dynamically loading newly created snapshot name keywords. When the user creates a snapshot and names it, the speech recognition system supports recognizing that snapshot name immediately after the snapshot is saved successfully.
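This dynamic loading can be sketched as re-registering the keyword set with the recognizer whenever a snapshot is saved. The registry class and the recognizer callback are assumptions for illustration; the patent does not specify the ASR engine's interface:

```python
class SnapshotKeywordRegistry:
    """Keeps the recognizer's keyword set in sync with the snapshot
    library: a snapshot name becomes recognizable as soon as the
    snapshot is saved, with no system restart."""

    def __init__(self, reload_grammar):
        self._keywords = set()
        self._reload = reload_grammar  # callback into the ASR engine

    def save_snapshot(self, name):
        if name not in self._keywords:
            self._keywords.add(name)
            self._reload(sorted(self._keywords))  # hot-reload keywords
        return name

# Record each grammar version pushed to the (stand-in) recognizer.
grammar_versions = []
registry = SnapshotKeywordRegistry(grammar_versions.append)
registry.save_snapshot("night mode")
registry.save_snapshot("day mode")
```

Saving an already-known name triggers no reload, so only genuinely new snapshot names cost a grammar update.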
Therefore, the scene interaction control method based on speech recognition provided by the present invention combines a central control system with speech recognition technology, realizes controlling the central control system by speech instead of conventional input devices, and offers an improved user experience.
The method of the present invention is an auxiliary interaction means that uses speech, humanity's most primitive form of expression, as the message medium, one of the best modes of human-computer interaction. Combining speech recognition with the central control system will further enhance the construction value and technological appeal of commercial control sites.
The above is only a preferred embodiment of the present invention. It should be noted that, for those of ordinary skill in the art, various improvements and modifications may be made without departing from the principle of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (5)

1. A scene interaction control method based on speech recognition, characterized by comprising the following steps:
Step 1: The central control system pre-establishes a snapshot library; the snapshot library stores correspondences between several snapshot names and snapshot scene execution commands; through the snapshot scene execution commands, the central control system controls the conference site devices.
To avoid erroneous operation, the speech recognition control program normally stays in an un-awakened dormant state; at this time, the wake-word monitoring program remains continuously open, while the main speech command monitoring program remains continuously closed.
Step 2: The wake-word monitoring program monitors in real time and judges whether a wake word is heard; if no wake word is heard, monitoring continues; if a wake word is heard, step 3 is executed.
Step 3: The central control system closes the wake-word monitoring program, opens the main speech command monitoring program, and thereby wakes the speech recognition control program of the central control system; at this point the speech recognition control program of the central control system transitions to the activated state.
Step 4: The speech recognition control program of the central control system starts the voice recording module, records the voice command from the user through the voice recording module, and stores the recorded voice command; meanwhile, while the voice recording module is recording the voice command, a speech volume waveform is shown through the display module.
Step 5: The voice recording module transfers the recorded voice command to the speech recognition module.
Step 6: The speech recognition module performs preliminary speech validity recognition on said voice command; if recognition succeeds, step 7 is executed; if recognition fails, a recognition failure prompt is fed back to the user.
Step 7: The speech recognition module performs intention recognition on said voice command, identifying one of the following four types: snapshot type, option type, confirmation/cancellation type, and other types.
If said voice command is of the snapshot type, step 8 is executed by the snapshot type submodule; if said voice command is of the option type, step 9 is executed by the option type submodule; if said voice command is of the confirmation/cancellation type, step 10 is executed by the confirmation/cancellation type submodule; if said voice command is of another type, step 11 is executed by the other-types submodule.
Step 8: The snapshot type submodule executes a voice command of the snapshot type, comprising the following steps:
Step 8.1: If said voice command is of the snapshot type, obtain the recognition score of said voice command and judge whether the recognition score exceeds a threshold; if it does not, said voice command is not clear enough, and step 8.2 is executed; if it does, said voice command is clear, and step 8.3 is executed.
Step 8.2: A further uniqueness judgment is performed on the recognition result of said voice command, namely: judge whether the recognition result is a homophone case. If not, the voice command corresponds to a snapshot name with a unique pronunciation in the snapshot library; a prompt asking the user to further confirm whether to execute the snapshot is output through the display module; meanwhile, the object of this confirmation/cancellation is recorded in the confirmation/cancellation context configuration table, and the subsequent steps are executed by the confirmation/cancellation type submodule. If so, the voice command corresponds to homophone snapshot names in the snapshot library; the homophone snapshot names are assembled into a homophone snapshot result set, which is displayed through the display module; meanwhile, the option information for this round is recorded in the option context configuration table, and the subsequent steps are executed by the option type submodule.
Step 8.3: A further uniqueness judgment is performed on the recognition result of said voice command, namely: judge whether the recognition result is a homophone case. If not, the voice command corresponds to a snapshot name with a unique pronunciation in the snapshot library, and the snapshot scene command corresponding to that snapshot name is executed directly. If so, the voice command corresponds to homophone snapshot names in the snapshot library; the homophone snapshot names are assembled into a homophone snapshot result set, which is displayed through the display module; meanwhile, the option information for this round is recorded in the option context configuration table, and the subsequent steps are executed by the option type submodule.
After the option context configuration table saves the option information, the option context configuration table is cleared as soon as any subsequent voice command from the user has been executed.
After the confirmation/cancellation context configuration table stores the confirmation/cancellation object information, the confirmation/cancellation context configuration table is cleared as soon as any subsequent voice command from the user has been executed.
Step 9: The option type submodule executes a voice command of the option type, comprising the following steps:
If said voice command is of the option type, the option context configuration table is looked up first, and it is judged whether the option context configuration table is empty. If it is not empty, context information corresponding to said voice command exists, and the corresponding snapshot scene command is executed directly according to the option information stored in the option context configuration table and said voice command. If the option context configuration table is empty, no context information exists, and a speech recognition failure prompt is output through the display module.
Step 10: The confirmation/cancellation type submodule executes a voice command of the confirmation/cancellation type, comprising the following steps:
If said voice command is of the confirmation/cancellation type, the confirmation/cancellation context configuration table is looked up first, and it is judged whether the confirmation/cancellation context configuration table is empty. If it is not empty, context information corresponding to said voice command exists, and the corresponding snapshot scene command is executed directly according to the confirmation/cancellation object information stored in the confirmation/cancellation context configuration table and said voice command. If the confirmation/cancellation context configuration table is empty, no context information exists, and a speech recognition failure prompt is output through the display module.
Step 11: The other-types submodule executes a voice command of any other type, comprising the following step: a speech recognition failure prompt is output through the display module.
2. The scene interaction control method based on speech recognition according to claim 1, characterized in that a wake button is configured; when the wake button is clicked, the speech recognition control program of the central control system is manually woken from the dormant state to the activated state.
3. The scene interaction control method based on speech recognition according to claim 1, characterized in that outputting a speech recognition failure prompt through the display module specifically means: playing a spoken apology while outputting suggested replacement voice commands.
4. The scene interaction control method based on speech recognition according to claim 1, characterized in that, in step 1, the snapshot library established by the central control system is dynamically updated in real time.
5. The scene interaction control method based on speech recognition according to claim 1, characterized in that the central control system's modes of controlling the conference site devices include: tapping a touch screen, remote-control pen key triggering, and speech recognition control.
CN201811581756.4A 2018-12-24 2018-12-24 Scene interaction control method based on voice recognition Active CN109616111B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811581756.4A CN109616111B (en) 2018-12-24 2018-12-24 Scene interaction control method based on voice recognition


Publications (2)

Publication Number Publication Date
CN109616111A true CN109616111A (en) 2019-04-12
CN109616111B CN109616111B (en) 2023-03-14

Family

ID=66011357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811581756.4A Active CN109616111B (en) 2018-12-24 2018-12-24 Scene interaction control method based on voice recognition

Country Status (1)

Country Link
CN (1) CN109616111B (en)


Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8219407B1 (en) * 2007-12-27 2012-07-10 Great Northern Research, LLC Method for processing the output of a speech recognizer
CN102800315A (en) * 2012-07-13 2012-11-28 上海博泰悦臻电子设备制造有限公司 Vehicle-mounted voice control method and system
CN103903619A (en) * 2012-12-28 2014-07-02 安徽科大讯飞信息科技股份有限公司 Method and system for improving accuracy of speech recognition
CN103943105A (en) * 2014-04-18 2014-07-23 安徽科大讯飞信息科技股份有限公司 Voice interaction method and system
CN104715754A (en) * 2015-03-05 2015-06-17 北京华丰亨通科贸有限公司 Method and device for rapidly responding to voice commands
US20160026253A1 (en) * 2014-03-11 2016-01-28 Magic Leap, Inc. Methods and systems for creating virtual and augmented reality
CN105609105A (en) * 2014-11-13 2016-05-25 现代自动车株式会社 Speech recognition system and speech recognition method
CN105786880A (en) * 2014-12-24 2016-07-20 中兴通讯股份有限公司 Voice recognition method, client and terminal device
CN105869634A (en) * 2016-03-31 2016-08-17 重庆大学 Field-based method and system for feeding back text error correction after speech recognition
US9424840B1 (en) * 2012-08-31 2016-08-23 Amazon Technologies, Inc. Speech recognition platforms
CN106710585A (en) * 2016-12-22 2017-05-24 上海语知义信息技术有限公司 Method and system for broadcasting polyphonic characters in voice interaction process
CN107272887A (en) * 2017-05-17 2017-10-20 四川新网银行股份有限公司 A kind of method that client scene interactivity is realized based on augmented reality
CN107615377A (en) * 2015-10-05 2018-01-19 萨万特系统有限责任公司 The key phrase suggestion based on history for the Voice command of domestic automation system
CN107705787A (en) * 2017-09-25 2018-02-16 北京捷通华声科技股份有限公司 A kind of audio recognition method and device
CN108564940A (en) * 2018-03-20 2018-09-21 平安科技(深圳)有限公司 Audio recognition method, server and computer readable storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHOU Haitao: "Implementation of Polyphone Recognition in Speech Synthesis", Science &amp; Technology Information *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112291281A (en) * 2019-07-09 2021-01-29 钉钉控股(开曼)有限公司 Voice broadcast and voice broadcast content setting method and device
CN112291281B (en) * 2019-07-09 2023-11-03 钉钉控股(开曼)有限公司 Voice broadcasting and voice broadcasting content setting method and device
CN111128160B (en) * 2019-12-19 2024-04-09 中国平安财产保险股份有限公司 Receipt modification method and device based on voice recognition and computer equipment
CN111176607A (en) * 2019-12-27 2020-05-19 国网山东省电力公司临沂供电公司 Voice interaction system and method based on power business
CN111554285A (en) * 2020-04-26 2020-08-18 三一重机有限公司 Voice control system and control method thereof
CN111897916A (en) * 2020-07-24 2020-11-06 惠州Tcl移动通信有限公司 Voice instruction recognition method and device, terminal equipment and storage medium
CN111897916B (en) * 2020-07-24 2024-03-19 惠州Tcl移动通信有限公司 Voice instruction recognition method, device, terminal equipment and storage medium

Also Published As

Publication number Publication date
CN109616111B (en) 2023-03-14

Similar Documents

Publication Publication Date Title
CN109616111A (en) A kind of scene interactivity control method based on speech recognition
US8010369B2 (en) System and method for controlling devices that are connected to a network
JP4036879B2 (en) Network-based intelligent support mechanism
US7007235B1 (en) Collaborative agent interaction control and synchronization system
US11282519B2 (en) Voice interaction method, device and computer readable storage medium
CN103593230B (en) background task control method of mobile terminal and mobile terminal
CN109243431A (en) A kind of processing method, control method, recognition methods and its device and electronic equipment
CN109192208A (en) A kind of control method of electrical equipment, system, device, equipment and medium
CN101365076B (en) Simplified setting method for television set remote controller and apparatus thereof
CN106356059A (en) Voice control method, device and projector
CN105446146A (en) Intelligent terminal control method based on semantic analysis, system and intelligent terminal
CN110309005A (en) A kind of funcall method, apparatus, terminal device and storage medium
WO2021203674A1 (en) Skill selection method and apparatus
CN109143879A (en) A method of controlling household electrical appliances centered on air-conditioning
CN109509468A (en) A kind of equipment executes the method and device of voice broadcast task
JP2021530130A (en) Methods and equipment for managing holds
CN114067798A (en) Server, intelligent equipment and intelligent voice control method
CN202103780U (en) Multimedia digital conference system
CN104899087A (en) Speech recognition achieving method and system for third-party applications
US11516346B2 (en) Three-way calling terminal for mobile human-machine coordination calling robot
CN109166572A (en) The method and reading machine people that robot is read
TWI297123B (en) Interactive entertainment center
CN101808219A (en) Control method for controlling dialing operation in video conference system terminal and device
CN116566760A (en) Smart home equipment control method and device, storage medium and electronic equipment
CN110233944A (en) Method, system, electronic equipment and the medium of interactive voice response

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant