CN109616111A - A scene interaction control method based on speech recognition - Google Patents
- Publication number
- CN109616111A (application CN201811581756.4A)
- Authority
- CN
- China
- Prior art keywords
- snapshot
- speech
- speech recognition
- option
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/08 — Speech classification or search
- G10L2015/223 — Execution procedure of a spoken command
- G10L2015/225 — Feedback of the input speech
(All under G — Physics; G10 — Musical instruments; acoustics; G10L — Speech analysis techniques or speech synthesis; speech recognition; speech or voice processing techniques; speech or audio coding or decoding; G10L15/00 — Speech recognition.)
Abstract
The present invention provides a scene interaction control method based on speech recognition, comprising: a central control system pre-establishes a snapshot library; after the speech recognition control program of the central control system changes to the activated state, it starts a voice recording module, which records voice commands from the user; a speech recognition module performs intent recognition on the voice command, identifies it as one of four types — snapshot type, option type, confirmation/cancellation type, or other — and executes the corresponding handling. Advantages: the method combines the central control system with speech recognition technology, realizes the function of controlling the central control system by voice in place of conventional input devices, and provides a good user experience.
Description
Technical field
The invention belongs to the technical field of scene interaction control, and particularly relates to a scene interaction control method based on speech recognition.
Background art
In recent years, as China's economy has grown rapidly, application demands in government and enterprise venues have gradually shifted from single-purpose to diversified. Venue applications involve functions such as meetings, dispatch control, emergency command, daily operation, and centralized monitoring. Venue devices are of many kinds, including lamps, speakers, mosaic screens, televisions, video cameras, projectors, lifting displays, video disc players, matrices, and mosaic screen processors.
At present, the main venue control method is manual: for each venue mode, each class of venue device is controlled by hand. For example, in one venue mode the brightness of the lamps is adjusted, the speakers are turned on, the cameras are turned on, and the display is raised to a certain height to meet meeting demands; in another venue mode, the lamp brightness is adjusted differently, the video disc player is turned on, and the display is raised to another height.
The above manual venue control method has the following problem: controlling and adjusting each controlled device by hand suffers from low control efficiency and a heavy workload for the operator.
Summary of the invention
In view of the defects in the prior art, the present invention provides a scene interaction control method based on speech recognition, which can effectively solve the above problems.
The technical solution adopted by the invention is as follows:
The present invention provides a scene interaction control method based on speech recognition, comprising the following steps:
Step 1: the central control system pre-establishes a snapshot library; the snapshot library stores the correspondence between several snapshot names and snapshot scene execution commands; by executing a snapshot scene command, the central control system controls the venue devices.
To avoid accidental operation, the speech recognition control program is normally in an unwakened dormant state; at this time, the wake-word listener program is continuously open, and the main voice listener program is continuously closed.
Step 2: the wake-word listener monitors in real time and judges whether a wake word is heard; if no wake word is heard, it continues listening; if a wake word is heard, step 3 is executed.
Step 3: the central control system closes the wake-word listener and opens the main voice listener, thereby waking the speech recognition control program of the central control system, which then changes to the activated state.
Step 4: the speech recognition control program of the central control system starts the voice recording module, records the voice command from the user through the voice recording module, and stores the recorded voice command; meanwhile, while the voice recording module records the voice command, a speech volume waveform is displayed through the display module.
Step 5: the voice recording module transfers the recorded voice command to the speech recognition module.
Step 6: the speech recognition module performs a preliminary voice validity check on the voice command; if recognition succeeds, step 7 is executed; if recognition fails, a recognition-failure prompt is fed back to the user.
Step 7: the speech recognition module performs intent recognition on the voice command and identifies it as one of the following four types: snapshot type, option type, confirmation/cancellation type, or other.
If the voice command is of snapshot type, step 8 is executed by the snapshot-type submodule; if it is of option type, step 9 is executed by the option-type submodule; if it is of confirmation/cancellation type, step 10 is executed by the confirmation/cancellation-type submodule; if it is of another type, step 11 is executed by the other-type submodule.
Step 8: the snapshot-type submodule executes the voice command corresponding to the snapshot type, comprising the following steps:
Step 8.1: if the voice command is of snapshot type, obtain the resolution score of the voice command and judge whether the score exceeds a threshold. If it does not, the voice command is not clear enough, and step 8.2 is executed; if it does, the voice command is clear, and step 8.3 is executed.
Step 8.2: perform a further recognition-result uniqueness judgment on the voice command, that is, judge whether the recognition result is a homophone case. If not, the voice command corresponds to a uniquely pronounced snapshot name in the snapshot library; the display module outputs a prompt asking the user to further confirm whether to execute the snapshot, this confirmation/cancellation object information is recorded in the confirmation/cancellation context table, and subsequent steps are executed by the confirmation/cancellation-type submodule. If so, the voice command corresponds to homophone snapshot names in the snapshot library; the homophone snapshot names are formed into a homophone snapshot result set, which is displayed through the display module; meanwhile, this option information is recorded in the option context table, and subsequent steps are executed by the option-type submodule.
Step 8.3: perform a further recognition-result uniqueness judgment on the voice command, that is, judge whether the recognition result is a homophone case. If not, the voice command corresponds to a uniquely pronounced snapshot name in the snapshot library, and the snapshot scene command corresponding to that snapshot name is executed directly. If so, the voice command corresponds to homophone snapshot names in the snapshot library; the homophone snapshot names are formed into a homophone snapshot result set, which is displayed through the display module; meanwhile, this option information is recorded in the option context table, and subsequent steps are executed by the option-type submodule.
After the option context table stores option information, the option context table is cleared as soon as any next voice command from the user has been executed.
Likewise, after the confirmation/cancellation context table stores confirmation/cancellation object information, it is cleared as soon as any next voice command from the user has been executed.
Step 9: the option-type submodule executes the voice command corresponding to the option type, comprising the following steps:
If the voice command is of option type, first look up the option context table and judge whether it is empty. If it is not empty, context information corresponding to the voice command exists; then, according to the option information stored in the option context table and the voice command, the corresponding snapshot scene command is executed directly. If the option context table is empty, no context information exists, and the display module outputs a speech-recognition-failure prompt.
Step 10: the confirmation/cancellation-type submodule executes the voice command corresponding to the confirmation/cancellation type, comprising the following steps:
If the voice command is of confirmation/cancellation type, first look up the confirmation/cancellation context table and judge whether it is empty. If it is not empty, context information corresponding to the voice command exists; then, according to the confirmation/cancellation object information stored in the confirmation/cancellation context table and the voice command, the corresponding snapshot scene command is executed directly. If the confirmation/cancellation context table is empty, no context information exists, and the display module outputs a speech-recognition-failure prompt.
Step 11: the other-type submodule executes the voice command corresponding to the other type, comprising the following step: the display module outputs a speech-recognition-failure prompt.
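The dispatch from step 7 to the four type submodules (steps 8 to 11) can be sketched as follows. This is an illustrative Python sketch only, not the patent's implementation; all function and handler names are assumptions.

```python
# Hypothetical sketch of the step 7-11 dispatch: a recognized command type is
# routed to its submodule, with "other" (step 11) as the fallback handler.
def dispatch(command_type, handlers):
    """Route a recognized voice command to its type submodule."""
    handler = handlers.get(command_type, handlers["other"])
    return handler()

handlers = {
    "snapshot": lambda: "execute snapshot flow",       # step 8
    "option": lambda: "execute option flow",           # step 9
    "confirm_cancel": lambda: "execute confirm flow",  # step 10
    "other": lambda: "prompt recognition failure",     # step 11
}
```

Using a fallback handler mirrors the patent's design: any command the system cannot classify lands in the other-type submodule, which only reports a recognition failure.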
Preferably, a wake button is configured; when the wake button is clicked, the speech recognition control program of the central control system is manually woken from the dormant state into the activated state.
Preferably, outputting the speech-recognition-failure prompt through the display module specifically comprises: playing a voice apology, while outputting suggested replacement voice commands.
Preferably, in step 1, the snapshot library established by the central control system is dynamically updated in real time.
Preferably, the control modes of the central control system for venue devices include: touch-screen clicks, remote-control key triggering, and speech recognition control.
The scene interaction control method based on speech recognition provided by the present invention has the following advantages: it combines the central control system with speech recognition technology, realizes the function of controlling the central control system by voice in place of conventional input devices, and provides a good user experience.
Brief description of the drawings
Fig. 1 is a flow diagram of the scene interaction control method based on speech recognition provided by the present invention.
Specific embodiment
In order to make the technical problems, technical solutions, and beneficial effects of the present invention clearer, the present invention is described in further detail below in conjunction with the accompanying drawing and embodiments. It should be understood that the specific embodiments described herein only explain the present invention and are not intended to limit it.
Explanation of key terms:
Controlled device: an infrastructure device supporting the site's business-hall functions, such as a large-screen system, central air conditioning, workstations, the sound reinforcement system, and the lighting management module.
Scene mode (hereinafter, snapshot): a name defined at the business layer for a control combination over the controlled devices in a scene, such as emergency mode or daily monitoring mode.
Central control system: a system that performs centralized control of various devices such as sound, lighting, and power. It is applied in multimedia classrooms, multifunctional conference rooms, command and control centers, smart homes, and so on; the user can employ button-type control panels, computer displays, touch screens, wireless remote controls, and similar devices to control projectors, display stands, video disc players, video recorders, and other equipment through computer and central control system software.
In the last 20 years, speech recognition technology has made marked progress and has begun to move from the laboratory to the market. It can be expected that in the next 10 years, speech recognition technology will enter fields such as industry, household appliances, communications, automotive electronics, medical treatment, home services, and consumer electronics.
In practical applications, the central control system continually derives multiple application scenes according to business demands, such as demonstration reporting, daily monitoring, and production scheduling. The terminals with which users interact with the system are gradually evolving from desktop computers toward smaller, more mobile forms.
Therefore, the present invention provides a scene interaction control method based on speech recognition, which combines speech recognition technology with the central control system. It relates to a scene interaction control method that uses voice as the command medium, and is applied to halls for business monitoring and dispatch control scenes, such as grid dispatch centers, public security monitoring halls, and army operational command halls.
The present invention uses speech recognition technology to recognize, analyze, and process the speaker's speech; for example, when a valid snapshot name is recognized, it is handed to the central control system for concrete execution, achieving the goal of hands-free device control.
The realization of this method consists of two parts: hardware deployment and software processing.
1. Hardware deployment
An audio collection device needs to be connected to the central control system's control terminal. A mobile terminal with Bluetooth uses a Bluetooth headset connection, and speech collection and voice feedback are completed by the Bluetooth headset. A control terminal without Bluetooth uses a wired microphone, and voice feedback requires an additionally configured loudspeaker (a microphone-speaker all-in-one unit may be used, or the site's existing sound reinforcement system may be tapped as needed).
2. Software realization process
This system is divided into a speech recognition module and the central control system. The speech recognition module is the technical point this system focuses on protecting. After receiving an instruction from the speech recognition module, the central control system issues control instructions to the target devices according to the device control protocols and physical link configuration mapped by the snapshot, quickly obtaining the required scene mode and improving control efficiency. The emphasis of the present invention is how to recognize the user's voice and map it to a snapshot name in the snapshot library; once a snapshot name in the snapshot library has been matched, it only remains to execute the corresponding control instructions, which can be issued to the target devices.
In the present invention, a wake-up link is added to avoid accidental operation, so the business scenario is divided into two parts, wake-up and speech recognition, and corresponding logical processing methods are designed for different user intentions; the specific business judgment process is shown in Fig. 1.
For the system, the user's sole operation path is speaking; the system therefore designs a single service entrance for listening to the voice the user speaks.
With reference to Fig. 1, a scene interaction control method based on speech recognition comprises the following steps:
Step 1: the central control system pre-establishes a snapshot library; the snapshot library stores the correspondence between several snapshot names and snapshot scene execution commands; by executing a snapshot scene command, the central control system controls the venue devices. The control modes of the central control system for venue devices include but are not limited to: touch-screen clicks, remote-control key triggering, and speech recognition control.
For example, snapshot names may include: Conference Mode, a mode name homophonous with Conference Mode, Visual Dispatch Scene, Petrochemical Dispatch Scenario, Lights All-On Mode, and a variant of Lights All-On Mode written with a wrong character, among others. Each mode corresponds to a group of instructions executed on the controlled devices. In practice, snapshot names may contain wrong characters and homophones. For example, the "desert" character in the variant of Lights All-On Mode is a wrong character for "mode"; and the homophonous mode name and Conference Mode are identical in pronunciation, differing only in the tones of some characters, so they are regarded as homophone snapshot names.
The subsequent scene interaction control method based on speech recognition of the present invention can realize the recognition and execution of snapshot names that contain wrong characters or homophones.
In addition, the snapshot library established by the central control system can be dynamically updated in real time. That is, the central control system allows the user to create different snapshots according to business needs (each snapshot actually contains preset control messages for one or more controlled devices) and to give each snapshot a custom name; the set of these snapshot names is the effective range of the voice control function.
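The snapshot library of step 1 can be sketched as a mapping from snapshot names to device command lists, with a dynamic-update operation. This is a minimal illustrative sketch; the snapshot names and command strings below are assumptions, not the patent's actual data format.

```python
# Minimal sketch of the snapshot library: name -> preset control messages.
# Names and command strings are illustrative assumptions.
snapshot_library = {
    "conference mode": ["lights:60%", "speaker:on", "camera:on"],
    "daily monitoring mode": ["mosaic_screen:on", "lights:30%"],
}

def update_snapshot(library, name, commands):
    """Dynamic real-time update: create or replace a user-defined snapshot."""
    library[name] = list(commands)
    return library
```

The set of keys in such a mapping corresponds to what the patent calls the effective range of the voice control function.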
To avoid accidental operation, the speech recognition control program is normally in an unwakened dormant state; at this time, the wake-word listener program is continuously open and the main voice listener program is continuously closed.
In practice, a wake button can also be configured; when the wake button is clicked, the speech recognition control program of the central control system is manually woken from the dormant state into the activated state.
The design principle of the wake-up process is as follows: to avoid user misoperation (for example, several users talking among themselves may mention a command sentence the system can recognize, causing a snapshot to be executed by mistake), the system adds a wake-up mechanism, similar to a mobile phone's screen lock/unlock mechanism. That is, when not awakened, the system is analogous to the locked state; in the activated state, the system is unlocked.
At any moment, exactly one of the wake-word listener and the main voice listener is open, and the other is closed.
Step 2: the wake-word listener monitors in real time and judges whether a wake word is heard; if no wake word is heard, it continues listening; if a wake word is heard, step 3 is executed.
Step 3: the central control system closes the wake-word listener and opens the main voice listener, thereby waking the speech recognition control program of the central control system, which then changes to the activated state.
For example, while the central control system is in the unwakened dormant state, the user says some word, which is heard by the wake-word listener; the wake-word listener then judges whether the word is the wake word. The wake word is a word pre-customized by the system according to demand, for example "Xiaoheng Xiaoheng". If it is the wake word, the main voice listener is opened; at the same time, the display module outputs speech-waveform feedback to show that sound is currently being captured, and a voice prompt "The voice assistant is on" is played. If no wake word is recognized, the system gives no feedback.
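The wake-word gate of steps 2 and 3 can be sketched as a small state machine in which the two listeners are mutually exclusive. This is an illustrative sketch assuming a simple string match; real wake-word detection works on audio, and the class and wake word below are assumptions.

```python
# Sketch of the wake-word gate: exactly one listener is open at a time.
class VoiceGate:
    def __init__(self, wake_word):
        self.wake_word = wake_word
        self.wake_listener_on = True   # wake-word listener open by default
        self.main_listener_on = False  # main voice listener closed by default

    def hear(self, word):
        """On the wake word, swap which listener is open (steps 2-3)."""
        if self.wake_listener_on and word == self.wake_word:
            self.wake_listener_on = False
            self.main_listener_on = True
            return "voice assistant activated"
        return None  # no feedback when the word is not the wake word
```

Returning `None` for non-wake words matches the design in the text: an unrecognized word produces no feedback at all.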
Step 4: the speech recognition control program of the central control system starts the voice recording module, records the voice command from the user through the voice recording module, and stores the recorded voice command; meanwhile, while the voice recording module records the voice command, a speech volume waveform is displayed through the display module.
Specifically, with the central control system in the activated state, the user speaks a voice command in Mandarin, and the central control system performs both a "voice recording" operation and "waveform feedback". "Voice recording" waits for the user to finish expressing the current voice information and stores it in memory for the speech recognition module to analyze. "Waveform feedback" gives the user feedback on recording quality while speaking: if the waveform is faint, the user's speech quality is low, and the user is prompted to raise the volume or move closer to the audio collection device.
Step 5: the voice recording module transfers the recorded voice command to the speech recognition module.
Step 6: the speech recognition module performs a preliminary voice validity check on the voice command; if recognition succeeds, step 7 is executed; if recognition fails, a recognition-failure prompt is fed back to the user.
Specifically, the voice recording module transfers the recorded voice command to the speech recognition module, which judges whether any information was recognized. If so, subsequent processing is performed; if not, a voice apology is played, while the display module outputs suggested sentences such as "You can ask me ...".
Step 7: the speech recognition module performs intent recognition on the voice command and identifies it as one of the following four types: snapshot type, option type, confirmation/cancellation type, or other.
If the voice command is of snapshot type, step 8 is executed by the snapshot-type submodule; if it is of option type, step 9 is executed by the option-type submodule; if it is of confirmation/cancellation type, step 10 is executed by the confirmation/cancellation-type submodule; if it is of another type, step 11 is executed by the other-type submodule.
In the present invention, valid speech recognition results are classified and handled separately. The classes are: snapshot, option (selection from a homophone set), confirmation/cancellation (confirming a command whose score is below the threshold), and other (beyond the system's processing capacity).
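The step-7 classification can be sketched as a simple rule-based classifier over the recognized text. This is purely illustrative: the patent does not specify the classification rules, and a real system would use an ASR/NLU engine; the keywords below are assumptions.

```python
# Hypothetical rule-based sketch of the four-way intent classification (step 7).
def classify(text, snapshot_names):
    if text in ("confirm", "cancel"):
        return "confirm_cancel"            # replies to a confirmation prompt
    if text.startswith("option "):         # e.g. "option 2" after a homophone list
        return "option"
    if any(name in text for name in snapshot_names):
        return "snapshot"                  # mentions a known snapshot name
    return "other"                         # beyond processing capacity
```

The fallback class "other" routes to step 11, which only outputs a recognition-failure prompt.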
Step 8: the snapshot-type submodule executes the voice command corresponding to the snapshot type, comprising the following steps:
Step 8.1: if the voice command is of snapshot type, obtain the resolution score of the voice command and judge whether the score exceeds a threshold. If it does not, the voice command is not clear enough, and step 8.2 is executed; if it does, the voice command is clear, and step 8.3 is executed.
Step 8.2: perform a further recognition-result uniqueness judgment on the voice command, that is, judge whether the recognition result is a homophone case. If not, the voice command corresponds to a uniquely pronounced snapshot name in the snapshot library; the display module outputs a prompt asking the user to further confirm whether to execute the snapshot, this confirmation/cancellation object information is recorded in the confirmation/cancellation context table, and subsequent steps are executed by the confirmation/cancellation-type submodule. If so, the voice command corresponds to homophone snapshot names in the snapshot library; the homophone snapshot names are formed into a homophone snapshot result set, which is displayed through the display module; meanwhile, this option information is recorded in the option context table, and subsequent steps are executed by the option-type submodule.
Step 8.3: perform a further recognition-result uniqueness judgment on the voice command, that is, judge whether the recognition result is a homophone case. If not, the voice command corresponds to a uniquely pronounced snapshot name in the snapshot library, and the snapshot scene command corresponding to that snapshot name is executed directly. If so, the voice command corresponds to homophone snapshot names in the snapshot library; the homophone snapshot names are formed into a homophone snapshot result set, which is displayed through the display module; meanwhile, this option information is recorded in the option context table, and subsequent steps are executed by the option-type submodule.
After the option context table stores option information, the option context table is cleared as soon as any next voice command from the user has been executed.
Likewise, after the confirmation/cancellation context table stores confirmation/cancellation object information, it is cleared as soon as any next voice command from the user has been executed.
Specifically, the snapshot type submodule first judges whether the recognition score of a snapshot-type voice command exceeds a threshold (the threshold reflects the confidence of the speech recognition result). If the score exceeds the threshold, the system considers the command clear and proceeds to the "recognition result is unique" judgment: if the result is unique, the snapshot execution instruction is issued directly without asking the user to confirm; if the result is not unique, it is a homophone case, so the corresponding snapshot name list is pushed to the user, and once the user selects a snapshot name from the list the snapshot is executed directly, again without further confirmation. If the score is below the threshold, the system considers the command not clear enough and first performs the uniqueness judgment: if the result is unique, a prompt is still output on the display module asking the user to confirm whether to execute the snapshot, and the system plays the voice prompt "Execute snapshot so-and-so?"; if it is a homophone case, it is handed to the homophone processing module (homophone processing is itself a form of further confirmation).
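The two-level decision just described (score threshold, then uniqueness) can be condensed into a small dispatch function. This is a sketch under stated assumptions: the threshold value and the action names are illustrative, not taken from the patent.

```python
SCORE_THRESHOLD = 0.8  # assumed value; the patent leaves the threshold configurable

def handle_snapshot_command(score: float, is_unique: bool) -> str:
    """Decide the next action for a snapshot-type command (steps 8.1-8.3).

    Returns one of: 'execute', 'push_option_list', 'ask_confirmation'.
    """
    if score > SCORE_THRESHOLD:
        # clear command: unique -> run directly, homophone -> list options
        return "execute" if is_unique else "push_option_list"
    # unclear command: a unique result still needs a yes/no confirmation;
    # a homophone result goes through option selection, which itself confirms
    return "ask_confirmation" if is_unique else "push_option_list"
```

Note that the homophone branch is the same regardless of score, which matches the patent's remark that homophone processing already acts as confirmation.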
Step 9: the option type submodule executes a voice command of the option type, comprising the following steps: if the voice command is of the option type, the option context configuration table is looked up first to judge whether it is empty. If it is not empty, context information corresponding to the voice command exists, and the corresponding snapshot scene command is executed directly according to the option information stored in the option context configuration table and the voice command. If the option context configuration table is empty, no context information exists, and a speech-recognition-failure prompt is output through the display module.
Specifically, the option type submodule needs context memory. When a homophonic snapshot name is encountered, the system performs "homophone processing" and offers the user the set of homophonic snapshot names to choose from. After the system recognizes a voice command as the option type, it first checks the option context configuration table for context information. If context exists, the command is matched against the options from the previous turn of the dialogue, the selected option's snapshot is executed, the system voice announces that snapshot so-and-so was executed successfully, and the display device outputs the same success message. If no context exists, the system design takes an anthropomorphic view: a topic with no preceding context is treated as an illegal operation, so the system plays a voice apology prompt while the display device shows a guiding sentence such as "You can ask me ...".
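The option-type flow above can be sketched as a single function. The stored options are passed in directly (None when the context table is empty); the return values and apology text are illustrative assumptions, not the patent's wording.

```python
def handle_option_command(stored_options, chosen_index: int):
    """Step 9 sketch: an option command is valid only if the previous
    turn stored an option list in the option context configuration table.

    stored_options: list of snapshot names, or None if the table is empty.
    Returns ('execute_snapshot', name) or ('apology', guiding sentence).
    """
    if not stored_options:
        # no preceding context: treated as an illegal operation
        return ("apology", "You can ask me ...")
    if 0 <= chosen_index < len(stored_options):
        # match against the previous turn's options and execute the snapshot
        return ("execute_snapshot", stored_options[chosen_index])
    # the spoken number does not match any listed option
    return ("apology", "You can ask me ...")
```

The out-of-range branch is an assumption: the patent only distinguishes "context exists" from "context missing".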
Step 10: the confirm/cancel type submodule executes a voice command of the confirm/cancel type, comprising the following steps: if the voice command is of the confirm/cancel type, the confirm/cancel context configuration table is looked up first to judge whether it is empty. If it is not empty, context information corresponding to the voice command exists, and the corresponding snapshot scene command is executed directly according to the confirm/cancel target information stored in the confirm/cancel context configuration table and the voice command. If the confirm/cancel context configuration table is empty, no context information exists, and a speech-recognition-failure prompt is output through the display module.
Specifically, the confirm/cancel type submodule is likewise context-based; it handles the secondary confirmation of results whose recognition score falls below the set threshold. After the system recognizes a confirm/cancel command, it first checks the confirm/cancel context configuration table for context information. If context exists, the snapshot is executed, the system voice announces that snapshot so-and-so was executed successfully, and the display device outputs the same success message. If no context exists, the system design takes an anthropomorphic view: a topic with no preceding context is treated as an illegal operation, so the system plays a voice apology prompt while the display device shows a guiding sentence such as "You can ask me ...".
Step 11: the other-types submodule executes a voice command of any other type, comprising the following step: a speech-recognition-failure prompt is output through the display module. Specifically, when a voice command falls into the other types, the system design considers it beyond the system's processing capacity and handles it uniformly: the system plays a voice apology prompt while the display device shows a guiding sentence such as "You can ask me ...".
In each step of the present invention, outputting a speech-recognition-failure prompt through the display module specifically means: playing a voice apology prompt while outputting a guiding sentence that suggests a replacement voice command.
In the present invention, during intent recognition, after the current user intent is judged to be the snapshot type, a "recognition result is unique" judgment is performed. If the answer is yes, the result is a snapshot name with a unique pronunciation in the snapshot library; the snapshot is executed directly, the system voice announces that the snapshot was executed successfully, and the display device outputs the same success message. If the answer is no, the result is not a uniquely pronounced snapshot name; the system lists the set of snapshots sharing that pronunciation on the display device for the user to choose from, with a voice prompt such as "Please say or click an item number shown on the screen". For example, when the voice command recognized from the user is "Favour Mode", the snapshot set retrieved by the system is "Favour Mode, Conference Mode 1, Conference Mode 2"; the system then displays these three as a list, and if the user chooses Conference Mode 1 by clicking "Conference Mode 1", the system executes the snapshot scene command corresponding to "Conference Mode 1".
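The worked example above can be reproduced with the homophone grouping idea, assuming (purely for illustration) that the three mode names share one pronunciation key in the snapshot library:

```python
# Assumed library: snapshot name -> pronunciation key.
library = {
    "Favour Mode": "key_a",
    "Conference Mode 1": "key_a",
    "Conference Mode 2": "key_a",
    "Monitoring Mode": "key_b",
}

def homophone_set(pronunciation_key: str):
    """All snapshot names sharing one pronunciation, in display order."""
    return sorted(n for n, k in library.items() if k == pronunciation_key)

# The display module would list these; the user says or clicks one entry.
options = homophone_set("key_a")
chosen = "Conference Mode 1"   # the user's selection in the example
```

The list the user sees is exactly `options`; the selection is then routed back through the option type submodule of step 9.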
The scene interaction control method based on speech recognition provided by the present invention has the following characteristics:
1. The system closes the audio monitoring program promptly after completing a task, preventing the noise of a complex on-site environment from causing misoperation, and the user can efficiently wake the intelligent speech recognition system by voice or by a manual click. With the speech control system in the activated state, the user controls the on-site devices simply by saying a voice command containing a valid snapshot name, such as "Start monitoring mode", "Execute conference mode" or "Open mode one". Operation is therefore simple, which improves the user experience.
2. Voice feedback mechanism: the method focuses on the user interaction experience and provides effective feedback for every situation that arises in use, guiding the user to operate the speech control system correctly.
Specifically:
1) High-quality speech: when the system receives a clear, standard voice command, it treats the command as safe and certain and directly executes the task the user intends.
2) Ordinary speech: if the recognized voice quality scores below the high-quality threshold, then to prevent safety and other problems with the on-site devices the system first presents the recognized task to the user for confirmation; the user can click, or say the prompted command word, to confirm or cancel the task.
3) Homophonic command: when the system recognizes a homophone case, it feeds the options back to the user, who clicks or says the desired option.
4) Outside the valid scene keywords: when the user says something outside the system's knowledge (something it does not understand), a prompt page guides the user with example sentences to refer to.
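The four feedback situations can be summarized in one dispatch table. This is a compact restatement of the list above; the case labels and message strings are illustrative assumptions, not the patent's wording.

```python
def feedback(case: str) -> str:
    """Map each of the four feedback situations to the system's reaction."""
    reactions = {
        "high_quality":   "execute the intended task directly",
        "low_confidence": "ask the user to confirm or cancel first",
        "homophone":      "push the option list and wait for a selection",
        "out_of_scope":   "apologize and show example sentences",
    }
    return reactions[case]
```

Every user utterance thus receives some feedback; no command is silently dropped.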
3. Context capability: to cope with multiple recognition results (homophones), the method temporarily stores the recognition result in memory and resumes processing after the user answers the confirmation message, simulating the contextual language ability of human conversation.
4. The system supports dynamic loading of newly created snapshot-name keywords. When the user creates a snapshot and names it, the speech recognition system supports recognizing that snapshot name immediately after the save succeeds.
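Characteristic 4 amounts to a keyword registry that the recognizer consults live. The sketch below is an assumed illustration of that behavior; the class and method names are invented, and a real system would also push the new name into the recognizer's grammar.

```python
class SnapshotKeywordRegistry:
    """Dynamic loading of snapshot-name keywords: saving a snapshot makes
    its name recognizable immediately, with no rebuild step for the user."""

    def __init__(self):
        self._keywords = set()

    def save_snapshot(self, name: str):
        # saving the snapshot registers its name as a recognition keyword
        self._keywords.add(name)

    def is_recognizable(self, name: str) -> bool:
        return name in self._keywords
```

Before the save, a new name would fall into the "outside the valid scene keywords" case; after the save, it resolves as a snapshot-type command.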
Therefore, the scene interaction control method based on speech recognition provided by the present invention combines a central control system with speech recognition technology, realizing the function of controlling the central control system by speech instead of conventional input devices, with the advantage of a better user experience.
The method of the present invention is an auxiliary interaction means that uses speech, mankind's most primitive form of expression, as the medium for conveying messages, and is one of the best modes of human-computer interaction. Combining speech recognition with the central control system will further raise the construction value and technological appeal of commercial central control sites.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (5)
1. A scene interaction control method based on speech recognition, characterized by comprising the following steps:
Step 1: a central control system pre-establishes a snapshot library; the snapshot library stores the correspondence between several snapshot names and snapshot scene execution commands; through the snapshot scene execution commands, the central control system controls the conference-site devices; to avoid misoperation, the speech recognition control program normally stays in an unwoken dormant state; at this time the wake-word listening program is kept continuously open, while the main speech listening program is kept continuously closed;
Step 2: the wake-word listening program listens in real time and judges whether a wake word is heard; if no wake word is heard, it keeps listening; if a wake word is heard, step 3 is executed;
Step 3: the central control system closes the wake-word listening program, opens the main speech listening program, and wakes the speech recognition control program of the central control system, which thereby transitions to the activated state;
Step 4: the speech recognition control program of the central control system starts a voice recording module, records the voice command from the user through the voice recording module, and stores the recorded voice command; meanwhile, while the voice recording module is recording the voice command, a speech volume waveform is shown through a display module;
Step 5: the voice recording module transfers the recorded voice command to a speech recognition module;
Step 6: the speech recognition module performs a preliminary speech validity recognition on the voice command; if the recognition succeeds, step 7 is executed; if the recognition fails, a recognition-failure prompt is fed back to the user;
Step 7: the speech recognition module performs intent recognition on the voice command, identifying one of the following four types: snapshot type, option type, confirm/cancel type, and other types; if the voice command is of the snapshot type, step 8 is executed by a snapshot type submodule; if the voice command is of the option type, step 9 is executed by an option type submodule; if the voice command is of the confirm/cancel type, step 10 is executed by a confirm/cancel type submodule; if the voice command is of another type, step 11 is executed by an other-types submodule;
Step 8: the snapshot type submodule executes a voice command of the snapshot type, comprising the following steps:
Step 8.1: if the voice command is of the snapshot type, the recognition score of the voice command is obtained, and whether the score exceeds a threshold is judged; if it does not, the voice command is not clear enough, and step 8.2 is executed; if it does, the voice command is clear, and step 8.3 is executed;
Step 8.2: a further uniqueness judgment is made on the recognition result of the voice command, i.e. whether the result is a homophone case; if not, the voice command corresponds to a snapshot name with a unique pronunciation in the snapshot library, and a prompt asking whether to execute the snapshot is output through the display module; meanwhile, the confirm/cancel target information for this turn is recorded in a confirm/cancel context configuration table, and the subsequent step is executed by the confirm/cancel type submodule; if so, the voice command corresponds to several homophonic snapshot names in the snapshot library; those snapshot names are collected into a homophone result set, which is shown through the display module; meanwhile, the option information for this turn is recorded in an option context configuration table, and the subsequent step is executed by the option type submodule;
Step 8.3: a further uniqueness judgment is made on the recognition result of the voice command, i.e. whether the result is a homophone case; if not, the voice command corresponds to a snapshot name with a unique pronunciation in the snapshot library, and the snapshot scene command corresponding to that snapshot name is executed directly; if so, the voice command corresponds to several homophonic snapshot names in the snapshot library; those snapshot names are collected into a homophone result set, which is shown through the display module; meanwhile, the option information for this turn is recorded in the option context configuration table, and the subsequent step is executed by the option type submodule;
after the option context configuration table stores option information, the table is cleared as soon as the next voice command from the user is executed; after the confirm/cancel context configuration table stores confirm/cancel target information, the table is cleared as soon as the next voice command from the user is executed;
Step 9: the option type submodule executes a voice command of the option type, comprising the following steps: if the voice command is of the option type, the option context configuration table is looked up first to judge whether it is empty; if it is not empty, context information corresponding to the voice command exists, and the corresponding snapshot scene command is executed directly according to the option information stored in the option context configuration table and the voice command; if the option context configuration table is empty, no context information exists, and a speech-recognition-failure prompt is output through the display module;
Step 10: the confirm/cancel type submodule executes a voice command of the confirm/cancel type, comprising the following steps: if the voice command is of the confirm/cancel type, the confirm/cancel context configuration table is looked up first to judge whether it is empty; if it is not empty, context information corresponding to the voice command exists, and the corresponding snapshot scene command is executed directly according to the confirm/cancel target information stored in the confirm/cancel context configuration table and the voice command; if the confirm/cancel context configuration table is empty, no context information exists, and a speech-recognition-failure prompt is output through the display module;
Step 11: the other-types submodule executes a voice command of any other type, comprising the following step: a speech-recognition-failure prompt is output through the display module.
2. The scene interaction control method based on speech recognition according to claim 1, characterized in that a wake button is configured; when the wake button is clicked, the speech recognition control program of the central control system is woken manually from the dormant state to the activated state.
3. The scene interaction control method based on speech recognition according to claim 1, characterized in that outputting a speech-recognition-failure prompt through the display module specifically means: playing a voice apology prompt while outputting a guiding sentence that suggests a replacement voice command.
4. The scene interaction control method based on speech recognition according to claim 1, characterized in that, in step 1, the snapshot library established by the central control system is updated dynamically in real time.
5. The scene interaction control method based on speech recognition according to claim 1, characterized in that the ways the central control system controls the conference-site devices include: touch-screen clicking, remote-control pen key triggering, and speech recognition control.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811581756.4A CN109616111B (en) | 2018-12-24 | 2018-12-24 | Scene interaction control method based on voice recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811581756.4A CN109616111B (en) | 2018-12-24 | 2018-12-24 | Scene interaction control method based on voice recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109616111A true CN109616111A (en) | 2019-04-12 |
CN109616111B CN109616111B (en) | 2023-03-14 |
Family
ID=66011357
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811581756.4A Active CN109616111B (en) | 2018-12-24 | 2018-12-24 | Scene interaction control method based on voice recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109616111B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111176607A (en) * | 2019-12-27 | 2020-05-19 | 国网山东省电力公司临沂供电公司 | Voice interaction system and method based on power business |
CN111554285A (en) * | 2020-04-26 | 2020-08-18 | 三一重机有限公司 | Voice control system and control method thereof |
CN111897916A (en) * | 2020-07-24 | 2020-11-06 | 惠州Tcl移动通信有限公司 | Voice instruction recognition method and device, terminal equipment and storage medium |
CN112291281A (en) * | 2019-07-09 | 2021-01-29 | 钉钉控股(开曼)有限公司 | Voice broadcast and voice broadcast content setting method and device |
CN111128160B (en) * | 2019-12-19 | 2024-04-09 | 中国平安财产保险股份有限公司 | Receipt modification method and device based on voice recognition and computer equipment |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8219407B1 (en) * | 2007-12-27 | 2012-07-10 | Great Northern Research, LLC | Method for processing the output of a speech recognizer |
CN102800315A (en) * | 2012-07-13 | 2012-11-28 | 上海博泰悦臻电子设备制造有限公司 | Vehicle-mounted voice control method and system |
CN103903619A (en) * | 2012-12-28 | 2014-07-02 | 安徽科大讯飞信息科技股份有限公司 | Method and system for improving accuracy of speech recognition |
CN103943105A (en) * | 2014-04-18 | 2014-07-23 | 安徽科大讯飞信息科技股份有限公司 | Voice interaction method and system |
CN104715754A (en) * | 2015-03-05 | 2015-06-17 | 北京华丰亨通科贸有限公司 | Method and device for rapidly responding to voice commands |
US20160026253A1 (en) * | 2014-03-11 | 2016-01-28 | Magic Leap, Inc. | Methods and systems for creating virtual and augmented reality |
CN105609105A (en) * | 2014-11-13 | 2016-05-25 | 现代自动车株式会社 | Speech recognition system and speech recognition method |
CN105786880A (en) * | 2014-12-24 | 2016-07-20 | 中兴通讯股份有限公司 | Voice recognition method, client and terminal device |
CN105869634A (en) * | 2016-03-31 | 2016-08-17 | 重庆大学 | Field-based method and system for feeding back text error correction after speech recognition |
US9424840B1 (en) * | 2012-08-31 | 2016-08-23 | Amazon Technologies, Inc. | Speech recognition platforms |
CN106710585A (en) * | 2016-12-22 | 2017-05-24 | 上海语知义信息技术有限公司 | Method and system for broadcasting polyphonic characters in voice interaction process |
CN107272887A (en) * | 2017-05-17 | 2017-10-20 | 四川新网银行股份有限公司 | A kind of method that client scene interactivity is realized based on augmented reality |
CN107615377A (en) * | 2015-10-05 | 2018-01-19 | 萨万特系统有限责任公司 | The key phrase suggestion based on history for the Voice command of domestic automation system |
CN107705787A (en) * | 2017-09-25 | 2018-02-16 | 北京捷通华声科技股份有限公司 | A kind of audio recognition method and device |
CN108564940A (en) * | 2018-03-20 | 2018-09-21 | 平安科技(深圳)有限公司 | Audio recognition method, server and computer readable storage medium |
Non-Patent Citations (1)
Title |
---|
Zhou Haitao: "Implementation of Polyphone Recognition in Speech Synthesis", Science & Technology Information (《科技资讯》) * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112291281A (en) * | 2019-07-09 | 2021-01-29 | 钉钉控股(开曼)有限公司 | Voice broadcast and voice broadcast content setting method and device |
CN112291281B (en) * | 2019-07-09 | 2023-11-03 | 钉钉控股(开曼)有限公司 | Voice broadcasting and voice broadcasting content setting method and device |
CN111128160B (en) * | 2019-12-19 | 2024-04-09 | 中国平安财产保险股份有限公司 | Receipt modification method and device based on voice recognition and computer equipment |
CN111176607A (en) * | 2019-12-27 | 2020-05-19 | 国网山东省电力公司临沂供电公司 | Voice interaction system and method based on power business |
CN111554285A (en) * | 2020-04-26 | 2020-08-18 | 三一重机有限公司 | Voice control system and control method thereof |
CN111897916A (en) * | 2020-07-24 | 2020-11-06 | 惠州Tcl移动通信有限公司 | Voice instruction recognition method and device, terminal equipment and storage medium |
CN111897916B (en) * | 2020-07-24 | 2024-03-19 | 惠州Tcl移动通信有限公司 | Voice instruction recognition method, device, terminal equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109616111B (en) | 2023-03-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109616111A (en) | A kind of scene interactivity control method based on speech recognition | |
US8010369B2 (en) | System and method for controlling devices that are connected to a network | |
JP4036879B2 (en) | Network-based intelligent support mechanism | |
US7007235B1 (en) | Collaborative agent interaction control and synchronization system | |
US11282519B2 (en) | Voice interaction method, device and computer readable storage medium | |
CN103593230B (en) | background task control method of mobile terminal and mobile terminal | |
CN109243431A (en) | A kind of processing method, control method, recognition methods and its device and electronic equipment | |
CN109192208A (en) | A kind of control method of electrical equipment, system, device, equipment and medium | |
CN101365076B (en) | Simplified setting method for television set remote controller and apparatus thereof | |
CN106356059A (en) | Voice control method, device and projector | |
CN105446146A (en) | Intelligent terminal control method based on semantic analysis, system and intelligent terminal | |
CN110309005A (en) | A kind of funcall method, apparatus, terminal device and storage medium | |
WO2021203674A1 (en) | Skill selection method and apparatus | |
CN109143879A (en) | A method of controlling household electrical appliances centered on air-conditioning | |
CN109509468A (en) | A kind of equipment executes the method and device of voice broadcast task | |
JP2021530130A (en) | Methods and equipment for managing holds | |
CN114067798A (en) | Server, intelligent equipment and intelligent voice control method | |
CN202103780U (en) | Multimedia digital conference system | |
CN104899087A (en) | Speech recognition achieving method and system for third-party applications | |
US11516346B2 (en) | Three-way calling terminal for mobile human-machine coordination calling robot | |
CN109166572A (en) | The method and reading machine people that robot is read | |
TWI297123B (en) | Interactive entertainment center | |
CN101808219A (en) | Control method for controlling dialing operation in video conference system terminal and device | |
CN116566760A (en) | Smart home equipment control method and device, storage medium and electronic equipment | |
CN110233944A (en) | Method, system, electronic equipment and the medium of interactive voice response |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |