CN109616111B - Scene interaction control method based on voice recognition - Google Patents
- Publication number
- CN109616111B CN109616111B CN201811581756.4A CN201811581756A CN109616111B CN 109616111 B CN109616111 B CN 109616111B CN 201811581756 A CN201811581756 A CN 201811581756A CN 109616111 B CN109616111 B CN 109616111B
- Authority
- CN
- China
- Prior art keywords
- snapshot
- voice
- voice command
- option
- type
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G — PHYSICS
- G10 — MUSICAL INSTRUMENTS; ACOUSTICS
- G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00 — Speech recognition
- G10L15/08 — Speech classification or search
- G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223 — Execution procedure of a spoken command
- G10L2015/225 — Feedback of the input speech
Abstract
The invention provides a scene interaction control method based on voice recognition, which comprises the following steps: the central control system pre-establishes a snapshot library; after the voice recognition control program of the central control system switches to the activated state, it starts a voice recording module to record a voice command from the user; the voice recognition module performs intent recognition on the voice command, classifying it as one of the following four types: snapshot type, option type, confirm/cancel type, or other type, each of which is handled separately. Advantageous effects: the scene interaction control method based on voice recognition combines the central control system with voice recognition technology, realizes the function of controlling the central control system by voice in place of traditional input devices, and offers good user experience.
Description
Technical Field
The invention belongs to the technical field of scene interaction control, and particularly relates to a scene interaction control method based on voice recognition.
Background
In recent years, with the rapid growth of China's economy, the application requirements of government and enterprise meeting places have gradually shifted from single-purpose to diversified. Meeting-place applications involve functions such as conferencing, scheduling control, emergency command, daily operation, and centralized monitoring. Conference-room devices come in many types, for example lamps, speakers, tiled screens, televisions, cameras, projectors, lifting displays, video disc players, matrix switchers, tiled-screen processors, and the like.
At present, the main method of controlling a meeting place is to manually operate the various devices according to each meeting-place mode. For example, in one mode, the brightness of the lamps is adjusted, the speakers and the camera are turned on, and the display is raised to a certain height to meet meeting requirements; in another mode, the lamp brightness is adjusted differently, the video disc player is started, and the display is set to another height.
This meeting-place control method has the following problem: each controlled device is controlled and adjusted manually, which results in low control efficiency and a heavy workload for staff.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a scene interaction control method based on voice recognition, which can effectively solve the problems.
The technical scheme adopted by the invention is as follows:
the invention provides a scene interaction control method based on voice recognition, which comprises the following steps:
step 1, a central control system pre-establishes a snapshot library; the snapshot library stores the correspondence between a plurality of snapshot names and snapshot scene execution commands; the central control system controls the meeting-place equipment by executing the snapshot scene execution commands;
the voice recognition control program normally stays in a dormant, un-woken state to avoid misoperation; in this state, the wake-word monitoring program is continuously open, and the main voice monitoring program is continuously closed;
step 2, the wake-word monitoring program listens in real time and judges whether a wake-up word is heard; if no wake-up word is heard, it keeps listening; if a wake-up word is heard, step 3 is executed;
step 3, the central control system closes the wake-word monitoring program and starts the main voice monitoring program, thereby waking the voice recognition control program of the central control system, which switches to the activated state;
step 4, the voice recognition control program of the central control system starts a voice recording module, records a voice command from the user through the voice recording module, and stores the recorded voice command; meanwhile, while the voice recording module records the voice command, a display module displays the voice volume waveform;
step 5, the voice recording module transmits the recorded voice command to a voice recognition module;
step 6, the voice recognition module performs preliminary validity recognition on the voice command; if recognition succeeds, step 7 is executed; if recognition fails, a recognition-failure prompt is fed back to the user;
step 7, the voice recognition module performs intention recognition on the voice command, and recognizes one of the following four types: snapshot type, option type, confirm/cancel type, and other types;
if the voice command is of the snapshot type, executing step 8 through the snapshot type sub-module; if the voice command is of the option type, executing step 9 through the option type sub-module; if the voice command is of the confirm/cancel type, executing step 10 through the confirm/cancel type sub-module; if the voice command is of another type, executing step 11 through the other-type sub-module;
step 8, executing the voice command corresponding to the snapshot type through the snapshot type sub-module, comprising the following steps:
step 8.1, if the voice command is of the snapshot type, obtain the recognition score of the voice command and judge whether it exceeds a threshold; if not, the voice command is not clear enough, so execute step 8.2; if so, the voice command is clear, so execute step 8.3;
step 8.2, further judge the uniqueness of the recognition result of the voice command, namely judge whether the result is a homophone case; if not, the voice command corresponds to a snapshot name with a unique pronunciation in the snapshot library, so output a prompt via the display module asking the user to further confirm whether to execute the snapshot, record the confirm/cancel object information in the confirm/cancel context configuration table, and let the confirm/cancel type sub-module execute the subsequent steps; if so, the voice command corresponds to several homophone snapshot names in the snapshot library, so form the homophone snapshot names into a homophone snapshot result set, display the set via the display module, record the option information in the option context configuration table, and let the option type sub-module execute the subsequent steps;
step 8.3, further judge the uniqueness of the recognition result of the voice command, namely judge whether the result is a homophone case; if not, the voice command corresponds to a snapshot name with a unique pronunciation in the snapshot library, so directly execute the snapshot scene command corresponding to that snapshot name; if so, the voice command corresponds to several homophone snapshot names in the snapshot library, so form them into a homophone snapshot result set, display the set via the display module, record the option information in the option context configuration table, and let the option type sub-module execute the subsequent steps;
whenever the option context configuration table stores option information, the table is emptied once any next voice command from the user is executed;
whenever the confirm/cancel context configuration table stores confirm/cancel object information, the table is emptied once any next voice command from the user is executed;
step 9, executing the voice command corresponding to the option type by the option type submodule, comprising the following steps:
if the voice command is of the option type, first look up the option context configuration table and judge whether it is empty; if it is not empty, the previous-turn information corresponding to the voice command exists, so directly execute the corresponding snapshot scene command according to the option information stored in the table and the voice command; if the table is empty, the previous-turn information does not exist, so output a voice-recognition-failure prompt via the display module;
step 10, executing the voice command corresponding to the confirmation/cancellation type through the confirmation/cancellation type submodule, including the following steps:
if the voice command is of the confirm/cancel type, first look up the confirm/cancel context configuration table and judge whether it is empty; if it is not empty, the previous-turn information corresponding to the voice command exists, so directly execute the corresponding snapshot scene command according to the confirm/cancel object information stored in the table and the voice command; if the table is empty, the previous-turn information does not exist, so output a voice-recognition-failure prompt via the display module;
step 11, executing other types of voice commands through the other-type sub-module, comprising the following step: outputting a voice-recognition-failure prompt via the display module.
Preferably, a wake-up button is configured, and when the wake-up button is clicked, the voice recognition control program of the central control system is manually woken up from a dormant state to an active state.
Preferably, outputting the voice-recognition-failure prompt via the display module specifically includes: playing a spoken apology statement while displaying an example-prompt statement for valid voice commands.
Preferably, in step 1, the snapshot library established by the central control system is dynamically updated in real time.
Preferably, the control modes of the central control system for the meeting-place equipment include: touching or clicking the screen, remote-control pen key presses, and voice recognition control.
The scene interaction control method based on the voice recognition provided by the invention has the following advantages:
according to the scene interaction control method based on voice recognition, the central control system and the voice recognition technology are combined, the function of controlling the central control system by using the language to replace the traditional input equipment is achieved, and the scene interaction control method based on the voice recognition has the advantage of good user experience.
Drawings
Fig. 1 is a schematic flow chart of a scene interaction control method based on speech recognition according to the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Key term interpretation:
the controlled device: basic equipment for supporting the functions of the field service hall, such as a large screen system, a central air conditioner, a workstation, a sound amplification system, a light management module and the like.
Scene mode (hereinafter snapshot): and performing combined control on the names defined in the service level for each on-site controlled device, such as an emergency mode, a daily monitoring mode and the like.
The central control system: the control system is a device that centrally controls various devices such as sound, light, and electricity. The intelligent control system is applied to multimedia classrooms, multifunctional conference halls, command control centers, intelligent families and the like, and users can use devices such as a button control panel, a computer display, a touch screen, a wireless remote control and the like to control devices such as a projector, a display stand, a video disc player, a video recorder and the like through a computer and central control system software.
In the last two decades, speech recognition technology has advanced significantly, starting to move from the laboratory to the market. It is expected that voice recognition technology will enter various fields such as industry, home appliances, communications, automotive electronics, medical care, home services, consumer electronics, etc. within the next 10 years.
In practical application, the central control system continuously derives various application scenes such as demonstration report, daily monitoring, production scheduling and the like according to business requirements. Users are gradually moving from desktop computers to smaller, mobile terminals that interact with the system.
Therefore, the invention provides a scene interaction control method based on voice recognition, which combines a voice recognition technology with a central control system and relates to a scene interaction control method using voice as a command medium.
The invention uses voice recognition technology to recognize the speaker's voice message and to analyze and process it; if the message contains a valid snapshot name, it is sent to the central control system to execute the specific operation, thereby freeing the user's hands from operating the devices.
The method is realized by two parts of hardware and software processing.
1. Hardware deployment
An audio acquisition device must be connected to the control terminal of the central control system: a mobile terminal with Bluetooth uses a Bluetooth headset, which handles both voice capture and voice feedback; a control terminal without Bluetooth uses a wired microphone, and voice feedback then requires an additional loudspeaker (a microphone-speaker all-in-one unit, or an on-site sound-reinforcement system brought in as required).
2. Software implementation process
The system is divided into a voice recognition module and the central control system; the voice recognition module is the key technical point this application seeks to protect. After receiving an instruction from the voice recognition module, the central control system sends control instructions to the target devices according to the snapshot-to-device control-protocol mapping and the physical link configuration, quickly reaching the required scene mode and improving control efficiency. The key point of the invention is how to recognize the user's voice and match it to a snapshot name in the snapshot library; once a snapshot name is matched, executing the corresponding control instruction sends it to the target device.
In the invention, a wake-up link is added to avoid misoperation, dividing the business scene into a wake-up part and a voice recognition part, and a corresponding logic processing method is designed for each user intent; the specific business judgment flow is shown in fig. 1. For this system, the user's only operating path is voice, so the system designs a single service entrance that listens for the speech the user utters.
Referring to fig. 1, a scene interaction control method based on speech recognition includes the following steps:
step 1, a central control system pre-establishes a snapshot library; the snapshot library stores the correspondence between a plurality of snapshot names and snapshot scene execution commands; the central control system controls the meeting-place equipment by executing the snapshot scene execution commands. The control modes of the central control system for the meeting-place equipment include, but are not limited to: touching or clicking the screen, remote-control pen key presses, and voice recognition control.
For example, the snapshot names may be: conference mode, Huifei mode, visual scheduling scene, science-and-petrochemical scheduling scene, lights-all-on mode, lights-all-"desert" mode, and the like. Each mode corresponds to a set of instructions executed on each controlled device. In practical applications, snapshot names may contain wrongly written characters or homophones. For example, the "desert" in the lights-all-"desert" mode is a wrongly written character; the snapshot names "Huifei mode" and "conference mode" share the same pinyin with some characters differing only in tone, so they are treated as homophone snapshot names.
The scene interaction control method based on voice recognition described below can recognize and execute snapshot names containing wrongly written characters or homophones.
In addition, the snapshot library established by the central control system can be dynamically updated in real time; that is, the central control system allows users to create different snapshots according to business needs (each snapshot internally contains preset control messages for one or more controlled devices) and to give each snapshot a custom name. The set of snapshot names is the valid scope of the voice-control function.
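As an illustration, the snapshot library described above can be sketched as a name-to-commands mapping supporting dynamic updates; the class, method names, and sample snapshot contents below are hypothetical, not taken from the patent.

```python
# Hypothetical sketch of the snapshot library (step 1): a mapping from
# user-defined snapshot names to lists of per-device control commands.
class SnapshotLibrary:
    def __init__(self):
        self._snapshots = {}  # snapshot name -> list of device control commands

    def add(self, name, commands):
        """Create or update a snapshot; supports dynamic, real-time updates."""
        self._snapshots[name] = list(commands)

    def remove(self, name):
        self._snapshots.pop(name, None)

    def names(self):
        """The set of snapshot names, i.e. the valid scope of voice control."""
        return set(self._snapshots)

    def commands_for(self, name):
        return self._snapshots.get(name)

lib = SnapshotLibrary()
lib.add("conference mode", ["lights:60%", "speakers:on", "camera:on"])
lib.add("daily monitoring mode", ["lights:100%", "tiled_screen:on"])
```

A snapshot added this way is immediately part of the voice-control scope, matching the real-time update behavior described above.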
The voice recognition control program normally stays in a dormant, un-woken state to avoid misoperation; in this state, the wake-word monitoring program is continuously open, and the main voice monitoring program is continuously closed.
in practical application, a wake-up button may be configured, and when the wake-up button is clicked, the voice recognition control program of the central control system is manually woken up from a dormant state to an active state.
The design principle of the awakening process is as follows:
the system adds a wake-up mechanism to avoid user misoperation (e.g. multiple users are talking and may mention that the system can recognize command statements causing snapshot misoperation), similar to the screen locking/unlocking mechanism of a mobile phone. Namely: when not awakened, the system is similar to the screen locking state; when in the active state, the system is in the unlocked state.
At the same time, only one of the awakening voice monitoring program and the main voice monitoring program is in an open state, and the other is in a closed state.
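The mutual exclusion between the two monitoring programs can be sketched as a small state toggle; this is an assumed design with invented names, meant only to illustrate the invariant stated above.

```python
# Assumed sketch of the wake-up mechanism: exactly one of the wake-word
# listener and the main voice listener is open at any time.
class VoiceControlState:
    def __init__(self):
        self.wake_listener_open = True   # dormant: only the wake-word listener runs
        self.main_listener_open = False  # main voice listener is closed

    def wake(self):
        """Wake word heard (or wake-up button clicked): swap the listeners."""
        self.wake_listener_open = False
        self.main_listener_open = True

    def sleep(self):
        """Return to the dormant (screen-locked-like) state."""
        self.wake_listener_open = True
        self.main_listener_open = False

    def invariant(self):
        """Exactly one listener is open."""
        return self.wake_listener_open != self.main_listener_open
```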
Step 2, the awakening word monitoring program monitors in real time and judges whether an awakening word is monitored or not; if the awakening words are not monitored, continuously monitoring; if the awakening word is monitored, executing the step 3;
step 3, the central control system closes the awakening language monitoring program, starts the main voice monitoring program, and further awakens the voice recognition control program of the central control system, and at the moment, the voice recognition control program of the central control system is converted into an activated state;
for example, when the central control system is in a dormant state without being awakened, the user speaks a certain word, the word is monitored by the awakening word monitoring program, and then the awakening word monitoring program judges whether the word is an awakening word; wherein, the wake-up word is a word that the system pre-customizes according to the requirement, for example, "small, constant, small and constant"; if the word is a wake-up word, starting a main voice monitoring program; meanwhile, the display module outputs voice waveform feedback to prompt a user that sound is currently captured, and voice output prompts that the voice assistant is started; if the wake word is not recognized, the system does not give any feedback.
Step 4, the voice recognition control program of the central control system starts a voice recording module, records a voice command from the user through the voice recording module, and stores the recorded voice command; meanwhile, while the voice recording module records the voice command, the display module displays the voice volume waveform.
specifically, when the central control system is in an activated state, a user speaks a voice command by using the Mandarin, and the central control system performs voice recording operation and waveform feedback. The 'voice recording' is to wait for the user to finish the expression of the current voice information and carry out memory storage on the voice information and then carry out analysis on the voice information by a voice recognition module; the waveform feedback is feedback for giving the user the recording quality when the voice information is expressed, if the waveform is not obvious, the voice quality of the user is low, and the user is prompted to increase the volume or close the distance between the voice information and the audio acquisition equipment.
Step 5, the voice recording module transmits the recorded voice command to a voice recognition module;
step 6, the voice recognition module performs preliminary validity recognition on the voice command; if recognition succeeds, step 7 is executed; if recognition fails, a recognition-failure prompt is fed back to the user.
Specifically, the voice recording module transmits the recorded voice command to the voice recognition module; the voice recognition module judges whether the information is recognized, and if so, performs the subsequent processing; if not, a spoken apology statement is played while the display module outputs an example-prompt statement, such as "You can ask me ...".
Step 7, the voice recognition module performs intention recognition on the voice command, and recognizes one of the following four types: snapshot type, option type, confirm/cancel type, and other types;
if the voice command is of the snapshot type, executing step 8 through the snapshot type sub-module; if the voice command is of the option type, executing step 9 through the option type sub-module; if the voice command is of the confirm/cancel type, executing step 10 through the confirm/cancel type sub-module; if the voice command is of another type, executing step 11 through the other-type sub-module.
In the invention, valid voice recognition results are classified and processed separately. The classification comprises: snapshot, option (choice within a homophone group), confirm/cancel (confirmation of an instruction below the threshold), and other (beyond the system's processing capability).
step 8, executing the voice command corresponding to the snapshot type through the snapshot type sub-module, comprising the following steps:
step 8.1, if the voice command is of the snapshot type, obtain the recognition score of the voice command and judge whether it exceeds a threshold; if not, the voice command is not clear enough, so execute step 8.2; if so, the voice command is clear, so execute step 8.3;
step 8.2, further judge the uniqueness of the recognition result of the voice command, namely judge whether the result is a homophone case; if not, the voice command corresponds to a snapshot name with a unique pronunciation in the snapshot library, so output a prompt via the display module asking the user to further confirm whether to execute the snapshot, record the confirm/cancel object information in the confirm/cancel context configuration table, and let the confirm/cancel type sub-module execute the subsequent steps; if so, the voice command corresponds to several homophone snapshot names in the snapshot library, so form the homophone snapshot names into a homophone snapshot result set, display the set via the display module, record the option information in the option context configuration table, and let the option type sub-module execute the subsequent steps;
step 8.3, further judge the uniqueness of the recognition result of the voice command, namely judge whether the result is a homophone case; if not, the voice command corresponds to a snapshot name with a unique pronunciation in the snapshot library, so directly execute the snapshot scene command corresponding to that snapshot name; if so, the voice command corresponds to several homophone snapshot names in the snapshot library, so form them into a homophone snapshot result set, display the set via the display module, record the option information in the option context configuration table, and let the option type sub-module execute the subsequent steps;
whenever the option context configuration table stores option information, the table is emptied once any next voice command from the user is executed;
whenever the confirm/cancel context configuration table stores confirm/cancel object information, the table is emptied once any next voice command from the user is executed.
Specifically, the snapshot type sub-module judges whether the recognition score of a snapshot-type voice command exceeds the threshold (the threshold is based on the recognition confidence of the voice recognition result). If the score exceeds the threshold, the system considers the command clear and then judges the uniqueness of the recognition result: if the result is unique, no user confirmation is needed and the snapshot execution command is issued directly; if the result is not unique, indicating homophone snapshot names, the corresponding snapshot-name list is pushed to the user, and after the user selects the desired snapshot name from the list, the snapshot execution instruction is issued directly without further confirmation. If the score is below the threshold, the system judges the instruction not clear enough and first checks whether the recognition result is unique: if unique, a prompt must still be output on the display module asking the user to further confirm whether to execute the snapshot, and the system plays a voice prompt asking whether to execute that snapshot. If it is a homophone case, it is handled by the homophone processing module (homophone processing is itself a further confirmation action).
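The step-8 decision tree described above (score vs. threshold, then uniqueness) can be sketched as follows; the function signature, the context-table shapes, and the return values are assumptions for illustration only.

```python
# Sketch of the step-8 logic: a clear, unique command executes directly; a
# homophone set goes to the option flow; an unclear but unique command goes
# to the confirm/cancel flow. Context tables are plain dicts here.
def handle_snapshot(matches, score, threshold, option_ctx, confirm_ctx):
    """matches: snapshot names whose pronunciation matched the voice command."""
    unique = len(matches) == 1
    if score >= threshold:                    # step 8.3: command is clear
        if unique:
            return ("execute", matches[0])    # unique pronunciation -> run it
        option_ctx["choices"] = list(matches) # homophone set -> user picks
        return ("ask_option", list(matches))
    if unique:                                # step 8.2: not clear enough
        confirm_ctx["pending"] = matches[0]   # ask the user to confirm first
        return ("ask_confirm", matches[0])
    option_ctx["choices"] = list(matches)     # homophone processing also confirms
    return ("ask_option", list(matches))
```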
Step 9, executing the voice command corresponding to the option type by the option type submodule, comprising the following steps:
if the voice command is of an option type, firstly searching the option context configuration table, judging whether the option context configuration table is empty, if not, indicating that the above information corresponding to the voice command exists, and directly executing a corresponding snapshot scene command according to the option information and the voice command stored in the option context configuration table; if the option context configuration table is empty, indicating that the above information does not exist, outputting prompt information of voice recognition failure through a display module;
specifically, the option type sub-module needs context memory capability: when a polyphonic snapshot name is encountered, the system performs polyphonic processing and offers the user a choice from the polyphonic snapshot name set. After the system recognizes that the voice command is of the option type, it judges on the basis of the option context configuration table whether the above information exists for the command. If it does, the voice command is matched against the options of the previous round of conversation, the resolved option is executed as a snapshot, the system voice announces that a certain snapshot has been executed successfully, and the display device outputs the same success information. If the above information does not exist, the system design treats a topic without supporting context as an illegal operation, considered from an anthropomorphic perspective: the system plays a voice apology prompt while the display device outputs a prompt-type statement, such as "you can ask me like this: …".
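A minimal sketch of this option handling, assuming the option context table is a plain dict and that options are picked by their 1-based item number (both are assumptions; the patent does not fix a data structure):

```python
def handle_option_command(item_number: int, option_context: dict) -> str:
    """Step-9 sketch: resolve the user's pick against the stored option context."""
    options = option_context.get("options")
    if not options:
        # no supporting context: treated as an illegal operation
        return "sorry, you can ask me like this: ..."
    if not 1 <= item_number <= len(options):
        return "sorry, you can ask me like this: ..."
    picked = options[item_number - 1]
    option_context.clear()   # context is one-shot: emptied after this command
    return f"snapshot '{picked}' executed successfully"
```

With a stored context such as `{"options": ["benefit mode", "conference mode 1", "conference mode 2"]}`, saying or clicking "item 2" resolves to "conference mode 1" and clears the table, matching the one-shot emptying rule above.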
Step 10, executing the voice command corresponding to the confirmation/cancellation type through the confirmation/cancellation type submodule, including the following steps:
if the voice command is of a confirm/cancel type, firstly searching the confirm/cancel context configuration table, judging whether the confirm/cancel context configuration table is empty, if not, indicating that the above information corresponding to the voice command exists, and directly executing a corresponding snapshot scene command according to the confirm/cancel object information and the voice command stored in the confirm/cancel context configuration table; if the confirmation/cancellation context configuration table is empty, indicating that the above information does not exist, outputting prompt information of voice recognition failure through a display module;
specifically, the confirm/cancel type sub-module also handles a context-dependent case: it performs the secondary confirmation of results whose recognition score is below the set voice recognition threshold. After the system recognizes the confirm/cancel type, it first judges on the basis of the confirm/cancel context configuration table whether the above information exists. If it does, the snapshot is executed, the system voice announces that a certain snapshot has been executed successfully, and the display device outputs the same success information. If the above information does not exist, the system design treats a topic without supporting context as an illegal operation, considered from an anthropomorphic perspective: the system plays a voice apology prompt while the display device outputs a prompt-type statement, such as "you can ask me like this: …".
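The confirm/cancel branch follows the same one-shot-context pattern. The sketch below assumes literal "confirm"/"cancel" keywords and a dict-based context table (both assumptions made for illustration):

```python
def handle_confirm_cancel(word: str, confirm_context: dict) -> str:
    """Step-10 sketch: second confirmation of a low-confidence snapshot command."""
    pending = confirm_context.get("pending")
    confirm_context.clear()   # one-shot context: emptied whether confirmed or cancelled
    if pending is None:
        # no supporting context: treated as an illegal operation
        return "sorry, you can ask me like this: ..."
    if word == "confirm":
        return f"snapshot '{pending}' executed successfully"
    return f"snapshot '{pending}' cancelled"
```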
Step 11, executing the voice command corresponding to the other types through the other-type submodule, comprising the following step: outputting prompt information of voice recognition failure through a display module.
Specifically, when the voice command is of another type, it is beyond the system's processing capacity by design, and is handled uniformly: the system plays a voice apology prompt, and the display device outputs a prompt-type statement such as "you can ask me like this: …".
In each step of the invention, outputting the prompt information of voice recognition failure through the display module specifically means: playing a voice apology prompt while outputting a prompt-type statement suggesting a valid voice command.
In the invention, during intention recognition, when the current user's intention is judged to be the snapshot type, the "recognition result is unique" judgment is performed. If the result is "yes", the result is a snapshot name with a unique pronunciation in the snapshot library: the snapshot is executed directly, the system voice announces that the snapshot has been executed successfully, and the display device outputs the same success information. If "no", the result is not a snapshot name with a unique pronunciation: the system lists, through the display device, all snapshots sharing that pronunciation for the user to choose from, and gives voice feedback to prompt the user, such as "please speak or click the item number shown on the screen". For example, when the voice command from the user is recognized as "benefit mode", the system finds the snapshot set "benefit mode, conference mode 1, conference mode 2" (these names are homophones in the original Chinese, which the English translation obscures); the system then displays "benefit mode, conference mode 1, conference mode 2" as a list, and if the user wants conference mode 1, the user clicks "conference mode 1" and the system executes the snapshot scene execution command corresponding to "conference mode 1".
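The "benefit mode" example amounts to grouping snapshot names by pronunciation and listing a group whenever it has more than one member. A toy illustration follows, with an invented hard-coded pronunciation lookup standing in for a real pinyin converter (the mapping is purely illustrative):

```python
from collections import defaultdict

# Invented toy mapping: snapshot name -> pronunciation key.
# A real system would derive this from a pinyin converter.
PRONUNCIATION = {
    "benefit mode": "huiyi moshi",
    "conference mode 1": "huiyi moshi",
    "conference mode 2": "huiyi moshi",
    "monitoring mode": "jiankong moshi",
}

def homophone_set(spoken_name: str) -> list:
    """Return every snapshot name sharing the spoken name's pronunciation."""
    groups = defaultdict(list)
    for name, pron in PRONUNCIATION.items():
        groups[pron].append(name)
    return groups[PRONUNCIATION[spoken_name]]
```

A set with a single member is executed directly; a multi-member set is what the display module renders as the selection list described above.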
The scene interaction control method based on the voice recognition provided by the invention has the following characteristics:
1. The system closes the voice monitoring program promptly after completing a task, which avoids misoperation caused by the complex sound environment on site, while still allowing the user to quickly wake the intelligent voice recognition system again by voice or by a manual click.
In the activated state of the voice control system, the user speaks a voice command containing a valid snapshot name to control the on-site devices, such as "start monitoring mode", "execute meeting mode" or "open mode one". This has the advantage of simple operation and thus improves the user experience;
2. A voice feedback mechanism: the method pays attention to the user interaction experience, provides effective feedback for the various situations a user may speak during use, and guides the user to use the voice control system correctly.
Specifically, the method comprises the following steps:
1) High-quality speech: when the system receives a clear, standard voice command, it treats it as a safe, recognizable command and directly performs the task the user intends.
2) Lower-quality speech: if the recognized voice quality is below the high-quality score, the system first confirms the recognized task with the user; to ensure the safety of the on-site devices, the user can confirm or cancel the task by a manual click or by speaking the prompted command word.
3) Polyphonic commands: when the command matches several snapshot names with the same pronunciation, the system feeds the options back to the user and asks the user to choose by a manual click or by voice.
4) Outside the range of valid scene keywords: when the user says something the system cannot understand, the system shows a prompt page with example sentences the user can refer to.
3. Context capability: when facing multiple recognition results (polyphones), the method temporarily stores the recognition result in the computer's memory until the user answers the confirmation message, and then carries out the subsequent processing, thereby simulating the human ability to converse with context.
4. The system supports dynamic loading of newly created snapshot name keywords. As soon as a user creates and names a snapshot and it is saved successfully, the voice recognition system supports recognizing that snapshot by voice.
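Feature 4 can be sketched as a snapshot library whose keyword set the recognizer consults live, so a newly saved snapshot is recognizable immediately. The class and method names below are illustrative, not taken from the patent:

```python
class SnapshotLibrary:
    """Sketch of dynamic keyword loading: saved snapshots are recognizable at once."""

    def __init__(self):
        self._commands = {}              # snapshot name -> scene execution command

    def save_snapshot(self, name: str, command: str) -> None:
        self._commands[name] = command   # no restart or re-registration step needed

    def keywords(self) -> set:
        return set(self._commands)       # consulted live by the recognizer

    def execute(self, name: str) -> str:
        return self._commands[name]
```

Because the recognizer reads `keywords()` on every command, the claimed real-time update in step 1 ("the snapshot library is dynamically updated in real time") needs no separate synchronization step in this sketch.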
Therefore, the scene interaction control method based on voice recognition combines the central control system with voice recognition technology, realizes controlling the central control system by speech instead of traditional input devices, and has the advantage of good user experience.
The method is an auxiliary interaction means: it uses human speech, the most natural form of expression, as the message medium, which is one of the best modes of human-computer interaction. Combining voice recognition with a central control system can further improve the construction value and the sense of technology of a commercial control site.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, many modifications and adaptations can be made without departing from the principle of the present invention, and such modifications and adaptations should also be considered to be within the scope of the present invention.
Claims (1)
1. A scene interaction control method based on voice recognition is characterized by comprising the following steps:
step 1, a central control system pre-establishes a snapshot library; the snapshot library stores the corresponding relation between a plurality of snapshot names and snapshot scene execution commands; the central control system controls meeting place equipment by executing the command through the snapshot scene;
the voice recognition control program is normally in a dormant, un-awakened state to avoid misoperation; at this time, the wake-up word monitoring program is continuously open, while the main voice monitoring program is continuously closed;
step 2, the awakening word monitoring program monitors in real time and judges whether an awakening word is monitored or not; if the awakening words are not monitored, continuously monitoring; if the awakening words are monitored, executing the step 3;
step 3, the central control system closes the awakening language monitoring program, starts the main voice monitoring program, and further awakens the voice recognition control program of the central control system, and at the moment, the voice recognition control program of the central control system is converted into an activated state;
step 4, a voice recognition control program of the central control system starts a voice recording module, records a voice command from a user through the voice recording module, and stores the recorded voice command; meanwhile, in the process of recording the voice command by the voice recording module, displaying a voice volume waveform by a display module;
step 5, the voice recording module transmits the recorded voice command to a voice recognition module;
step 6, the voice recognition module carries out preliminary voice validity recognition on the voice command, and if the recognition is successful, step 7 is executed; if the identification is not successful, feeding back prompt information of identification failure to the user;
step 7, the voice recognition module performs intention recognition on the voice command, and recognizes one of the following four types: snapshot type, option type, confirm/cancel type, and other types;
if the voice command is of a snapshot type, executing the step 8 through a snapshot type submodule; if the voice command is of the option type, executing step 9 by an option type sub-module; if the voice command is of a confirm/cancel type, executing step 10 by a confirm/cancel type sub-module; if the voice command is of other types, executing step 11 through other types of sub-modules;
step 8: executing the voice command corresponding to the snapshot type through the snapshot type submodule, comprising the following steps:
step 8.1, if the voice command is of a snapshot type, obtaining a recognition score of the voice command, judging whether the recognition score exceeds a threshold value, and if not, indicating that the voice command is not clear enough, executing step 8.2; if yes, indicating that the voice command is definite, executing step 8.3;
step 8.2, further judging the uniqueness of the recognition result of the voice command, namely: judging whether the recognition result is a polyphonic condition, if not, indicating that the voice command corresponds to the snapshot name of the only pronunciation in the snapshot library, outputting prompt information for further confirmation of whether to execute the snapshot through a display module, simultaneously recording object information of the confirmation/cancellation in a confirmation/cancellation context configuration table, and then executing subsequent steps by a confirmation/cancellation type submodule; if yes, indicating that the voice command corresponds to polyphonic snapshot names in the snapshot library, forming each polyphonic snapshot name into a polyphonic snapshot result set, displaying the polyphonic snapshot result set through a display module, recording the option information in an option context configuration table, and executing subsequent steps by the option type sub-module;
step 8.3, further judging the uniqueness of the recognition result of the voice command, namely: judging whether the recognition result is a polyphonic condition, if not, indicating that the voice command corresponds to the snapshot name of the only pronunciation in the snapshot library, and directly executing the snapshot scene command corresponding to the snapshot name; if yes, indicating that the voice command corresponds to polyphonic snapshot names in the snapshot library, forming each polyphonic snapshot name into a polyphonic snapshot result set, displaying the polyphonic snapshot result set through a display module, recording the option information in an option context configuration table, and executing subsequent steps by the option type sub-module;
when the option context configuration table stores option information, the option context configuration table is emptied after any next voice command from the user is executed;
every time when the affirmation/cancellation context configuration table stores affirmation/cancellation object information, when any next voice command from the user is executed, the affirmation/cancellation context configuration table is emptied;
specifically, the snapshot type sub-module judges whether the recognition score of the snapshot type voice command exceeds a threshold value, the threshold being determined according to the confidence of the voice recognition result; if the recognition score exceeds the threshold, the system considers the command clear and then judges the uniqueness of the recognition result; if the recognition result is unique, the user does not need to confirm it and the snapshot execution command is carried out directly; if the recognition result is not unique, indicating a polyphonic snapshot name, a corresponding snapshot name list is pushed to the user, and after the user selects the corresponding snapshot name from the list the snapshot execution command is carried out directly without further confirmation; if the recognition score is lower than the threshold, the system judges that the command is not clear enough and first judges whether the recognition result is unique; if it is unique, prompt information still needs to be output on the display module asking the user to further confirm whether to execute the snapshot, and the system plays a voice prompt asking whether to execute the snapshot; if it is a polyphonic case, it is processed by the polyphonic processing module;
step 9, executing the voice command corresponding to the option type by the option type sub-module, including the following steps:
if the voice command is of an option type, firstly searching the option context configuration table, judging whether the option context configuration table is empty or not, if not, indicating that the above information corresponding to the voice command exists, and directly executing a corresponding snapshot scene command according to the option information and the voice command stored in the option context configuration table; if the option context configuration table is empty, indicating that the above information does not exist, outputting prompt information of voice recognition failure through a display module;
specifically, the option type sub-module needs to have context memory capability, and when a polyphonic snapshot name is encountered, the system performs polyphonic processing to provide a user with selection of a polyphonic snapshot name set; after the system identifies that the voice command is of an option type, whether the voice command has the above information or not is judged based on an option context configuration table, if yes, the voice command is matched with the options of the previous round of conversation, the clear options are subjected to snapshot execution, the voice of the system gives information that the playing and the execution of a certain snapshot are successful, and the display equipment outputs the information that the execution of a certain snapshot is successful; if the above information does not exist, the system design is considered from the anthropomorphic perspective, and when one topic which is not supported by the above information exists, the situation is considered as illegal operation, the system plays a speech prompt apology statement, and meanwhile, the display equipment outputs a prompt type statement;
step 10, executing the voice command corresponding to the confirmation/cancellation type through the confirmation/cancellation type sub-module, including the following steps:
if the voice command is of a confirmation/cancellation type, firstly searching the confirmation/cancellation context configuration table, judging whether the confirmation/cancellation context configuration table is empty or not, if not, indicating that the above information corresponding to the voice command exists, and directly executing a corresponding snapshot scene command according to the confirmation/cancellation object information stored in the confirmation/cancellation context configuration table and the voice command; if the confirmation/cancellation context configuration table is empty, indicating that the above information does not exist, outputting prompt information of voice recognition failure through a display module;
specifically, the confirm/cancel type sub-module also handles a context-dependent case and is used for the secondary confirmation of results whose recognition score is below the set voice recognition threshold; after the system recognizes the confirm/cancel type, the system first judges whether the above information exists based on the confirm/cancel context configuration table; if the above information exists, the snapshot is executed, the system voice announces that a certain snapshot has been executed successfully, and the display device outputs the information that the snapshot has been executed successfully; if the above information does not exist, the system design is considered from the anthropomorphic perspective: a topic without supporting above information is treated as an illegal operation, the system plays a voice apology prompt, and meanwhile the display device outputs a prompt-type statement;
step 11, executing the voice command corresponding to other types through the sub-modules of other types, comprising the following steps: outputting prompt information of voice recognition failure through a display module;
when the wake-up button is clicked, manually waking up a voice recognition control program of the central control system from a dormant state to an activated state;
the prompt information of voice recognition failure output through the display module specifically comprises: playing a voice apology prompt while outputting a prompt-type statement suggesting a valid voice command;
in step 1, a snapshot library established by the central control system is dynamically updated in real time;
wherein, the control mode of the central control system to the meeting place equipment comprises the following steps: touch screen clicking, remote control pen button triggering and voice recognition control.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811581756.4A CN109616111B (en) | 2018-12-24 | 2018-12-24 | Scene interaction control method based on voice recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109616111A CN109616111A (en) | 2019-04-12 |
CN109616111B true CN109616111B (en) | 2023-03-14 |
Family
ID=66011357
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811581756.4A Active CN109616111B (en) | 2018-12-24 | 2018-12-24 | Scene interaction control method based on voice recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109616111B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112291281B (en) * | 2019-07-09 | 2023-11-03 | 钉钉控股(开曼)有限公司 | Voice broadcasting and voice broadcasting content setting method and device |
CN111128160B (en) * | 2019-12-19 | 2024-04-09 | 中国平安财产保险股份有限公司 | Receipt modification method and device based on voice recognition and computer equipment |
CN111176607A (en) * | 2019-12-27 | 2020-05-19 | 国网山东省电力公司临沂供电公司 | Voice interaction system and method based on power business |
CN111554285A (en) * | 2020-04-26 | 2020-08-18 | 三一重机有限公司 | Voice control system and control method thereof |
CN111897916B (en) * | 2020-07-24 | 2024-03-19 | 惠州Tcl移动通信有限公司 | Voice instruction recognition method, device, terminal equipment and storage medium |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8219407B1 (en) * | 2007-12-27 | 2012-07-10 | Great Northern Research, LLC | Method for processing the output of a speech recognizer |
CN102800315A (en) * | 2012-07-13 | 2012-11-28 | 上海博泰悦臻电子设备制造有限公司 | Vehicle-mounted voice control method and system |
CN103903619A (en) * | 2012-12-28 | 2014-07-02 | 安徽科大讯飞信息科技股份有限公司 | Method and system for improving accuracy of speech recognition |
CN103943105A (en) * | 2014-04-18 | 2014-07-23 | 安徽科大讯飞信息科技股份有限公司 | Voice interaction method and system |
CN104715754A (en) * | 2015-03-05 | 2015-06-17 | 北京华丰亨通科贸有限公司 | Method and device for rapidly responding to voice commands |
CN105609105A (en) * | 2014-11-13 | 2016-05-25 | 现代自动车株式会社 | Speech recognition system and speech recognition method |
CN105786880A (en) * | 2014-12-24 | 2016-07-20 | 中兴通讯股份有限公司 | Voice recognition method, client and terminal device |
CN105869634A (en) * | 2016-03-31 | 2016-08-17 | 重庆大学 | Field-based method and system for feeding back text error correction after speech recognition |
US9424840B1 (en) * | 2012-08-31 | 2016-08-23 | Amazon Technologies, Inc. | Speech recognition platforms |
CN106710585A (en) * | 2016-12-22 | 2017-05-24 | 上海语知义信息技术有限公司 | Method and system for broadcasting polyphonic characters in voice interaction process |
CN107615377A (en) * | 2015-10-05 | 2018-01-19 | 萨万特系统有限责任公司 | The key phrase suggestion based on history for the Voice command of domestic automation system |
CN107705787A (en) * | 2017-09-25 | 2018-02-16 | 北京捷通华声科技股份有限公司 | A kind of audio recognition method and device |
CN108564940A (en) * | 2018-03-20 | 2018-09-21 | 平安科技(深圳)有限公司 | Audio recognition method, server and computer readable storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10203762B2 (en) * | 2014-03-11 | 2019-02-12 | Magic Leap, Inc. | Methods and systems for creating virtual and augmented reality |
CN107272887A (en) * | 2017-05-17 | 2017-10-20 | 四川新网银行股份有限公司 | A kind of method that client scene interactivity is realized based on augmented reality |
- 2018-12-24: Application CN201811581756.4A filed in China; published as CN109616111B (legal status: Active)
Non-Patent Citations (1)
Title |
---|
Implementation of Polyphone Recognition in Speech Synthesis (语音合成中多音字识别的实现); Zhou Haitao; Science & Technology Information (科技资讯); April 2008 (No. 11); full text * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||