CN109616111B - Scene interaction control method based on voice recognition - Google Patents


Info

Publication number
CN109616111B
Authority
CN
China
Prior art keywords
snapshot
voice
voice command
option
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811581756.4A
Other languages
Chinese (zh)
Other versions
CN109616111A (en)
Inventor
钱苏晋
门涛
刘鹏
董杰
周金涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Jibei Power Co ltd Smart Distribution Network Center
Beijing E Techstar Co ltd
Original Assignee
State Grid Jibei Power Co ltd Smart Distribution Network Center
Beijing E Techstar Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Jibei Power Co ltd Smart Distribution Network Center, Beijing E Techstar Co ltd filed Critical State Grid Jibei Power Co ltd Smart Distribution Network Center
Priority to CN201811581756.4A priority Critical patent/CN109616111B/en
Publication of CN109616111A publication Critical patent/CN109616111A/en
Application granted granted Critical
Publication of CN109616111B publication Critical patent/CN109616111B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/08 — Speech classification or search
    • G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 — Execution procedure of a spoken command
    • G10L2015/225 — Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention provides a scene interaction control method based on voice recognition, which comprises the following steps: a central control system pre-establishes a snapshot library; after the voice recognition control program of the central control system switches to the activated state, it starts a voice recording module to record a voice command from the user; a voice recognition module performs intention recognition on the voice command, classifying it as one of four types (snapshot type, option type, confirm/cancel type, or other), and each type is executed by its own sub-module. Advantages: by combining the central control system with voice recognition technology, the method lets speech replace traditional input devices for controlling the central control system, providing a good user experience.

Description

Scene interaction control method based on voice recognition
Technical Field
The invention belongs to the technical field of scene interaction control, and particularly relates to a scene interaction control method based on voice recognition.
Background
In recent years, with the rapid growth of China's economy, the application requirements of government and enterprise meeting places have gradually shifted from single-purpose to diversified. Meeting-place applications involve functions such as conferencing, scheduling control, emergency command, daily operation, and centralized monitoring. Conference-room devices come in many types, including lamps, speakers, tiled screens, televisions, cameras, projectors, lifting displays, video disc players, matrix switchers, tiled-screen processors, and the like.
At present, meeting places are controlled mainly as follows: for each meeting-place mode, the various devices are operated manually. For example, in one mode the lamp brightness is adjusted, the speakers and camera are turned on, and a display is raised to a certain height to meet meeting requirements; in another mode, the lamp brightness is adjusted, the video disc player is started, and the display is set to a different height.
The above control method has the following problems: because each controlled device is adjusted manually, control efficiency is low and the workload on staff is heavy.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a scene interaction control method based on voice recognition, which can effectively solve the problems.
The technical scheme adopted by the invention is as follows:
the invention provides a scene interaction control method based on voice recognition, which comprises the following steps:
step 1, a central control system pre-establishes a snapshot library; the snapshot library stores the correspondence between a plurality of snapshot names and snapshot scene execution commands; the central control system controls meeting-place equipment by executing those snapshot scene commands;
to avoid misoperation, the voice recognition control program normally stays in a dormant, un-awakened state; at this time the wake-up word monitoring program is continuously open, while the main voice monitoring program is continuously closed;
step 2, the wake-up word monitoring program monitors in real time and judges whether the wake-up word is heard; if not, it keeps monitoring; if the wake-up word is heard, step 3 is executed;
step 3, the central control system closes the wake-up word monitoring program, starts the main voice monitoring program, and thereby wakes the voice recognition control program of the central control system, which switches to the activated state;
step 4, a voice recognition control program of the central control system starts a voice recording module, records a voice command from a user through the voice recording module, and stores the recorded voice command; meanwhile, in the process of recording the voice command by the voice recording module, displaying a voice volume waveform by a display module;
step 5, the voice recording module transmits the recorded voice command to a voice recognition module;
step 6, the voice recognition module carries out preliminary voice validity recognition on the voice command, and if the recognition is successful, step 7 is executed; if the identification is not successful, feeding back prompt information of identification failure to the user;
step 7, the voice recognition module performs intention recognition on the voice command, and recognizes one of the following four types: snapshot type, option type, confirm/cancel type, and other types;
if the voice command is of a snapshot type, executing the step 8 through a snapshot type submodule; if the voice command is of the option type, executing step 9 by an option type sub-module; if the voice command is of a confirm/cancel type, executing step 10 by a confirm/cancel type sub-module; if the voice command is of other types, executing step 11 through other types of sub-modules;
and 8: the method for executing the voice command corresponding to the snapshot type through the snapshot type submodule comprises the following steps:
step 8.1, if the voice command is of a snapshot type, obtaining the recognition score of the voice command, judging whether the recognition score exceeds a threshold value, and if not, indicating that the voice command is not clear enough, executing step 8.2; if yes, indicating that the voice command is definite, executing step 8.3;
step 8.2, further judging the uniqueness of the recognition result of the voice command, namely: judging whether the recognition result is a polyphonic condition, if not, indicating that the voice command corresponds to the snapshot name of the only pronunciation in the snapshot library, outputting prompt information for further confirmation of whether to execute the snapshot through a display module, simultaneously recording object information of the confirmation/cancellation in a confirmation/cancellation context configuration table, and then executing subsequent steps by a confirmation/cancellation type submodule; if yes, indicating that the voice command corresponds to polyphonic snapshot names in the snapshot library, forming polyphonic snapshot result sets by the polyphonic snapshot names, displaying the polyphonic snapshot result sets through a display module, recording the option information in an option context configuration table, and executing subsequent steps by the option type submodule;
step 8.3, further judging the uniqueness of the recognition result of the voice command, namely: judging whether the recognition result is a polyphone condition, if not, indicating that the voice command corresponds to the snapshot name of the only pronunciation in the snapshot library, and directly executing the snapshot scene command corresponding to the snapshot name; if yes, indicating that the voice command corresponds to polyphonic snapshot names in the snapshot library, forming each polyphonic snapshot name into a polyphonic snapshot result set, displaying the polyphonic snapshot result set through a display module, recording the option information in an option configuration table, and executing subsequent steps by the option type sub-module;
once the option context configuration table has stored option information, it is emptied when any next voice command from the user is executed;
likewise, once the confirm/cancel context configuration table has stored the confirmed/cancelled object information, it is emptied when any next voice command from the user is executed;
step 9, executing the voice command corresponding to the option type by the option type submodule, comprising the following steps:
if the voice command is of an option type, firstly searching the option context configuration table, judging whether the option context configuration table is empty, if not, indicating that the above information corresponding to the voice command exists, and directly executing a corresponding snapshot scene command according to the option information and the voice command stored in the option context configuration table; if the option context configuration table is empty, indicating that the above information does not exist, outputting prompt information of voice recognition failure through a display module;
step 10, executing the voice command corresponding to the confirmation/cancellation type through the confirmation/cancellation type submodule, including the following steps:
if the voice command is of a confirmation/cancellation type, firstly searching the confirmation/cancellation context configuration table, judging whether the confirmation/cancellation context configuration table is empty or not, if not, indicating that the above information corresponding to the voice command exists, and directly executing a corresponding snapshot scene command according to the confirmation/cancellation object information stored in the confirmation/cancellation context configuration table and the voice command; if the confirmation/cancellation context configuration table is empty, indicating that the above information does not exist, outputting prompt information of voice recognition failure through a display module;
step 11, executing the voice command corresponding to other types through the sub-modules of other types, comprising the following steps: and outputting prompt information of voice recognition failure through a display module.
Preferably, a wake-up button is configured, and when the wake-up button is clicked, the voice recognition control program of the central control system is manually woken up from a dormant state to an active state.
Preferably, outputting the prompt information of voice recognition failure through the display module specifically includes: playing a spoken apology prompt while outputting a suggested-phrasing statement for the voice command.
Preferably, in step 1, the snapshot library established by the central control system is dynamically updated in real time.
Preferably, the control method of the central control system for the meeting place equipment includes: touch and click on the screen, remote control pen key triggering and voice recognition control.
The scene interaction control method based on the voice recognition provided by the invention has the following advantages:
according to the scene interaction control method based on voice recognition, the central control system and the voice recognition technology are combined, the function of controlling the central control system by using the language to replace the traditional input equipment is achieved, and the scene interaction control method based on the voice recognition has the advantage of good user experience.
Drawings
Fig. 1 is a schematic flow chart of a scene interaction control method based on speech recognition according to the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Key term interpretation:
the controlled device: basic equipment for supporting the functions of the field service hall, such as a large screen system, a central air conditioner, a workstation, a sound amplification system, a light management module and the like.
Scene mode (hereinafter, snapshot): a business-level name defined for the combined control of the on-site controlled devices, such as an emergency mode or a daily monitoring mode.
The central control system: the control system is a device that centrally controls various devices such as sound, light, and electricity. The intelligent control system is applied to multimedia classrooms, multifunctional conference halls, command control centers, intelligent families and the like, and users can use devices such as a button control panel, a computer display, a touch screen, a wireless remote control and the like to control devices such as a projector, a display stand, a video disc player, a video recorder and the like through a computer and central control system software.
In the last two decades, speech recognition technology has advanced significantly, starting to move from the laboratory to the market. It is expected that voice recognition technology will enter various fields such as industry, home appliances, communications, automotive electronics, medical care, home services, consumer electronics, etc. within the next 10 years.
In practical application, the central control system continuously derives various application scenes such as demonstration report, daily monitoring, production scheduling and the like according to business requirements. Users are gradually moving from desktop computers to smaller, mobile terminals that interact with the system.
Therefore, the invention provides a scene interaction control method based on voice recognition, which combines a voice recognition technology with a central control system and relates to a scene interaction control method using voice as a command medium.
The invention uses voice recognition technology to recognize and analyze the speaker's voice message; if the message contains a valid snapshot name, it is sent to the central control system to execute the corresponding operation, freeing the user's hands from the control equipment.
The method is realized by two parts of hardware and software processing.
1. Hardware deployment
An audio acquisition device must be connected to the control terminal of the central control system: a mobile terminal with Bluetooth uses a Bluetooth headset, which handles both voice capture and voice feedback; a control terminal without Bluetooth uses a wired microphone, and voice feedback requires an additional loudspeaker (an integrated microphone-speaker unit, or an on-site public-address system brought in as required).
2. Software implementation process
The system is divided into a voice recognition module and the central control system; the voice recognition module is the key technical point that this application protects. After receiving an instruction from the voice recognition module, the central control system sends control instructions to the target devices according to the snapshot-to-device control protocol mapping and the physical link configuration, quickly reaching the required scene mode and improving control efficiency. The key point of the invention is how to recognize the user's voice and map it to a snapshot name in the snapshot library; once a snapshot name is matched, sending the control instruction to the target device only requires executing the corresponding control commands.
In the invention, a wake-up step is added to avoid misoperation, so the service scene is divided into a wake-up part and a voice recognition part, and a corresponding logic processing method is designed for each user intention; the specific service judgment flow is shown in fig. 1. For this system the user's only operation path is voice, so the system designs a single service entrance that monitors the voice spoken by the user.
Referring to fig. 1, a scene interaction control method based on speech recognition includes the following steps:
step 1, a central control system pre-establishes a snapshot library; the snapshot library stores the corresponding relation between a plurality of snapshot names and snapshot scene execution commands; the central control system controls meeting place equipment by executing the command through the snapshot scene; the control mode of the central control system to the meeting place equipment includes but is not limited to: touch and click on the screen, remote control pen key triggering and voice recognition control.
For example, the snapshot names may be: conference mode, Huifei mode, visual scheduling scene, science-and-petrochemical scheduling scene, lights-all-on mode, lights-all-desert mode, and so on. Each mode corresponds to a set of instructions executed on the controlled devices. In practice a snapshot name may contain wrongly written characters or share a pronunciation with another name. For example, "desert" in the lights-all-desert mode is a deliberately wrongly written word; the Huifei mode and the conference mode share the same Pinyin with only some tones differing, so the two snapshot names are considered polyphonic snapshot names.
The subsequent scene interaction control method based on the voice recognition can realize the recognition and execution of the snapshot names with wrongly-written or polyphonic characters.
In addition, the snapshot library established by the central control system can be dynamically updated in real time. That is, the central control system allows users to create different snapshots according to business needs (each snapshot internally contains pre-set control messages for one or more controlled devices) and to give each snapshot a custom name. The set of snapshot names is the valid range of the voice control function.
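As an illustration (not part of the patent text), the snapshot library described above can be sketched as a name-to-command mapping that supports dynamic, real-time updates. All class, device, and snapshot names here are hypothetical:

```python
# Hypothetical sketch of the snapshot library: a mapping from user-defined
# snapshot names to the device pre-control commands that the central control
# system executes. Names and commands are illustrative only.
class SnapshotLibrary:
    def __init__(self):
        self._snapshots = {}  # snapshot name -> list of device commands

    def add(self, name, commands):
        # Dynamic real-time update: users may create or replace snapshots
        # at any time according to business needs.
        self._snapshots[name] = list(commands)

    def names(self):
        # The set of snapshot names is the valid range of voice control.
        return set(self._snapshots)

    def execute(self, name, send):
        # 'send' dispatches one control command to its target device.
        for command in self._snapshots[name]:
            send(command)

lib = SnapshotLibrary()
lib.add("conference mode", [("lights", "brightness", 40), ("speakers", "on")])
lib.add("daily monitoring mode", [("tiled_screen", "on")])

sent = []
lib.execute("conference mode", sent.append)
print(len(sent))  # number of device commands dispatched for this snapshot
```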
To avoid misoperation, the voice recognition control program normally stays in a dormant, un-awakened state; at this time the wake-up word monitoring program is continuously open, while the main voice monitoring program is continuously closed.
in practical application, a wake-up button may be configured, and when the wake-up button is clicked, the voice recognition control program of the central control system is manually woken up from a dormant state to an active state.
The design principle of the awakening process is as follows:
the system adds a wake-up mechanism to avoid user misoperation (e.g. multiple users are talking and may mention that the system can recognize command statements causing snapshot misoperation), similar to the screen locking/unlocking mechanism of a mobile phone. Namely: when not awakened, the system is similar to the screen locking state; when in the active state, the system is in the unlocked state.
At any moment, only one of the wake-up word monitoring program and the main voice monitoring program is open; the other is closed.
Step 2, the awakening word monitoring program monitors in real time and judges whether an awakening word is monitored or not; if the awakening words are not monitored, continuously monitoring; if the awakening word is monitored, executing the step 3;
step 3, the central control system closes the wake-up word monitoring program, starts the main voice monitoring program, and thereby wakes the voice recognition control program of the central control system, which switches to the activated state;
for example, when the central control system is in a dormant state without being awakened, the user speaks a certain word, the word is monitored by the awakening word monitoring program, and then the awakening word monitoring program judges whether the word is an awakening word; wherein, the wake-up word is a word that the system pre-customizes according to the requirement, for example, "small, constant, small and constant"; if the word is a wake-up word, starting a main voice monitoring program; meanwhile, the display module outputs voice waveform feedback to prompt a user that sound is currently captured, and voice output prompts that the voice assistant is started; if the wake word is not recognized, the system does not give any feedback.
Step 4, a voice recognition control program of the central control system starts a voice recording module, records a voice command from a user through the voice recording module, and stores the recorded voice command; meanwhile, in the process of recording the voice command by the voice recording module, displaying a voice volume waveform by a display module;
specifically, when the central control system is in an activated state, a user speaks a voice command by using the Mandarin, and the central control system performs voice recording operation and waveform feedback. The 'voice recording' is to wait for the user to finish the expression of the current voice information and carry out memory storage on the voice information and then carry out analysis on the voice information by a voice recognition module; the waveform feedback is feedback for giving the user the recording quality when the voice information is expressed, if the waveform is not obvious, the voice quality of the user is low, and the user is prompted to increase the volume or close the distance between the voice information and the audio acquisition equipment.
Step 5, the voice recording module transmits the recorded voice command to a voice recognition module;
step 6, the voice recognition module carries out preliminary voice validity recognition on the voice command, and if the recognition is successful, the step 7 is executed; if the identification is not successful, feeding back prompt information of identification failure to the user;
specifically, the voice recording module transmits the recorded voice command to the voice recognition module; the voice recognition module judges whether the information is recognized or not, and if the information is recognized, the subsequent processing is carried out; if not, a voice prompt apology statement is played while the display module outputs a prompt-like statement, such as "you can ask me this \8230;".
Step 7, the voice recognition module performs intention recognition on the voice command, and recognizes one of the following four types: snapshot type, option type, confirm/cancel type, and other types;
if the voice command is of a snapshot type, executing the step 8 through a snapshot type submodule; if the voice command is of the option type, executing step 9 by an option type sub-module; if the voice command is of a confirm/cancel type, executing step 10 by a confirm/cancel type sub-module; if the voice command is of other types, executing step 11 through other types of sub-modules;
in the invention, the effective results of the voice recognition are classified and processed respectively. The classification includes: snapshot, option (polyphonic group option), confirm/cancel (instruction confirmation below threshold), and others (beyond system processing power).
step 8, executing the voice command corresponding to the snapshot type through the snapshot type sub-module, comprising the following steps:
step 8.1, if the voice command is of a snapshot type, obtaining the recognition score of the voice command, judging whether the recognition score exceeds a threshold value, and if not, indicating that the voice command is not clear enough, executing step 8.2; if yes, indicating that the voice command is definite, executing step 8.3;
step 8.2, further judging the uniqueness of the recognition result of the voice command, namely: judging whether the recognition result is a polyphonic condition, if not, indicating that the voice command corresponds to the snapshot name of the only pronunciation in the snapshot library, outputting prompt information for further confirmation of whether to execute the snapshot through a display module, simultaneously recording object information of the confirmation/cancellation in a confirmation/cancellation context configuration table, and then executing subsequent steps by a confirmation/cancellation type submodule; if yes, indicating that the voice command corresponds to polyphonic snapshot names in the snapshot library, forming each polyphonic snapshot name into a polyphonic snapshot result set, displaying the polyphonic snapshot result set through a display module, recording the option information in an option context configuration table, and executing subsequent steps by the option type sub-module;
step 8.3, further judging the uniqueness of the recognition result of the voice command, namely: judging whether the identification result is a polyphonic condition, if not, indicating that the voice command corresponds to the snapshot name of the only pronunciation in the snapshot library, and directly executing the snapshot scene command corresponding to the snapshot name; if yes, indicating that the voice command corresponds to polyphonic snapshot names in the snapshot library, forming polyphonic snapshot result sets by the polyphonic snapshot names, displaying the polyphonic snapshot result sets through a display module, simultaneously recording the option information in an option configuration table, and executing subsequent steps by the option type submodule;
once the option context configuration table has stored option information, it is emptied when any next voice command from the user is executed;
likewise, once the confirm/cancel context configuration table has stored the confirmed/cancelled object information, it is emptied when any next voice command from the user is executed;
specifically, the snapshot type submodule judges that the recognition score of the snapshot type voice command exceeds a threshold value (the threshold value is based on the recognition degree of the voice recognition result), if the recognition score exceeds the threshold value, the system considers that the command is clear, and then carries out subsequent judgment on uniqueness of the recognition result, and if the recognition result is unique, the user does not need to confirm the unique recognition result, and the snapshot execution command can be directly carried out; if the identification result is not unique and indicates that the identification result is a polyphonic snapshot name, pushing a corresponding snapshot name list to the user, and directly carrying out snapshot execution instructions without further confirmation to the user after the user selects the corresponding snapshot name from the list; if the identification degree is lower than the threshold value, the system judges that the instruction is not clear enough, firstly, the judgment of 'identification result is only' is carried out, if the identification result is only, prompt information still needs to be output on a display module, a user is asked to further confirm whether the snapshot is executed, and the system plays a voice prompt to 'whether a certain snapshot is executed'. If it is a polyphonic case, it is processed by the polyphonic processing module (polyphonic processing itself is also a further confirmation action).
Step 9, executing the voice command corresponding to the option type by the option type submodule, comprising the following steps:
if the voice command is of an option type, firstly searching the option context configuration table, judging whether the option context configuration table is empty, if not, indicating that the above information corresponding to the voice command exists, and directly executing a corresponding snapshot scene command according to the option information and the voice command stored in the option context configuration table; if the option context configuration table is empty, indicating that the above information does not exist, outputting prompt information of voice recognition failure through a display module;
specifically, the option type sub-module needs context memory capability. When a polyphonic snapshot name is encountered, the system performs polyphone processing and offers the user the set of polyphonic snapshot names to choose from. After the system recognizes that the voice command is of the option type, it judges whether context information exists based on the option context configuration table. If it does, the command is matched against the options of the previous dialogue turn, the resolved option's snapshot is executed, the system voice announces that the snapshot was executed successfully, and the display device outputs the same success message. If no context information exists, the system design takes an anthropomorphic perspective: a topic without supporting context is treated as an illegal operation, the system plays an apology voice prompt, and the display device outputs a prompting sentence such as "You can ask me like this: …".
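A sketch of the option-type lookup, assuming the option context table is a plain dict and the user's spoken or clicked reply has already been parsed to a zero-based index (both assumptions, not stated in the patent):

```python
def handle_option_command(choice_index, option_table):
    """Resolve an option-type reply ("the first one", "number 2", ...)
    against the options pushed in the previous dialogue turn."""
    options = option_table.get("options")
    if not options:                  # empty table: no context, illegal operation
        return ("apologize", None)
    if not 0 <= choice_index < len(options):
        return ("apologize", None)   # reply does not match any pushed option
    name = options[choice_index]
    option_table.clear()             # table is emptied once the command is used
    return ("execute", name)
```

Clearing the table on use matches the rule in the claims that the option context configuration table is emptied after the next voice command is executed.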
Step 10, executing the voice command corresponding to the confirmation/cancellation type through the confirmation/cancellation type submodule, including the following steps:
if the voice command is of the confirm/cancel type, the confirm/cancel context configuration table is searched first to judge whether it is empty; if it is not empty, context information corresponding to the voice command exists, and the corresponding snapshot scene command is executed directly according to the confirm/cancel object information stored in the confirm/cancel context configuration table and the voice command; if the confirm/cancel context configuration table is empty, no context information exists, and prompt information of voice recognition failure is output through the display module;
specifically, the confirm/cancel type sub-module also handles a context-dependent case: it performs the secondary confirmation of results whose recognition score is below the set threshold. After the system recognizes a confirm/cancel-type command, it first judges whether context information exists based on the confirm/cancel context configuration table. If it does, the snapshot is executed, the system voice announces that the snapshot was executed successfully, and the display device outputs the same success message. If no context information exists, the system design takes an anthropomorphic perspective: a topic without supporting context is treated as an illegal operation, the system plays an apology voice prompt, and the display device outputs a prompting sentence such as "You can ask me like this: …".
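The confirm/cancel branch can be sketched in the same style (the table layout and return values are assumptions for illustration):

```python
def handle_confirm_cancel(is_confirm, confirm_table):
    """Handle a confirm/cancel-type command against the pending
    confirmation written by the low-score snapshot branch."""
    pending = confirm_table.get("pending")
    if pending is None:              # no pending question: illegal operation
        return ("apologize", None)
    confirm_table.clear()            # context is consumed either way
    return ("execute", pending) if is_confirm else ("cancelled", pending)
```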
Step 11, executing the voice command corresponding to other types through the sub-modules of other types, comprising the following steps: and outputting prompt information of voice recognition failure through a display module.
Specifically, when the voice command is of another type, it is by design beyond the system's processing capability and is handled uniformly: the system plays an apology voice prompt, and the display device outputs a prompting sentence such as "You can ask me like this: …".
In each step of the invention, outputting prompt information of voice recognition failure through the display module specifically means: playing an apology voice prompt while outputting an example prompting sentence for the voice command.
In the intention recognition process of the invention, when the current user's intention is judged to be the snapshot type, the judgment of whether the recognition result is unique is carried out. If the answer is yes, the result is a snapshot name with a unique pronunciation in the snapshot library; the snapshot is executed directly, the system voice announces that the snapshot was executed successfully, and the display device outputs the same success message. If the answer is no, the result is not a snapshot name with a unique pronunciation in the snapshot library; the system lists, through the display device, all snapshots sharing that pronunciation for the user to select from, and gives voice feedback prompting the user, e.g. "Please speak or click the item number shown on the screen". For example, when the voice command from the user is recognized as "benefit mode", the system finds the snapshot set "benefit mode, conference mode 1, conference mode 2"; the system then displays "benefit mode, conference mode 1, conference mode 2" as a list, and if the user wants to execute conference mode 1, the user clicks "conference mode 1" and the system executes the snapshot scene execution command corresponding to "conference mode 1".
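One plausible way to support this same-pronunciation lookup is a snapshot library indexed by pronunciation, so one utterance maps either to a single name or to a set of same-sounding names. This is a sketch under assumptions: the patent does not specify the index structure, and the pinyin-style keys below are illustrative placeholders.

```python
from collections import defaultdict

class SnapshotLibrary:
    """Snapshot store indexed by pronunciation, so a recognized utterance
    can resolve to one snapshot name or to a polyphonic set of names."""
    def __init__(self):
        self._by_pron = defaultdict(list)

    def add(self, name, pronunciation):
        self._by_pron[pronunciation].append(name)

    def lookup(self, pronunciation):
        """All snapshot names sharing this pronunciation (possibly many)."""
        return list(self._by_pron.get(pronunciation, []))

lib = SnapshotLibrary()
lib.add("conference mode 1", "hui yi mo shi yi")  # placeholder pinyin keys
lib.add("meeting mode",      "hui yi mo shi")
lib.add("conference mode 2", "hui yi mo shi")
matches = lib.lookup("hui yi mo shi")  # two candidates -> display a list
```

When `lookup` returns more than one name, the system would display the whole set as a numbered list exactly as in the "conference mode" example above.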
The scene interaction control method based on voice recognition provided by the invention has the following characteristics:
1. The system closes the main voice monitoring program promptly after completing a task, which avoids misoperation caused by complex ambient sound on site, and lets the user quickly wake the intelligent voice recognition system by voice or by a manual click.
In the activated mode of the voice control system, the user speaks a voice command containing a valid snapshot name to control the on-site devices, e.g. "start monitoring mode", "execute meeting mode" or "open mode one". This has the advantage of simple operation and thus improves the user experience;
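The wake/sleep behaviour of feature 1 amounts to a two-state machine; a minimal sketch, with state and method names assumed for illustration:

```python
class VoiceControl:
    """Minimal wake/sleep state machine for feature 1: the main listener
    is closed after each task; only the wake-word listener stays on."""
    def __init__(self):
        self.state = "dormant"   # wake-word listener on, main listener off

    def wake(self):              # wake word heard, or wake button clicked
        self.state = "active"    # main listener on, wake-word listener off

    def task_done(self):         # close the main listener promptly
        self.state = "dormant"

vc = VoiceControl()
vc.wake()        # user says the wake word
vc.task_done()   # snapshot executed, system goes back to sleep
```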
2. Voice feedback mechanism: the method pays attention to the user's interactive experience, provides effective feedback for the various situations a user may speak in practice, and guides the user to use the voice control system correctly.
Specifically, the method comprises the following steps:
1) High-quality speech: when the system receives a clear, standard voice command, it treats it as a safe, reliably recognized command and directly performs the task the user intends.
2) Speech below the high-quality score: if the recognized speech quality is below the high-quality threshold, the system first confirms the recognized task with the user; to ensure the safety of the on-site devices, the user can confirm or cancel the task by a manual click or by speaking a prompted command word.
3) Polyphonic command: when a polyphonic pattern in the system is recognized, the system feeds the options back to the user and asks the user to click or speak a choice.
4) Outside the range of valid scene keywords: when the user says something the system cannot understand, the system shows a prompt page with example sentences the user can refer to.
3. Context capability: when facing multiple recognition results (polyphones), the method temporarily stores the recognition result in the computer's memory, waits for the user to answer the confirmation message, and then carries out subsequent processing, thereby simulating the human ability to carry context across a conversation.
4. The system supports dynamic loading of newly created snapshot name keywords. Once a user creates a snapshot, names the mode, and the snapshot is saved successfully, the voice recognition system immediately supports recognizing that snapshot's name.
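Dynamic keyword loading can be sketched as follows: saving a new snapshot registers its name with the recognizer's keyword list in the same call, so no restart is needed. The `Recognizer` interface here is an assumption for illustration, not an API from the patent.

```python
class Recognizer:
    """Stand-in for the voice recognition module's keyword list."""
    def __init__(self):
        self.keywords = set()

    def register_keyword(self, word):
        self.keywords.add(word)

class CentralControl:
    def __init__(self, recognizer):
        self.snapshots = {}          # snapshot name -> scene execution command
        self.recognizer = recognizer

    def save_snapshot(self, name, command):
        self.snapshots[name] = command
        # dynamic loading: the new name is recognizable immediately
        self.recognizer.register_keyword(name)

rec = Recognizer()
cc = CentralControl(rec)
cc.save_snapshot("demo mode", ["projector_on", "lights_dim"])
```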
In conclusion, the scene interaction control method based on voice recognition combines a central control system with voice recognition technology, realizes control of the central control system by speech in place of traditional input devices, and offers a good user experience.
As an auxiliary interactive means, the method adopts human speech, the most natural form of expression, as the message transmission medium, which is one of the best modes of human-computer interaction. Combining voice recognition with a central control system can further enhance the value and technological feel of a commercial control site.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, many modifications and adaptations can be made without departing from the principle of the present invention, and such modifications and adaptations should also be considered to be within the scope of the present invention.

Claims (1)

1. A scene interaction control method based on voice recognition is characterized by comprising the following steps:
step 1, a central control system pre-establishes a snapshot library; the snapshot library stores the corresponding relation between a plurality of snapshot names and snapshot scene execution commands; the central control system controls meeting place equipment by executing the command through the snapshot scene;
the voice recognition control program is normally in a dormant, un-awakened state to avoid misoperation; at this time, the wake-word monitoring program is continuously in an open state, and the main voice monitoring program is continuously in a closed state;
step 2, the awakening word monitoring program monitors in real time and judges whether an awakening word is monitored or not; if the awakening words are not monitored, continuously monitoring; if the awakening words are monitored, executing the step 3;
step 3, the central control system closes the awakening language monitoring program, starts the main voice monitoring program, and further awakens the voice recognition control program of the central control system, and at the moment, the voice recognition control program of the central control system is converted into an activated state;
step 4, a voice recognition control program of the central control system starts a voice recording module, records a voice command from a user through the voice recording module, and stores the recorded voice command; meanwhile, in the process of recording the voice command by the voice recording module, displaying a voice volume waveform by a display module;
step 5, the voice recording module transmits the recorded voice command to a voice recognition module;
step 6, the voice recognition module carries out preliminary voice validity recognition on the voice command, and if the recognition is successful, step 7 is executed; if the identification is not successful, feeding back prompt information of identification failure to the user;
step 7, the voice recognition module performs intention recognition on the voice command, and recognizes one of the following four types: snapshot type, option type, confirm/cancel type, and other types;
if the voice command is of a snapshot type, executing the step 8 through a snapshot type submodule; if the voice command is of the option type, executing step 9 by an option type sub-module; if the voice command is of a confirm/cancel type, executing step 10 by a confirm/cancel type sub-module; if the voice command is of other types, executing step 11 through other types of sub-modules;
and 8: the method for executing the voice command corresponding to the snapshot type through the snapshot type submodule comprises the following steps:
step 8.1, if the voice command is of a snapshot type, obtaining a recognition score of the voice command, judging whether the recognition score exceeds a threshold value, and if not, indicating that the voice command is not clear enough, executing step 8.2; if yes, indicating that the voice command is definite, executing step 8.3;
step 8.2, further judging the uniqueness of the recognition result of the voice command, namely: judging whether the recognition result is a polyphonic condition, if not, indicating that the voice command corresponds to the snapshot name of the only pronunciation in the snapshot library, outputting prompt information for further confirmation of whether to execute the snapshot through a display module, simultaneously recording object information of the confirmation/cancellation in a confirmation/cancellation context configuration table, and then executing subsequent steps by a confirmation/cancellation type submodule; if yes, indicating that the voice command corresponds to polyphonic snapshot names in the snapshot library, forming each polyphonic snapshot name into a polyphonic snapshot result set, displaying the polyphonic snapshot result set through a display module, recording the option information in an option context configuration table, and executing subsequent steps by the option type sub-module;
and 8.3, further judging the uniqueness of the recognition result of the voice command, namely: judging whether the identification result is a polyphonic condition, if not, indicating that the voice command corresponds to the snapshot name of the only pronunciation in the snapshot library, and directly executing the snapshot scene command corresponding to the snapshot name; if yes, indicating that the voice command corresponds to polyphonic snapshot names in the snapshot library, forming each polyphonic snapshot name into a polyphonic snapshot result set, displaying the polyphonic snapshot result set through a display module, recording the option information in an option configuration table, and executing subsequent steps by the option type sub-module;
when the option context configuration table stores option information, the option context configuration table is emptied after any next voice command from the user is executed;
every time the confirmation/cancellation context configuration table stores confirmation/cancellation object information, the confirmation/cancellation context configuration table is emptied when any next voice command from the user is executed;
specifically, the snapshot type sub-module performs the 'recognition score exceeds threshold' judgment on the snapshot-type voice command, the threshold being determined according to the confidence of the voice recognition result; if the recognition score exceeds the threshold, the system considers the command clear and performs the subsequent 'recognition result uniqueness' judgment, and if the recognition result is unique, the user does not need to confirm it and the snapshot execution command is carried out directly; if the recognition result is not unique, indicating a polyphonic snapshot name, the corresponding snapshot name list is pushed to the user, and after the user selects a snapshot name from the list the snapshot execution command is carried out directly without further confirmation; if the recognition score is below the threshold, the system judges that the command is not clear enough and first performs the 'recognition result is unique' judgment; if the recognition result is unique, prompt information still needs to be output on the display module asking the user to further confirm whether to execute the snapshot, and the system plays a voice prompt asking whether to execute the given snapshot; if it is a polyphonic case, it is processed by the polyphone processing module;
step 9, executing the voice command corresponding to the option type by the option type sub-module, including the following steps:
if the voice command is of the option type, the option context configuration table is searched first to judge whether it is empty; if it is not empty, context information corresponding to the voice command exists, and the corresponding snapshot scene command is executed directly according to the option information stored in the option context configuration table and the voice command; if the option context configuration table is empty, no context information exists, and prompt information of voice recognition failure is output through the display module;
specifically, the option type sub-module needs context memory capability, and when a polyphonic snapshot name is encountered, the system performs polyphone processing to provide the user with a selection from the polyphonic snapshot name set; after the system recognizes that the voice command is of the option type, it judges whether context information exists based on the option context configuration table; if it does, the command is matched against the options of the previous dialogue turn, the resolved option's snapshot is executed, the system voice announces that the snapshot was executed successfully, and the display device outputs the same success message; if no context information exists, the system design takes an anthropomorphic perspective, a topic without supporting context is treated as an illegal operation, the system plays an apology voice prompt, and the display device outputs a prompting sentence;
step 10, executing the voice command corresponding to the confirmation/cancellation type through the confirmation/cancellation type sub-module, including the following steps:
if the voice command is of the confirmation/cancellation type, the confirmation/cancellation context configuration table is searched first to judge whether it is empty; if it is not empty, context information corresponding to the voice command exists, and the corresponding snapshot scene command is executed directly according to the confirmation/cancellation object information stored in the confirmation/cancellation context configuration table and the voice command; if the confirmation/cancellation context configuration table is empty, no context information exists, and prompt information of voice recognition failure is output through the display module;
specifically, the confirm/cancel type sub-module also handles a context-dependent case and is used for the secondary confirmation of results whose recognition score is below the set threshold; after the system recognizes the confirm/cancel type, it first judges whether context information exists based on the confirm/cancel context configuration table; if it does, the snapshot is executed, the system voice announces that the snapshot was executed successfully, and the display device outputs the same success message; if no context information exists, the system design takes an anthropomorphic perspective, a topic without supporting context is treated as an illegal operation, the system plays an apology voice prompt, and the display device outputs a prompting sentence;
step 11, executing the voice command corresponding to other types through the sub-modules of other types, comprising the following steps: outputting prompt information of voice recognition failure through a display module;
when the wake-up button is clicked, manually waking up a voice recognition control program of the central control system from a dormant state to an activated state;
the prompt information of voice recognition failure output through the display module specifically comprises: playing an apology voice prompt while outputting an example prompting sentence for the voice command;
in step 1, a snapshot library established by the central control system is dynamically updated in real time;
wherein, the control mode of the central control system to the meeting place equipment comprises the following steps: touch screen clicking, remote control pen button triggering and voice recognition control.
CN201811581756.4A 2018-12-24 2018-12-24 Scene interaction control method based on voice recognition Active CN109616111B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811581756.4A CN109616111B (en) 2018-12-24 2018-12-24 Scene interaction control method based on voice recognition


Publications (2)

Publication Number Publication Date
CN109616111A CN109616111A (en) 2019-04-12
CN109616111B true CN109616111B (en) 2023-03-14

Family

ID=66011357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811581756.4A Active CN109616111B (en) 2018-12-24 2018-12-24 Scene interaction control method based on voice recognition

Country Status (1)

Country Link
CN (1) CN109616111B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112291281B (en) * 2019-07-09 2023-11-03 钉钉控股(开曼)有限公司 Voice broadcasting and voice broadcasting content setting method and device
CN111128160B (en) * 2019-12-19 2024-04-09 中国平安财产保险股份有限公司 Receipt modification method and device based on voice recognition and computer equipment
CN111176607A (en) * 2019-12-27 2020-05-19 国网山东省电力公司临沂供电公司 Voice interaction system and method based on power business
CN111554285A (en) * 2020-04-26 2020-08-18 三一重机有限公司 Voice control system and control method thereof
CN111897916B (en) * 2020-07-24 2024-03-19 惠州Tcl移动通信有限公司 Voice instruction recognition method, device, terminal equipment and storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8219407B1 (en) * 2007-12-27 2012-07-10 Great Northern Research, LLC Method for processing the output of a speech recognizer
CN102800315A (en) * 2012-07-13 2012-11-28 上海博泰悦臻电子设备制造有限公司 Vehicle-mounted voice control method and system
CN103903619A (en) * 2012-12-28 2014-07-02 安徽科大讯飞信息科技股份有限公司 Method and system for improving accuracy of speech recognition
CN103943105A (en) * 2014-04-18 2014-07-23 安徽科大讯飞信息科技股份有限公司 Voice interaction method and system
CN104715754A (en) * 2015-03-05 2015-06-17 北京华丰亨通科贸有限公司 Method and device for rapidly responding to voice commands
CN105609105A (en) * 2014-11-13 2016-05-25 现代自动车株式会社 Speech recognition system and speech recognition method
CN105786880A (en) * 2014-12-24 2016-07-20 中兴通讯股份有限公司 Voice recognition method, client and terminal device
CN105869634A (en) * 2016-03-31 2016-08-17 重庆大学 Field-based method and system for feeding back text error correction after speech recognition
US9424840B1 (en) * 2012-08-31 2016-08-23 Amazon Technologies, Inc. Speech recognition platforms
CN106710585A (en) * 2016-12-22 2017-05-24 上海语知义信息技术有限公司 Method and system for broadcasting polyphonic characters in voice interaction process
CN107615377A (en) * 2015-10-05 2018-01-19 萨万特系统有限责任公司 The key phrase suggestion based on history for the Voice command of domestic automation system
CN107705787A (en) * 2017-09-25 2018-02-16 北京捷通华声科技股份有限公司 A kind of audio recognition method and device
CN108564940A (en) * 2018-03-20 2018-09-21 平安科技(深圳)有限公司 Audio recognition method, server and computer readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10203762B2 (en) * 2014-03-11 2019-02-12 Magic Leap, Inc. Methods and systems for creating virtual and augmented reality
CN107272887A (en) * 2017-05-17 2017-10-20 四川新网银行股份有限公司 A kind of method that client scene interactivity is realized based on augmented reality


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Implementation of polyphone recognition in speech synthesis; Zhou Haitao; Science & Technology Information; 2008-04-30 (No. 11); full text *

Also Published As

Publication number Publication date
CN109616111A (en) 2019-04-12

Similar Documents

Publication Publication Date Title
CN109616111B (en) Scene interaction control method based on voice recognition
US9953648B2 (en) Electronic device and method for controlling the same
US11354089B2 (en) System and method for dialog interaction in distributed automation systems
JP6516585B2 (en) Control device, method thereof and program
KR101726945B1 (en) Reducing the need for manual start/end-pointing and trigger phrases
US11282519B2 (en) Voice interaction method, device and computer readable storage medium
US10811008B2 (en) Electronic apparatus for processing user utterance and server
WO2017012511A1 (en) Voice control method and device, and projector apparatus
WO2020029500A1 (en) Voice command customization method, device, apparatus, and computer storage medium
WO2016052018A1 (en) Home appliance management system, home appliance, remote control device, and robot
CN109240107B (en) Control method and device of electrical equipment, electrical equipment and medium
CN105323648A (en) Method for closed captioning and electronic device
US10540973B2 (en) Electronic device for performing operation corresponding to voice input
CN109920416A (en) Voice control method, device, storage medium and control system
CN114172757A (en) Server, intelligent home system and multi-device voice awakening method
CN114067798A (en) Server, intelligent equipment and intelligent voice control method
CN108648754A (en) Sound control method and device
CN109215642A (en) Processing method, device and the electronic equipment of man-machine conversation
WO2024103926A1 (en) Voice control methods and apparatuses, storage medium, and electronic device
JP2021530130A (en) Methods and equipment for managing holds
WO2020135773A1 (en) Data processing method, device, and computer-readable storage medium
CN104423992A (en) Speech recognition startup method for display
US11516346B2 (en) Three-way calling terminal for mobile human-machine coordination calling robot
CN116802602A (en) Hot word group
CN116566760B (en) Smart home equipment control method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant