CN110767232A - Speech recognition control method and device, computer equipment and computer storage medium - Google Patents


Info

Publication number
CN110767232A
CN110767232A
Authority
CN
China
Prior art keywords
word
words
voice text
target
text
Prior art date
Legal status
Granted
Application number
CN201910931524.5A
Other languages
Chinese (zh)
Other versions
CN110767232B (en)
Inventor
周阳
徐宇垚
马秦宇
Current Assignee
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Original Assignee
Shenzhen Heertai Home Furnishing Online Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Heertai Home Furnishing Online Network Technology Co Ltd
Priority to CN201910931524.5A
Publication of CN110767232A
Application granted
Publication of CN110767232B
Legal status: Active

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue

Abstract

The application relates to a speech recognition control method and apparatus, a computer device, and a computer storage medium. The method comprises the following steps: acquiring a device list within a control range, the device list comprising device words and their corresponding device physical addresses, where the device words comprise device category words and/or device nickname words and device nickname words have higher priority than device category words; performing word segmentation on an acquired initial voice text and extracting the device words and control words it contains; determining, according to the device list, the valid device words that match each device word in the initial voice text, selecting the valid device word with the highest priority as the target device word for that device word, and determining the device physical address corresponding to the target device word from the device list; and generating a control instruction according to the target device word corresponding to the last device word in the initial voice text, the target device physical address, the last control word in the initial voice text, and a control instruction word list.

Description

Speech recognition control method and device, computer equipment and computer storage medium
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a speech recognition control method and apparatus, a computer device, and a computer storage medium.
Background
The statements herein merely provide background information related to the present application and may not necessarily constitute prior art.
With the rapid development of voice technology, Internet-of-Things device control solutions that use voice as the entry point are advancing quickly. However, the speech recognition accuracy of current voice-controlled smart home solutions is limited: when a smart home device is switched on by voice, similar devices are easily triggered by mistake and turned on at the same time, for example all the lamps in a living room are turned on. The precision of speech recognition control is low, errors are frequent, resources are wasted, and user requirements are not met.
Disclosure of Invention
In view of the above, it is desirable to provide a voice recognition control method and apparatus, a computer device, and a computer storage medium, which address the problem of low voice recognition control accuracy.
The embodiment of the invention provides a voice recognition control method, which comprises the following steps:
acquiring an initial voice text, wherein the initial voice text is generated after voice recognition is carried out on user voice;
acquiring a device list in a control range, wherein the device list comprises device words and device physical addresses corresponding to the device words; the equipment words comprise equipment category words and/or equipment nickname words, and the priority of the equipment nickname words is higher than that of the equipment category words;
performing word segmentation processing on the initial voice text, and extracting equipment words and control words in the initial voice text;
determining effective device words matched with the device words in the initial voice text according to the device list, determining target device words corresponding to the device words in the initial voice text from the effective device words, and determining device physical addresses corresponding to the target device words from the device list; the target equipment word refers to the equipment word with the highest priority in the effective equipment words;
generating a control instruction according to the target device word corresponding to the last device word in the initial voice text, the target device physical address, the last control word in the initial voice text, and a control instruction word list; wherein the target device physical address is the device physical address corresponding to the target device word of the last device word, the control instruction word list is used for representing the relation between the initial voice text and control instructions, and the control instruction is used for instructing the device corresponding to the last target device word to execute the action corresponding to the control word.
In one embodiment, after the step of performing word segmentation processing on the initial voice text and extracting device words and control words in the initial voice text, and before the step of determining valid device words matching the device words in the initial voice text according to the device list, the method further includes:
standardizing the device words and the control words in the initial voice text according to the device words and the control words extracted from the initial voice text and a pre-stored word meaning standardization dictionary to obtain standardized device words and standardized control words, and correspondingly replacing the device words and the control words in the initial voice text with the standardized device words and the standardized control words to obtain a standardized voice text;
the initial voice text is updated to the normalized voice text.
In one embodiment, the step of performing the word segmentation process on the initial speech text further comprises:
performing field intention recognition according to the initial voice text;
and if the device control field is determined, executing a step of performing word segmentation processing on the initial voice text.
In one embodiment, the device list further includes device location information; the step of generating the control instruction according to the target device word corresponding to the last device word in the initial voice text, the target device physical address, the last control word in the initial voice text, and the control instruction word list further includes:
matching device position information corresponding to the target device word according to the device position information in the device list;
and if it is judged, according to the device location information corresponding to the target device word, that the device corresponding to the target device word is not within the control range, generating and sending a first-type reply, where the first-type reply is used to prompt that the voice-controlled device is not within the control range.
In one embodiment, the speech recognition control method further comprises:
if no control instruction is generated according to the target device word corresponding to the last device word in the initial voice text, the target device physical address, the last control word in the initial voice text, and the control instruction word list, generating and sending a second-type reply, where the second-type reply is used to prompt the user to provide new speech;
acquiring new voice provided by a user;
converting the new voice into a new voice text;
fusing the initial voice text and the new voice text to generate a fused voice text;
and updating the initial voice text to the fused voice text, and executing the step of performing word segmentation processing on the initial voice text.
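The fusion step in the embodiment above can be sketched minimally; the concatenation order and the space delimiter are assumptions, since the embodiment does not fix them:

```python
def fuse(initial_text: str, new_text: str) -> str:
    # Join the earlier utterance with the user's follow-up into one text;
    # the result replaces the initial voice text and is segmented again.
    return (initial_text + " " + new_text).strip()

print(fuse("turn on", "the bedside lamp"))  # → turn on the bedside lamp
```

This lets an incomplete first utterance (a control word with no device word, say) be completed by the user's reply to the second-type prompt instead of being discarded.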
In one embodiment, before the step of performing word segmentation processing on the initial voice text, the method further includes:
correcting and updating the initial voice text by replacing error words with correct words, proceeding through the error words and correct words of a pre-stored error correction word list in order of error-word length from longest to shortest, so as to generate an error-corrected voice text;
updating the initial voice text to the error-corrected voice text;
wherein each position in the initial voice text is corrected and updated at most once; the error correction word list comprises a plurality of key-value pairs, in which the keys are error words and the values are correct words.
In one embodiment, the error words in the error correction word list are arranged from longest to shortest;
the step of correcting and updating the initial voice text by replacing error words with correct words, proceeding through the error words and correct words of the pre-stored error correction word list in order of error-word length from longest to shortest to generate the error-corrected voice text, includes:
comparing the error words at the target position in the error correction word list with the initial voice text to determine the target error words in the initial voice text; wherein, the initial value of the target position is the head position of the error correction word list;
searching a correct word corresponding to the target error word in the error correction word list, and replacing the target error word with the corresponding correct word;
updating the initial voice text to the text formed after replacement, updating the target position to the next table position, and returning to the step of comparing the error word at the target position in the error correction word list with the initial voice text to determine the target error word in the initial voice text;
and when the target position has been updated to the tail position of the error correction word list, taking the voice text after all updates and replacements as the error-corrected voice text.
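The walk through the error correction word list described above can be sketched as follows. This is an illustrative Python rendering, not the patented implementation: the table is assumed to be pre-sorted by error-word length, longest first, and each position in the text is corrected at most once, as the embodiment requires. All words in the example table are invented.

```python
def correct_text(text: str, corrections: list[tuple[str, str]]) -> str:
    # corrections: (error_word, correct_word) pairs, longest error word first.
    corrected_spans = []  # spans already fixed; each position is fixed only once
    for wrong, right in corrections:
        idx = text.find(wrong)
        if idx == -1:
            continue  # move the "target position" to the next table entry
        if any(s <= idx < e for s, e in corrected_spans):
            continue  # this position was already corrected once
        text = text[:idx] + right + text[idx + len(wrong):]
        corrected_spans.append((idx, idx + len(right)))
    return text

# Longest-first ordering stops the short entry "lite" from re-breaking
# the longer phrase that was already fixed.
table = [("nite lite", "night light"), ("lite", "light")]
print(correct_text("turn on the nite lite", table))  # → turn on the night light
```

Replacing the longest error words first is what makes the single-pass table walk safe: a short key that is a fragment of a longer key can never fire on a span the longer key already repaired.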
In one embodiment, the step of comparing the error word at the target position in the error correction word list with the initial voice text and determining the target error word in the initial voice text comprises the steps of:
acquiring each error word in the error correction word list, and converting each error word into a pinyin character string;
converting the initial voice text into a pinyin character string to be corrected;
comparing the pinyin character string converted from the error word at the target position in the error correction word list with the pinyin character string to be corrected, and taking the intersection of the pinyin characters;
and determining the largest substring in each pinyin character intersection as a target error word.
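The pinyin comparison above amounts to finding the longest common substring of two pinyin strings. A minimal sketch, with a toy character-to-pinyin table standing in for a real lexicon (a full system might build one with a library such as pypinyin):

```python
# Toy pinyin table; a real system would use a full Chinese lexicon.
PINYIN = {"打": "da", "开": "kai", "台": "tai", "灯": "deng", "抬": "tai"}

def to_pinyin(s: str) -> str:
    # Characters missing from the table pass through unchanged.
    return "".join(PINYIN.get(ch, ch) for ch in s)

def longest_common_substring(a: str, b: str) -> str:
    # O(len(a) * len(b)) dynamic programming over the two pinyin strings.
    best, best_end = 0, 0
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > best:
                    best, best_end = dp[i][j], i
    return a[best_end - best:best_end]

# "抬灯" is a homophone-style mishearing of the table entry "台灯":
# both romanize to "taideng", so the whole entry matches in pinyin space.
match = longest_common_substring(to_pinyin("台灯"), to_pinyin("打开抬灯"))
print(match)  # → taideng
```

Working in pinyin space is what lets the method catch recognition errors that swap a character for a homophone, which a plain character comparison would miss.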
In one embodiment, after the step of determining the largest substring in each pinyin character intersection as the target error word, the method further includes:
recording the position information of the pinyin character corresponding to the target error word in the pinyin character string to be corrected;
the steps of searching for the correct word corresponding to the target error word in the error correction word list and replacing the target error word with the corresponding correct word comprise:
and if the position information recorded at present is different from the position information recorded at the previous time, searching a correct word corresponding to the target error word in the error correction word list, and replacing the target error word with the corresponding correct word.
A speech recognition control apparatus includes:
the voice text acquisition module is used for acquiring an initial voice text, wherein the initial voice text is generated after voice recognition is carried out on user voice;
the device list acquisition module is used for acquiring a device list in the control range, wherein the device list comprises device words and device physical addresses corresponding to the device words; the equipment words comprise equipment category words and/or equipment nickname words, and the priority of the equipment nickname words is higher than that of the equipment category words;
the word segmentation module is used for carrying out word segmentation processing on the initial voice text and extracting equipment words and control words in the initial voice text;
the analysis module is used for determining effective equipment words matched with the equipment words in the initial voice text according to the equipment list, determining target equipment words corresponding to the equipment words in the initial voice text from the effective equipment words, and determining equipment physical addresses corresponding to the target equipment words from the equipment list; the target equipment word refers to the equipment word with the highest priority in the effective equipment words;
the control instruction generating module is used for generating a control instruction according to a target device word corresponding to the last device word in the initial voice text, a physical address of the target device, the last control word in the initial voice text and a control instruction word list; the target device physical address is a device physical address corresponding to a target device word corresponding to the last device word, the control instruction word list is used for representing the relation between the initial voice text and the control instruction, and the control instruction is used for indicating the device corresponding to the last target device word to execute the action corresponding to the control word.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the speech recognition control method are implemented when the processor executes the program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned speech recognition control method.
One or more embodiments provided by the present application have at least the following beneficial effects: the speech recognition control method sets priorities for device words and, among the valid device words corresponding to each device word obtained by segmenting the initial voice text, determines the device word with the highest priority as the target device word used for subsequent voice control. This reduces the probability of falsely triggering other devices of the same type during speech recognition control and identifies the target device more specifically. In addition, the last control word and valid device word of the initial voice text are retained when the control instruction is generated, avoiding ambiguity and instruction conflicts, so the accuracy and reliability of speech recognition control are improved.
Drawings
FIG. 1 is a diagram of an exemplary implementation of a speech recognition control method;
FIG. 2 is a flow diagram illustrating a speech recognition control method according to one embodiment;
FIG. 3 is a flow chart illustrating a speech recognition control method according to another embodiment;
FIG. 4 is a flow chart illustrating a speech recognition control method according to another embodiment;
FIG. 5 is a block diagram showing the structure of a speech recognition control device according to an embodiment;
FIG. 6 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
To facilitate an understanding of the invention, the invention will now be described more fully with reference to the accompanying drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
It will be understood that when an element is referred to as being "connected" to another element, it can be directly connected to the other element and be integral therewith, or intervening elements may also be present. The terms "mounted," "one end," "the other end," and the like are used herein for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The speech recognition control method provided by the present application can be applied to the application scenario shown in fig. 1, which includes a sound-receiving device 101, a terminal 102, and a controlled device 103. The sound-receiving device 101 collects user speech from the surrounding environment and may also perform Automatic Speech Recognition (ASR) on the collected speech to convert it into voice text. The sound-receiving device 101 sends the converted voice text to the terminal 102; the terminal 102 performs natural language processing on the received voice text, generates a control command according to the processing result, and sends it to the controlled device 103, driving the controlled device 103 to execute the action corresponding to the control command and thereby implementing voice control. The sound-receiving device 101 may be a smart speaker. The terminal 102 may be, but is not limited to, various controllers, smart speakers with data processing capability, cloud computers, personal computers, notebook computers, smartphones, tablet computers, and the like; the terminal 102 may also be an independent server or a server cluster composed of a plurality of servers. Specifically, in the scenario where the sound-receiving device 101 is a smart speaker, the terminal 102 is a cloud server, and the controlled device 103 is a smart home device: after acquiring the user's speech, the smart speaker 101 converts it into voice text and transmits the text to the cloud server 102; the cloud server 102 performs semantic matching, generates a control instruction, and sends it to the smart home device 103 the user wants to control, instructing that device to execute the corresponding action.
However, during implementation the inventors found that false triggering often occurs in semantic recognition control: for example, a user wants to turn on one particular lamp, but the speech recognition control result turns on all the lamps. The accuracy of speech recognition is low, smart home devices are controlled incorrectly, and electric energy is wasted.
For the problem of low accuracy of speech recognition control, as shown in fig. 2, an embodiment of the present invention provides a speech recognition control method, which can be applied to a terminal 102 in an application scenario shown in fig. 1, and the method includes:
s20: acquiring an initial voice text, wherein the initial voice text is generated after voice recognition is carried out on user voice;
s40: acquiring a device list in a control range, wherein the device list comprises device words and device physical addresses corresponding to the device words; the equipment words comprise equipment category words and/or equipment nickname words, and the priority of the equipment nickname words is higher than that of the equipment category words;
s60: performing word segmentation processing on the initial voice text, and extracting equipment words and control words in the initial voice text;
s80: determining effective device words matched with the device words in the initial voice text according to the device list, determining target device words corresponding to the device words in the initial voice text from the effective device words, and determining device physical addresses corresponding to the target device words from the device list; the target equipment word refers to the equipment word with the highest priority in the effective equipment words;
s90: generating a control instruction according to the target device word corresponding to the last device word in the initial voice text, the target device physical address, the last control word in the initial voice text, and a control instruction word list; wherein the target device physical address is the device physical address corresponding to the target device word of the last device word, the control instruction word list is used for representing the relation between the initial voice text and control instructions, and the control instruction is used for instructing the device corresponding to the last target device word to execute the action corresponding to the control word.
The initial voice text refers to the text generated by the sound-receiving device 101 shown in fig. 1 after converting the collected sound. A device category is a category obtained by dividing devices according to their functions; device category words are words that can represent the function of each kind of device, and the device category words corresponding to different categories may not repeat. The control range is a range preset by the user according to the application scenario, and may be the range within which the sound-receiving device 101 collects sound. Device nickname words are words that may not repeat within the control range; they can be personalized by the user in advance, distinguish the name of each device from other devices of the same device category, and carry more information than device category words. "Matching" means that the characters contained in a device word in the initial voice text, and the order of those characters, are identical to the characters and character order of a device word in the device list.
To improve the accuracy of speech recognition control, the speech recognition control method provided in this embodiment requests a device list within the control range (which may be the device list within the sound control range) from a device open platform (such as a cloud server). The device list may include fields such as a device name (e.g., "music smart lamp"), a device nickname word (e.g., "bedside lamp"), a device category word (e.g., "lamp"), device location information (e.g., "bedroom"), and a device physical address. The device physical address, also called the MAC (media access control) address or hardware address, is used to confirm the address of a device's location. Device words such as device category words and device nickname words correspond to device location information and device physical addresses, so determining a device word determines the associated location information and physical address. The device category words, device nickname words, device location information, and so on in the device list can be preset according to the specific arrangement and models of the smart home devices to be controlled, and the device physical addresses can be assigned by a router, a home gateway, a server, and the like. If the initial voice text is represented in table form, the table-form text can be spliced into a complete text string using spaces, which facilitates subsequent data processing.
Matching follows the preset priority: among the valid device words obtained after matching, the control object is determined preferentially from the device nickname words. For example, for "turn on the bedside lamp", the method first checks whether the text contains the nickname "bedside lamp"; only if no nickname is found is the device category word matched, such as "lamp" in "turn on the lamp". The specific matching process may be: acquire the device nickname words in the device list, convert them into pinyin character strings, compare the pinyin characters converted from each device nickname word with the pinyin characters converted from the initial voice text in turn, take the intersections, and determine the largest character string in each intersection as matching that device nickname word; device category words, device location information, and the like are matched in the same manner. For example, if "bedside lamp" is a device nickname word, and device nickname words have the highest priority, then of the matched valid device words "bedside lamp" and "lamp", "bedside lamp" becomes the target device word, so the method does not wrongly conclude that the user wants to turn on all devices of the "lamp" category. For each device word obtained after word segmentation: if a device nickname word is matched, the device nickname word is taken as the valid device word; if no device nickname word is matched and only a device category word is matched, the device category word is retained as the target device word.
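A minimal sketch of the priority rule just described, with an invented device list (the entries and MAC addresses are illustrative, not from the patent):

```python
# Device list entries: word -> kind and physical (MAC) address. Illustrative.
DEVICE_LIST = {
    "bedside lamp": {"kind": "nickname", "mac": "AA:BB:CC:01"},
    "lamp":         {"kind": "category", "mac": "AA:BB:CC:02"},
}
PRIORITY = {"nickname": 2, "category": 1}  # nickname words outrank category words

def pick_target(text: str):
    # Every list entry found in the text is a valid device word; the one with
    # the highest priority becomes the target device word.
    valid = [w for w in DEVICE_LIST if w in text]
    if not valid:
        return None
    target = max(valid, key=lambda w: PRIORITY[DEVICE_LIST[w]["kind"]])
    return target, DEVICE_LIST[target]["mac"]

print(pick_target("turn on the bedside lamp"))  # → ('bedside lamp', 'AA:BB:CC:01')
```

Note that "turn on the bedside lamp" matches both "bedside lamp" and "lamp", but the nickname wins, so the whole "lamp" category is not triggered by mistake.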
To control each device, the initial voice text generally needs to be reduced to a single device word (device nickname word/device category word) plus a control instruction. When no intelligent sentence-splitting is performed, in order to prevent ambiguity and instruction conflicts, the last control word and the last target device word near the tail of the initial voice text (character string) can be retained, together with the device physical address corresponding to that target device word. The target device word and control word are matched against the control instruction word list and combined with the device physical address to generate a transmittable control instruction (machine instruction); addressing is performed according to the device physical address, so the control instruction is delivered precisely to the device that needs to be controlled. For example, for "turn off the air conditioner, turn off the light, and open the curtain", only the final pair is retained, and the generated control command is to open the curtain. Optionally, the process of searching the device list for valid device words matching each device word in the initial voice text may search the device nickname words in the device word list first and the device category words second, performing the matching search in that priority order.
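The "keep the last pair" rule can be sketched as a single pass over the segmented tokens; the token sets below are illustrative:

```python
def last_pair(tokens, device_words, control_words):
    # Scan left to right; later device words and control words overwrite
    # earlier ones, so only the final pair produces a command.
    device = control = None
    for tok in tokens:
        if tok in device_words:
            device = tok
        elif tok in control_words:
            control = tok
    if device is None or control is None:
        return None  # no command can be generated from this utterance
    return {"device": device, "action": control}

tokens = ["turn off", "air conditioner", "turn off", "light", "open", "curtain"]
cmd = last_pair(tokens, {"air conditioner", "light", "curtain"},
                {"turn off", "open"})
print(cmd)  # → {'device': 'curtain', 'action': 'open'}
```

Keeping only the final device/control pair is a deliberately conservative choice: without sentence-splitting, executing every pair in a multi-clause utterance risks conflicting instructions.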
In the speech recognition control method provided by this embodiment, priorities are assigned to device words, and among the valid device words corresponding to each device word obtained by segmenting the initial voice text, the device word with the highest priority is determined as the target device word used for subsequent voice control. This reduces the probability of falsely triggering other devices of the same type during speech recognition control and determines the target device more specifically. When generating the control instruction, to avoid ambiguity and instruction conflicts, the last control word and valid device word in the initial voice text are retained to generate the control instruction, improving the accuracy and reliability of speech recognition control.
In one embodiment, as shown in fig. 3, after the step of performing word segmentation processing on the initial voice text, and extracting device words and control words in the initial voice text, and before the step of determining valid device words matching the device words in the initial voice text according to the device list, the method further includes:
s50: standardizing equipment words and control words in an initial voice text according to the equipment words and the control words extracted from the initial voice text and a pre-stored word meaning standardization dictionary to obtain standardized equipment words and standardized control words, and correspondingly replacing the equipment words and the control words in the initial voice text with the standardized equipment words and the standardized control words to obtain a standardized voice text;
s70: the initial voice text is updated to the normalized voice text.
Specifically, the open-source jieba word segmentation tool may be used to segment the error-corrected voice text, after which a pre-loaded word-sense normalization dictionary is applied. The dictionary mainly contains common device words and control words; the colloquial device words and control words in the initial voice text (spoken forms in the user's voice input) are replaced with standard control words and device words to generate a normalized voice text. For example, a colloquial control word in "switch on the light" is replaced with the standard form "turn on". The normalized voice text may be represented in the form of a vocabulary table. The initial voice text is then updated to the normalized voice text, and the step of determining, according to the device list, the target device word corresponding to each device word in the initial voice text is executed: according to the matching priority of device nickname words and device category words, the updated initial voice text is matched and analyzed using the device nickname words and device category words in the device list to obtain the device nickname words, device category words, and device physical addresses corresponding to the initial voice text, and the control instruction generation process of the above embodiment is executed. Because the initial voice text is normalized, the control word list only needs to store standard control words and standard device words, which greatly reduces the number of comparisons required when matching device words and control words against the list and improves overall speech recognition efficiency.
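The dictionary replacement step can be sketched as a lookup over the segmented words; the dictionary contents here are invented examples of colloquial-to-standard mappings, not the patent's actual dictionary:

```python
# Word-sense normalization dictionary: colloquial form -> standard form.
NORMALIZE = {"switch on": "turn on", "telly": "television"}

def normalize(words: list[str]) -> list[str]:
    # Words without a dictionary entry pass through unchanged, so the control
    # instruction word list only ever needs to store the standard forms.
    return [NORMALIZE.get(w, w) for w in words]

print(normalize(["switch on", "telly"]))  # → ['turn on', 'television']
```

Normalizing before matching shrinks the instruction table: one standard entry covers every colloquial variant the dictionary maps onto it.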
In one embodiment, as shown in fig. 2, before the step of performing word segmentation processing on the initial voice text, the method further includes:
s30: performing field intention recognition according to the initial voice text;
and if the field is determined to be the device control field, executing the step of performing word segmentation processing on the initial voice text.
Field intention recognition means identifying the field of the control intention corresponding to the user voice (i.e., the initial voice text). Intention recognition can be regarded as a multi-classification task, so field intention recognition may be performed by a classifier that identifies the field of the control intention corresponding to the user voice. The device control field means that the control intention corresponding to the user voice is to control a device to execute an action.
To improve the efficiency of speech recognition control, field intention recognition is performed before voice text matching. Specifically, the initial voice text may be matched against pre-stored device words; if at least one device word is matched, a device currently needs to be controlled, and the subsequent steps such as text word segmentation are executed. If the initial voice text matches none of the pre-stored device words, the user voice corresponding to the initial voice text is not aimed at controlling a device, and another working mode is entered; for example, voice aimed at chat interaction enters a chat recognition mode.
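The simple matching variant of field intention recognition described above can be sketched as follows (the pre-stored device words are invented for the example; the document also allows a trained classifier instead of substring matching):

```python
# Sketch of step S30: decide whether the user voice targets device control
# by matching the initial voice text against pre-stored device words.
DEVICE_WORDS = {"light", "air conditioner", "aromatherapy machine", "bedside lamp"}

def is_device_control(initial_text):
    """Return True if any pre-stored device word occurs in the text."""
    return any(word in initial_text for word in DEVICE_WORDS)

print(is_device_control("turn on the bedside lamp"))  # device control field
print(is_device_control("tell me a joke"))            # e.g. enter chat mode
```

Only when this check succeeds are the more expensive word segmentation and matching steps executed, which is the efficiency rationale given in the text.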
In one embodiment, as shown in fig. 2, before the step of obtaining the initial voice text, the method further includes:
s10: converting the acquired voice into an initial voice text.
Before the initial voice text is obtained, the acquired voice may be converted into the initial voice text; mainly, the user voice is collected from a sound-receiving device and converted into text. Alternatively, if the user voice is collected by an intelligent sound device with Automatic Speech Recognition (ASR) capability, the voice may be converted into the initial voice text by the intelligent sound device itself, and the initial voice text is then obtained directly from the intelligent sound device.
In one embodiment, as shown in fig. 4, the device list further includes device location information, and the step of generating the control instruction according to the target device word corresponding to the last device word in the initial voice text, the target device physical address, the last control word in the initial voice text, and the control instruction word list further includes:
s71: matching device location information corresponding to the target device word according to the device location information in the device list;
s72: if it is determined, according to the device location information corresponding to the target device word, that the device corresponding to the target device word is not within the control range, generating and sending a first-type reply language, where the first-type reply language is used for prompting that the device is not within the control range.
To improve the control effect, a control range is usually preset. After the initial voice text is matched to obtain the device the user intends to control, if the field is the device control field, word segmentation is performed on the initial voice text, the device words and control words extracted by segmentation are standardized, the standardized device words are matched against the device words in the device list to obtain valid device words, and the target device word is determined from the valid device words. If the device location is determined to conflict with the control range according to the device location information matched for the target device word, a first-type reply language is sent to prompt the user that the device to be controlled is not within the control range. For example, suppose the control range is the bedroom. If the user sends a voice control command to the sound equipment to control the bedside lamp in the bedroom, the sound equipment converts the voice into an initial voice text and transmits it to the processing terminal; the processing terminal performs a series of processing on the initial voice text and then matches the device location information of the bedside lamp. Since the device location information does not conflict with the control range, the control instruction can be generated to control the bedside lamp in the bedroom.
If it is detected that the initial voice text does not include device location information, a preset default location is used as the control range, for example, the location of the sound equipment. Thus, to control other devices in the bedroom, the user does not need to specify the device location when speaking: devices in the bedroom are operated by default, and only when the user wants to operate a device outside the default location must the location be specified in the voice input. The specific control range may be set by the user according to the location of the controlled devices in the application scene, the network coverage, and the like, or may be preset by an open platform.
Optionally, if it is determined that the matched device location information conflicts with the control range, the first-type reply language may be sent to the sound equipment so that the sound equipment broadcasts it. For example, if the control range is the living room and the user says to the sound equipment in the living room: "turn on the aromatherapy machine", then since no device location information is provided, the control range is the default location, namely the living room where the sound equipment is located. The device location information of the aromatherapy machine matched from the device list is the bedroom, so the location information conflicts, and a first-type reply language is sent, for example: "Sorry, no aromatherapy machine was found in the living room."
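Steps S71/S72 can be sketched as follows (the device list and reply wording are illustrative assumptions based on the aromatherapy-machine example above):

```python
# Sketch of steps S71/S72: compare the device location matched from the
# device list with the control range; on conflict, produce a first-type
# reply instead of a control instruction.
DEVICE_LIST = {"aromatherapy machine": "bedroom", "hall lamp": "living room"}

def check_control_range(target_device_word, control_range):
    """Return a first-type reply on conflict, or None if control may proceed."""
    location = DEVICE_LIST.get(target_device_word)
    if location != control_range:
        return f"Sorry, no {target_device_word} was found in the {control_range}."
    return None  # no conflict: go on to generate the control instruction

# The sound equipment is in the living room and no location was spoken,
# so the default control range is the living room.
print(check_control_range("aromatherapy machine", "living room"))
```

In a full implementation the reply string would be sent to the sound equipment for broadcast, as described in the embodiment.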
In one embodiment, the speech recognition control method further comprises:
if a control instruction is not generated according to the target device word corresponding to the last device word in the initial voice text, the target device physical address, the last control word in the initial voice text and the control instruction word list, generating and sending a second-type reply language, where the second-type reply language is used for prompting the user to provide new voice;
acquiring new voice provided by a user;
converting the acquired new voice into a new voice text;
fusing the initial voice text and the new voice text to generate a fused voice text;
and updating the initial voice text into a fused voice text, and executing the step of segmenting the initial voice text.
If an instruction is not correctly matched according to the last control word in the initial voice text, the last target device word, the device physical address corresponding to that target device word, and the control instruction word list, the user needs to be asked for clarification. At this time a second-type reply language is generated and sent to prompt the user to provide new voice. Two cases can roughly be distinguished. In the first, the user has not stated clearly what function the device should perform, and the second-type reply language may be, for example: "What would you like the device to do?". In the second, multiple devices match the control instruction corresponding to the user voice (for example, only a device category word such as "lamp" was provided, without a device nickname word such as "bedside lamp"), and the second-type reply language may be: "Which device specifically would you like to control?". The new voice provided by the user is obtained and converted into a new voice text, which is fused with the previously obtained standardized voice text, for example by direct text splicing; the initial voice text is updated to the fused voice text, and the operations of error correction, word segmentation, standardization processing, control instruction matching and the like are performed on it. If a control instruction can be successfully matched, it is sent to the corresponding device and the control flow ends. The new voice provided by the user may be the voice collected by the sound-receiving device within a certain time period, for example within one minute after the second-type reply language is sent.
Optionally, to reduce the complexity of the dialog flow, the second-type reply language is sent only once; if the control instruction still cannot be successfully matched, a third-type reply language is sent to prompt the user that the clarifying voice content is incorrect and cannot be recognized, and all information in the current dialog is stored into the user object.
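The fusion step itself is direct text splicing, which can be sketched in a few lines (the example texts are invented; in the method the first argument would be the previously stored standardized voice text):

```python
# Sketch of voice text fusion: splice the previous (standardized) voice
# text with the new voice text obtained after a second-type reply.
def fuse_voice_texts(previous_text, new_text):
    """Direct text splicing, one simple fusion strategy."""
    return previous_text + " " + new_text

# Round 1 lacked a control word; the user clarifies with "turn off".
fused = fuse_voice_texts("the bedside lamp", "turn off")
print(fused)  # the fused text is then re-segmented and matched again
```

The fused text then re-enters the error correction, word segmentation, standardization, and instruction matching pipeline described above.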
In one embodiment, the speech recognition control method further comprises:
recording a first time for acquiring an initial voice text;
recording a second time for acquiring new voice;
and if the time difference between the second time and the first time is less than the preset life cycle, executing the step of fusing the initial voice text and the new voice text.
To better explain the implementation of this embodiment, voice collection by a sound-receiving device is taken as an example. A user object with a life cycle is created, treating the voice collected by the same sound-receiving device as belonging to one user. The life cycle prevents memory accumulation in the cloud or terminal and bounds the maximum time interval of a multi-round dialog, thereby improving the efficiency of speech error-correction recognition. The user object may store the location of the sound-receiving device and dialog content such as the previous initial voice text (standardized voice text). Specifically, if the time difference between the time of acquiring the new voice and the time of last acquiring the initial voice text is greater than the life cycle, the new voice content is considered unrelated to the previous round, no voice fusion is performed, and a new round of the speech recognition process is started. If the time difference between the second time of acquiring the new voice and the time of last acquiring the initial voice text is smaller than the life cycle, the new voice content is considered related to the previous round; if the user provided the new voice in response to the prompt of the second-type reply language, voice fusion is performed and speech recognition control is carried out on the fused voice text. The second time may also be the time at which the new voice is converted into the new voice text.
Optionally, the life cycle may instead be applied to the time interval between sending the second-type reply language and acquiring the new voice: if voice is received within the life cycle after the second-type reply language is sent, the new voice is converted into a new voice text and fused with the standardized voice text obtained in the previous round. If new voice is received only after the life cycle has elapsed, the content corresponding to that voice is considered unrelated to the standardized voice text obtained by the previous round of matching. For example, the life cycle may be 1 minute; if new voice is received more than 1 minute after the second-type reply language is sent, the voice content is considered unrelated to the initial (standardized) voice text of the previous round, no text fusion is performed, and a new round of voice-to-text conversion, error correction and word segmentation, standardization processing, control instruction matching, storage, fusion, and association of preceding and following voice content starts from the newly obtained voice.
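The life-cycle check above reduces to a single time comparison; a minimal sketch, assuming times measured in seconds since a common epoch and the one-minute life cycle from the example:

```python
# Sketch of the life-cycle check: fuse the texts only when the time
# difference between acquiring the new voice (second time) and acquiring
# the initial voice text (first time) is below the preset life cycle.
LIFE_CYCLE_SECONDS = 60  # e.g. a one-minute life cycle, as in the example

def should_fuse(first_time, second_time, life_cycle=LIFE_CYCLE_SECONDS):
    """Times are seconds since some common epoch."""
    return (second_time - first_time) < life_cycle

print(should_fuse(100.0, 130.0))  # within the life cycle -> fuse
print(should_fuse(100.0, 200.0))  # life cycle exceeded  -> new round
```

Whether the first time stamps the initial voice text acquisition or the sending of the second-type reply is the design choice discussed in the two variants above.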
If an instruction can be correctly matched according to the initial voice text, the matching result and the control instruction word list, the control instruction is sent to the corresponding device, driving it to work according to the user's intention. Optionally, when matching succeeds in generating a control instruction, a fourth-type reply language may also be sent; the fourth-type reply language is used for prompting that the device is executing the control instruction, and may be sent to the sound equipment and broadcast by it. For example, if the initial voice text is "turn on the air conditioner" and the successfully matched control instruction drives the air conditioner to turn on, the fourth-type reply language sent to the sound equipment may be: "The air conditioner is being turned on for you."
In one embodiment, before the step of performing word segmentation on the initial voice text, the method further includes:
correcting and updating the initial voice text by replacing error words with correct words, in order of error-word length from long to short, according to each error word and its correct word in a pre-stored error-correction word list, so as to generate a corrected voice text;
updating the initial voice text to the corrected voice text;
wherein words at the same position in the initial voice text are corrected and updated only once; the error-correction word list includes a plurality of key-value pairs, where each key is an error word and each value is the corresponding correct word.
In a key-value pair, each key maps to a corresponding value; looking up a key returns its value, so the correct word corresponding to an error word can be determined through the error word. Ordering error-word lengths from long to short means arranging the error words by the number of characters composing them, from more to fewer. Correcting and updating words at the same position in the initial voice text only once means that a word corrected in an earlier round is preserved in every subsequent comparison and correction.
Specifically, in the speech recognition control method provided in this embodiment, the obtained initial voice text is compared against the error-correction word list: first the longest error word in the list is compared with the initial voice text, the matching error word in the initial voice text is determined from the comparison result and replaced with the corresponding correct word from the list, and the initial voice text is updated to the corrected text; then the second-longest error word is compared against the updated text, corrected and updated, and so on in a loop until the shortest error word has been compared, producing the corrected voice text. For example (the word forms here follow the translated example), suppose the initial voice text is "turn on the music smart lamp and detailed sum", the error words in the list, ordered long to short, are "detailed sum" and "sum", and their correct words are "fragrance machine" and "close" respectively. First the correct word "fragrance machine" replaces the error word "detailed sum", updating the text to "turn on the music smart lamp and fragrance machine". The text would then be corrected according to the error word "sum"; but since the "sum" inside "detailed sum" has already been corrected once, that position is not corrected again, i.e., "close" does not replace it, so the previous correction result is not overwritten.
By correcting the initial voice text in advance, the accuracy of subsequent processes such as voice control instruction matching is improved, and failures in recognizing device words and control words due to errors in the initial voice text caused by dialect and similar factors are avoided.
In the speech recognition control method provided in this embodiment, words at the same position in the initial voice text are corrected only once during correction, and longer spans of the voice text are corrected preferentially. This reduces or prevents the problem of the adjacent parts of two neighboring words being mistakenly corrected as a single word, which would replace or overwrite originally correct words or earlier corrections and introduce new errors; error-correction accuracy is thereby improved, and with it the accuracy of speech recognition. Optionally, the collected error words may first be de-duplicated and then arranged from long to short to generate the error-correction word list, improving the speed of speech correction processing.
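A minimal sketch of this correction loop, assuming an illustrative error-correction vocabulary already sorted long to short (for simplicity only the first occurrence of each error word is handled; the real method works positionally over the whole text):

```python
# Sketch of long-to-short error correction with correct-once-per-position:
# a later (shorter) error word cannot overwrite an earlier correction.
ERROR_VOCAB = [  # (error word, correct word), pre-sorted long to short
    ("detailed sum", "fragrance machine"),
    ("sum", "close"),
]

def correct_text(text):
    corrected = [False] * len(text)  # per-character "already corrected" flags
    for wrong, right in ERROR_VOCAB:
        idx = text.find(wrong)
        if idx == -1 or any(corrected[idx:idx + len(wrong)]):
            continue  # not present, or overlaps an earlier correction
        text = text[:idx] + right + text[idx + len(wrong):]
        corrected = corrected[:idx] + [True] * len(right) + corrected[idx + len(wrong):]
    return text

print(correct_text("turn on the music smart lamp and detailed sum"))
```

The `corrected` flags implement the "same position corrected only once" rule: after "detailed sum" becomes "fragrance machine", the shorter error word "sum" can no longer touch those characters.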
In one embodiment, error words in the error correction word list are arranged from long to short;
the step of correcting and updating the initial voice text by replacing error words with correct words, in order of error-word length from long to short, according to each error word and its correct word in the pre-stored error-correction word list, to generate a corrected voice text, includes:
comparing the error words at the target position in the error correction word list with the initial voice text to determine the target error words in the initial voice text; wherein, the initial value of the target position is the head position of the error correction word list;
searching a correct word corresponding to the target error word in the error correction word list, and replacing the target error word with the corresponding correct word;
updating the initial voice text to the text formed after replacement, updating the target position to the next position in the list, and returning to the step of comparing the error word at the target position in the error-correction word list with the initial voice text to determine the target error word in the initial voice text;
and after the target position has been updated to the tail position of the error-correction word list and the corresponding correction performed, taking the finally updated and replaced voice text as the corrected voice text.
To increase processing speed by avoiding searching the error-correction word list for error words of different lengths each time, the speech recognition control method provided in this embodiment arranges the error words in the list from long to short. When correcting the initial voice text according to the list, the error word at the head position is compared with the initial voice text first to determine the target error word; the corresponding correct word is looked up in the list, and the initial voice text is corrected and updated by replacing the error word with the correct word. Then the error word in the next cell of the list is compared with the updated text to determine the next target error word, its correct word is looked up and substituted, and so on: after each update, the next error word in the list is compared with the newly updated text, a target error word is determined and replaced, until the text has been corrected and updated according to the error word at the tail of the list, at which point error correction is finished and the last updated voice text is taken as the corrected voice text. Note that for positions in the initial voice text that have already been corrected, if the words at those positions intersect with an error word in a later comparison, the target error word determined by that comparison is not corrected, so as to avoid damaging or overwriting the earlier correction result.
Optionally, if comparing the error word in the current cell of the error-correction word list with the initial voice text finds no such error word in the text, the initial voice text is not corrected or updated, and correction proceeds directly with the error word in the next cell of the list.
To better explain the generation of the corrected voice text, a specific example follows; the example does not limit the scope of the embodiments of the present application, and the word forms again follow the translated example. Suppose the error words in the error-correction word list, arranged from long to short, are "star-moon smart lamp", "detailed sum", "bar love" and "the fragrance", with corresponding correct words "music smart lamp", "fragrance machine", "dimming" and "on". When correcting the initial voice text "turn on the star-moon smart lamp and detailed sum, bar love hall lamp", the error word at the head of the list, "star-moon smart lamp", is compared with the text first; the target error word "star-moon smart lamp" is determined and replaced with the correct word "music smart lamp" obtained by table lookup, updating the text to "turn on the music smart lamp and detailed sum, bar love hall lamp". Next the error word in the second cell, "detailed sum", is compared with the updated text, the target error word "detailed sum" is determined and replaced with the correct word "fragrance machine" obtained by table lookup, updating the text to "turn on the music smart lamp and fragrance machine, bar love hall lamp". Then the error word in the third cell, "bar love", is compared with the text, the target error word "bar love" is determined and replaced with the correct word "dimming" obtained by table lookup, updating the text to "turn on the music smart lamp and the fragrance machine and dim the hall lamp". Finally the error word at the tail of the list, "the fragrance", is compared with the text; even though "the fragrance" matches (inside "fragrance machine"), that position has already been corrected, so the correct word "on" is not used to replace it.
In one embodiment, the step of comparing the error word at the target position in the error correction word list with the initial voice text and determining the target error word in the initial voice text comprises the steps of:
acquiring each error word in the error correction word list, and converting each error word into a pinyin character string;
converting the initial voice text into a pinyin character string to be corrected;
comparing the pinyin character string converted from the error word at the target position in the error-correction word list with the pinyin character string to be corrected to obtain a number of pinyin-character intersections;
and determining the maximum substring in each pinyin character intersection as a target error word.
Because pinyin character strings occupy less storage space and require less processing capacity than text, in the speech recognition control method provided in this embodiment, the process of comparing the error word at the target position in the error-correction word list with the initial voice text to determine the target error word may be implemented by string comparison, increasing the processing speed. Specifically, each error word extracted from the error-correction word list is converted into a pinyin character string, and the initial voice text is converted into a pinyin character string to be corrected; the pinyin string of the error word at the target position is then compared with the string to be corrected to obtain a number of pinyin-character intersections, and the largest substring among these intersections is determined as the target error word. In other words, longer error words are preferentially considered for correction, the continuity between adjacent words is preserved to the greatest extent, and erroneous corrections caused by splitting a single device word or control word before comparison and modification are avoided.
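The largest-substring selection can be sketched at the level of pinyin tokens (token lists rather than raw strings are used here for clarity; the pinyin sequences are taken from the example below, and a real implementation would derive them with a pinyin conversion library):

```python
# Sketch of the comparison step: find the longest contiguous run of the
# error word's pinyin tokens that also appears in the text's pinyin
# tokens; that run identifies the target error word.
def longest_common_run(error_pinyin, text_pinyin):
    """Longest contiguous run of error_pinyin tokens found in text_pinyin."""
    best = []
    for i in range(len(error_pinyin)):
        for j in range(i + 1, len(error_pinyin) + 1):
            run = error_pinyin[i:j]
            if len(run) > len(best) and _contains(text_pinyin, run):
                best = run
    return best

def _contains(seq, sub):
    return any(seq[k:k + len(sub)] == sub for k in range(len(seq) - len(sub) + 1))

error = ["xing", "yue", "zhi", "neng", "deng"]              # "star-moon smart lamp"
text = ["da", "kai", "xing", "yue", "zhi", "neng", "deng"]  # "turn on ..."
print(longest_common_run(error, text))
```

Preferring the longest run is exactly the "largest substring" rule: a full five-token match beats the shorter "xing, yue" fragment, so a whole device word is corrected rather than a piece of one.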
To better explain the process of comparing the error word at the target position in the error-correction word list with the initial voice text and determining the target error word, another example follows; it likewise does not limit the protection scope of the embodiments of the present application. Take the initial voice text "turn on the star-moon smart lamp, the star-moon player and the detailed sum, and love the hall lamp" as an example, with error words in the error-correction word list arranged by length from long to short as "star-moon smart lamp" and "detailed sum", and corresponding correct words "music smart lamp" and "fragrance machine". First the initial voice text is converted into the pinyin character string to be corrected: "da, kai, xing, yue, zhi, neng, deng, xing, yue, bo, fang, qi, he, xiang, xi, ji, tiao, ai, ke, ting, deng", and each error word is converted into a pinyin character string in turn. The pinyin string of the error word at the head position, "star-moon smart lamp" ("xing, yue, zhi, neng, deng"), is compared with the string to be corrected, yielding the pinyin intersection "xing, yue, zhi, neng, deng"; the largest substring is determined as the target error word and replaced by the correct word "music smart lamp", so the updated pinyin string to be corrected is "da, kai, yin, yue, zhi, neng, deng, xing, yue, bo, fang, qi, he, xiang, xi, ji, tiao, ai, ke, ting, deng". The pinyin string of the next error word in the list, "detailed sum" ("xiang, xi, ji"), is then compared with the updated string to be corrected, and so on. In this way longer error words are preferentially modified, the consistency between adjacent words is preserved to the greatest extent, and erroneous corrections caused by splitting a single device word or control word before comparison and modification are avoided.
In one embodiment, after the step of determining the largest substring in each pinyin character intersection as the target error word, the method further includes:
recording the position information of the pinyin character corresponding to the target error word in the pinyin character string to be corrected;
the steps of searching for the correct word corresponding to the target error word in the error correction word list and replacing the target error word with the corresponding correct word comprise:
and if the position information recorded at present is different from the position information recorded at the previous time, searching a correct word corresponding to the target error word in the error correction word list, and replacing the target error word with the corresponding correct word.
To avoid content already modified in the initial voice text being modified again in a subsequent correction and update, after the current error word in the error-correction word list is used to determine a target error word, the position of that target error word in the pinyin character string to be corrected is also recorded. When the next error word in the list is used to determine a target error word, if the position of the newly determined target error word in the string to be corrected overlaps a previously recorded position, part of that content has already been modified, and correcting it again would overwrite the earlier correction; the newly determined target error word is therefore not corrected or updated. Conversely, when the determined position of the target error word does not overlap any previously recorded position, that content has not yet been corrected, so the correct word corresponding to the target error word can be looked up in the error-correction word list, the target error word replaced by it, and the initial voice text updated for comparison with the next error word in the list.
It should be understood that although the steps in the flowcharts of figs. 1-3 are shown sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, there is no strict order limitation on the execution of these steps, and they may be performed in other orders. Moreover, at least some of the steps in figs. 1-3 may include multiple sub-steps or stages that are not necessarily completed at the same moment but may be executed at different moments, and the execution order of these sub-steps or stages is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
A voice recognition control apparatus, as shown in fig. 5, comprising:
the voice text acquisition module 1 is used for acquiring an initial voice text, wherein the initial voice text is generated after voice recognition is carried out on user voice;
the device list acquiring module 2 is configured to acquire a device list within a control range, where the device list includes device words and device physical addresses corresponding to the device words; the equipment words comprise equipment category words and/or equipment nickname words, and the priority of the equipment nickname words is higher than that of the equipment category words;
the word segmentation module 3 is used for performing word segmentation processing on the initial voice text and extracting the device words and control words in the initial voice text;
the analysis module 4 is configured to determine, according to the device list, valid device words matched with the device words in the initial voice text, determine, from the valid device words, target device words corresponding to the device words in the initial voice text, and determine, from the device list, device physical addresses corresponding to the target device words; the target device word is the device word with the highest priority in the effective device words, wherein the priority of the device nickname word is higher than that of the device category word;
the control instruction generating module 5 is configured to generate a control instruction according to a target device word corresponding to the last device word in the initial voice text, a physical address of the target device, the last control word in the initial voice text, and a control instruction word list; the target device physical address is a device physical address corresponding to a target device word corresponding to the last device word, the control instruction word list is used for representing the relation between the initial voice text and the control instruction, and the control instruction is used for indicating the device corresponding to the last target device word to execute the action corresponding to the control word.
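As a rough illustration of the module pipeline above, the following Python sketch resolves a device word against a device list with nickname-over-category priority, then builds a command from the last device word and last control word. The device list entries, function names, and the instruction format are illustrative assumptions, not taken from the patent:

```python
# (device word, kind, physical address) — entries invented for illustration
DEVICE_LIST = [
    ("light", "category", "00:11:22:33:44:55"),
    ("bedside lamp", "nickname", "00:11:22:33:44:66"),
]

def resolve_device(word):
    """Return the physical address of the highest-priority entry matching
    `word`; nickname entries outrank category entries."""
    matches = [e for e in DEVICE_LIST if e[0] == word]
    matches.sort(key=lambda e: e[1] == "nickname", reverse=True)
    return matches[0][2] if matches else None

def build_instruction(device_words, control_words):
    """Build a command from the LAST device word and LAST control word,
    as the method above specifies."""
    if not device_words or not control_words:
        return None  # nothing to control, or no action requested
    address = resolve_device(device_words[-1])
    if address is None:
        return None  # no entry in the device list matched the device word
    return {"address": address, "action": control_words[-1]}
```

Using only the last device/control word pair reflects the behaviour described for the control instruction generating module when an utterance mentions several devices.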
For specific limitations of the speech recognition control device, reference may be made to the limitations on the speech recognition control method above, which are not repeated here. Specifically, an initial voice text is obtained by the voice text obtaining module 1. A device list within the control range is then obtained by the device list obtaining module 2; the device list includes device words and the device physical addresses corresponding to the device words, the device words include device category words and/or device nickname words, and a device nickname word has a higher priority than a device category word. The word segmentation module 3 then performs word segmentation on the initial voice text and extracts the device words and control words in it. The analysis module 4 determines, according to the device list, the valid device words matching each device word in the initial voice text, determines from the valid device words the target device word corresponding to each device word in the initial voice text, and determines from the device list the device physical address corresponding to each target device word. Finally, the control instruction generating module 5 generates a control instruction according to the target device word corresponding to the last device word in the initial voice text, the target device physical address, the last control word in the initial voice text, and the control instruction word list. All or part of the modules in the speech recognition control device can be implemented by software, hardware, or a combination thereof, and the beneficial effects are the same as those of the method embodiments.
The modules can be embedded in hardware form in, or be independent of, a processor in the computer device, or can be stored in software form in a memory of the computer device, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, the speech recognition control apparatus further includes:
the standardized voice text acquisition unit is used for standardizing the device words and the control words in the initial voice text according to the device words and control words extracted from the initial voice text and a pre-stored word meaning standardization dictionary to obtain standardized device words and standardized control words, and for correspondingly replacing the device words and control words in the initial voice text with the standardized device words and standardized control words to obtain a standardized voice text;
and the first initial voice text updating unit is used for updating the initial voice text into the standardized voice text.
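A minimal sketch of the standardization step above, assuming the word meaning standardization dictionary is a simple mapping from variant words to canonical forms (the dictionary entries and function name below are invented for illustration):

```python
# Invented example entries; a real dictionary would be pre-stored
NORMALIZATION_DICT = {
    "lamp": "light",
    "switch on": "turn on",
    "power on": "turn on",
}

def normalize_text(text, extracted_words):
    """Replace each extracted device/control word in `text` with its
    standardized form, leaving unknown words unchanged."""
    for word in extracted_words:
        text = text.replace(word, NORMALIZATION_DICT.get(word, word))
    return text
```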
In one embodiment, the speech recognition control apparatus further includes:
the intention identification module is used for carrying out field intention identification according to the initial voice text;
and the control field judging module is used for executing the step of performing word segmentation on the initial voice text when the initial voice text is judged to be in the device control field.
In one embodiment, the speech recognition control apparatus further includes:
the equipment position information acquisition module is used for matching equipment position information corresponding to the target equipment word according to the equipment position information in the equipment list;
the first type reply language sending module is used for generating and sending a first type reply language when judging that the equipment corresponding to the target equipment word is not in the control range according to the equipment position information corresponding to the target equipment word, wherein the first type reply language is used for prompting that the equipment controlled by the voice is not in the control range;
the device list also includes device location information.
In one embodiment, the speech recognition control apparatus further includes:
the second-class reply language sending module is used for generating and sending a second-class reply language when a control instruction is not generated according to a target device word corresponding to the last device word in the initial voice text, the physical address of the target device, the last control word in the initial voice text and the control instruction word list, wherein the second-class reply language is used for prompting a user to provide new voice;
the new voice acquisition module is used for acquiring new voice provided by a user;
the new voice text conversion module is used for converting the acquired new voice into a new voice text;
the voice text fusion module is used for fusing the initial voice text and the new voice text to generate a fused voice text;
and the second initial voice text updating module is used for updating the initial voice text into a fused voice text and executing the step of segmenting the initial voice text.
In one embodiment, the speech recognition control apparatus further includes:
the first time acquisition module is used for recording the first time for acquiring the initial voice text;
the second time acquisition module is used for recording the second time for acquiring the new voice;
and the life cycle judging module is used for executing the step of fusing the initial voice text and the new voice text when the time difference between the second time and the first time is less than a preset life cycle.
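The life-cycle gate described by the modules above can be sketched in a few lines. The 30-second window is an invented placeholder, since the text only says the life cycle is preset:

```python
LIFE_CYCLE_SECONDS = 30.0  # invented placeholder for the preset life cycle

def should_fuse(first_time, second_time):
    """Fuse the new speech text with the initial text only when the new
    speech arrived within the preset life cycle of the first utterance."""
    return (second_time - first_time) < LIFE_CYCLE_SECONDS
```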
In one embodiment, the speech recognition control apparatus further includes:
the error correction voice text acquisition module is used for correcting and updating the initial voice text by replacing error words with correct words, in order of error-word length from long to short, according to each error word and correct word in a pre-stored error correction word list in turn, so as to generate a corrected voice text;
the third initial voice text updating module is used for updating the initial voice text into an error correction voice text;
wherein, the words in the same position in the initial voice text are corrected and updated only once; the error correction word list comprises a plurality of key value pairs, wherein the keys are error words, and the values are correct words.
In one embodiment, the correction speech text acquisition module comprises:
the comparison unit is used for comparing the error words at the target position in the error correction word list with the initial voice text so as to determine the target error words in the initial voice text; wherein, the initial value of the target position is the head position of the error correction word list;
the correct word replacing unit is used for searching a correct word corresponding to the target error word in the error correction word list and replacing the target error word with the corresponding correct word;
a voice text updating unit, configured to update the initial voice text to a text formed after replacement, update the target position to a next table position, and perform a step of comparing an error word at the target position in the error correction word table with the initial voice text to determine the target error word in the initial voice text;
the corrected voice text generation unit is used for, after the target position has been updated to the tail position of the error correction word list, taking the updated and replaced voice text as the corrected voice text;
wherein, all error words in the error correction word list are arranged from long to short.
All or part of the units in each module can be implemented by software, hardware, or a combination thereof, and the beneficial effects are the same as those of the method embodiments. Specifically, the comparison unit compares the error word at the target position in the error correction word list with the initial voice text to determine the target error word in the initial voice text, and sends the determined target error word to the correct word replacing unit. The correct word replacing unit looks up the correct word corresponding to the target error word in the error correction word list and replaces the target error word with the corresponding correct word. The voice text updating unit updates the initial voice text to the text formed after replacement, updates the target position to the next table position, and drives the comparison unit to again compare the error word at the target position in the error correction word list with the initial voice text. Finally, after the target position has been updated to the tail position of the error correction word list, the corrected voice text generating unit takes the updated and replaced voice text as the corrected voice text.
In one embodiment, the alignment unit comprises:
the error word conversion unit is used for acquiring each error word in the error correction word list and converting each error word into a pinyin character string;
the initial voice text conversion unit is used for converting the initial voice text into a pinyin character string to be corrected;
the pinyin character string comparison unit is used for comparing the pinyin character string converted from the error word at the target position in the error correction word list with the pinyin character string to be corrected, to obtain an intersection of a plurality of pinyin characters;
and the target error word determining unit is used for determining the maximum sub-character string in each pinyin character intersection as the target error word.
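Taking "the maximum sub-character string in each pinyin character intersection" amounts to finding the longest common substring of the two pinyin strings. A standard dynamic-programming sketch is below; it operates on already-converted pinyin strings, and the conversion step itself (which a real system might delegate to a pypinyin-style library) is not shown:

```python
def longest_common_substring(a: str, b: str) -> str:
    """Classic DP longest common substring, O(len(a) * len(b)) time,
    keeping only one previous DP row in memory."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # extend the common substring ending at a[i-1], b[j-1]
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]
```

For instance, comparing the pinyin of an error word against the pinyin of the whole utterance returns the matched run of pinyin characters, which the unit above then maps back to the target error word.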
In one embodiment, the error correction voice text acquisition module further comprises:
the position information recording unit is used for recording the position information of the pinyin character corresponding to the target error word in the pinyin character string to be corrected;
the alignment unit further comprises:
and the overwrite prevention word replacing unit is used for, when the currently recorded position information is judged to be different from the previously recorded position information, searching the error correction word list for the correct word corresponding to the target error word and replacing the target error word with the corresponding correct word.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 6. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer equipment is used for storing data such as an error correction word list, an equipment list, a user-defined dictionary and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. When being executed by the processor, the computer program implements the speech recognition control method in the embodiment corresponding to fig. 2 to 4, and for specific implementation of the speech recognition control method, reference may be made to the description of the embodiment corresponding to fig. 2 to 4, which is not described herein again.
Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
s20: acquiring an initial voice text, wherein the initial voice text is generated after voice recognition is carried out on user voice;
s40: acquiring a device list in a control range, wherein the device list comprises device words and device physical addresses corresponding to the device words; the equipment words comprise equipment category words and/or equipment nickname words, and the priority of the equipment nickname words is higher than that of the equipment category words;
s60: performing word segmentation processing on the initial voice text, and extracting equipment words and control words in the initial voice text;
s80: determining effective device words matched with the device words in the initial voice text according to the device list, determining target device words corresponding to the device words in the initial voice text from the effective device words, and determining device physical addresses corresponding to the target device words from the device list; the target equipment word refers to the equipment word with the highest priority in the effective equipment words;
s90: generating a control instruction according to a target device word corresponding to the last device word in the initial voice text, a physical address of the target device, the last control word in the initial voice text and a control instruction word list; the target device physical address is a device physical address corresponding to a target device word corresponding to the last device word, the control instruction word list is used for representing the relation between the initial voice text and the control instruction, and the control instruction is used for indicating the device corresponding to the last target device word to execute the action corresponding to the control word.
It should be noted that, in the computer device provided in the embodiment of the present application, when the processor runs the program stored in the memory, the steps in any one of the above method embodiments may be executed, so as to achieve the corresponding beneficial effects.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
s20: acquiring an initial voice text, wherein the initial voice text is generated after voice recognition is carried out on user voice;
s40: acquiring a device list in a control range, wherein the device list comprises device words and device physical addresses corresponding to the device words; the equipment words comprise equipment category words and/or equipment nickname words, and the priority of the equipment nickname words is higher than that of the equipment category words;
s60: performing word segmentation processing on the initial voice text, and extracting equipment words and control words in the initial voice text;
s80: determining effective device words matched with the device words in the initial voice text according to the device list, determining target device words corresponding to the device words in the initial voice text from the effective device words, and determining device physical addresses corresponding to the target device words from the device list; the target equipment word refers to the equipment word with the highest priority in the effective equipment words;
s90: generating a control instruction according to a target device word corresponding to the last device word in the initial voice text, a physical address of the target device, the last control word in the initial voice text and a control instruction word list; the target device physical address is a device physical address corresponding to a target device word corresponding to the last device word, the control instruction word list is used for representing the relation between the initial voice text and the control instruction, and the control instruction is used for indicating the device corresponding to the last target device word to execute the action corresponding to the control word.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The technical features of the embodiments described above may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered to be within the scope of this specification.
The above-described embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (12)

1. A speech recognition control method, comprising:
acquiring an initial voice text, wherein the initial voice text is generated after voice recognition is carried out on user voice;
acquiring a device list in a control range, wherein the device list comprises device words and device physical addresses corresponding to the device words; the device words comprise device category words and/or device nickname words, and a device nickname word has a higher priority than a device category word;
performing word segmentation processing on the initial voice text, and extracting device words and control words in the initial voice text;
determining effective device words matched with the device words in the initial voice text according to the device list, determining target device words corresponding to the device words in the initial voice text from the effective device words, and determining device physical addresses corresponding to the target device words from the device list; the target equipment word refers to the equipment word with the highest priority in the effective equipment words;
generating a control instruction according to a target device word corresponding to the last device word in the initial voice text, a physical address of the target device, the last control word in the initial voice text and a control instruction word list; the target device physical address is a device physical address corresponding to a target device word corresponding to the last device word, the control instruction word list is used for representing the relation between the initial voice text and the control instruction, and the control instruction is used for indicating the device corresponding to the last target device word to execute the action corresponding to the control word.
2. The speech recognition control method according to claim 1, wherein after the step of performing word segmentation on the initial speech text and extracting device words and control words in the initial speech text, and before the step of determining valid device words matching each of the device words in the initial speech text according to the device list, the method further comprises:
standardizing the device words and the control words in the initial voice text according to the device words and the control words extracted from the initial voice text and a pre-stored word meaning standardization dictionary to obtain standardized device words and standardized control words, and correspondingly replacing the device words and the control words in the initial voice text with the standardized device words and the standardized control words to obtain a standardized voice text;
updating the initial phonetic text to the normalized phonetic text.
3. The speech recognition control method of claim 1, wherein the step of performing the segmentation process on the initial speech text is preceded by:
performing field intention recognition according to the initial voice text;
and if the initial voice text is judged to be in the equipment control field, executing a step of performing word segmentation processing on the initial voice text.
4. The speech recognition control method of claim 1, wherein the device list further includes device location information; and wherein the step of generating a control instruction according to a target device word corresponding to the last device word in the initial voice text, the target device physical address, the last control word in the initial voice text and a control instruction word list further comprises:
matching device position information corresponding to the target device word according to the device position information in the device list;
and if the equipment corresponding to the target equipment word is judged not to be in the control range according to the equipment position information corresponding to the target equipment word, generating and sending a first type reply language, wherein the first type reply language is used for prompting that the equipment controlled by the voice is not in the control range.
5. The speech recognition control method of claim 1, further comprising:
if a control instruction is not generated according to a target device word corresponding to the last device word in the initial voice text, a physical address of the target device, the last control word in the initial voice text and a control instruction word list, generating and sending a second type reply word, wherein the second type reply word is used for prompting a user to provide new voice;
acquiring new voice provided by a user;
converting the new speech into a new speech text;
fusing the initial voice text and the new voice text to generate a fused voice text;
and updating the initial voice text into the fused voice text, and executing the step of segmenting the initial voice text.
6. The speech recognition control method of any one of claims 1-5, wherein the step of performing word segmentation processing on the initial voice text is preceded by:
correcting and updating the initial voice text by replacing error words with correct words, in order of error-word length from long to short, according to each error word and correct word in a pre-stored error correction word list in turn, so as to generate a corrected voice text;
updating the initial voice text into the error correction voice text;
wherein words at the same position in the initial speech text are corrected and updated only once; the error correction word list comprises a plurality of key value pairs, wherein the keys are error words, and the values are correct words.
7. The speech recognition control method of claim 6, wherein each of the error words in the error correction vocabulary is arranged from long to short;
the step of correcting and updating the initial voice text in a manner of replacing error words with correct words according to the sequence of the lengths of the error words from long to short and according to each error word and correct word in a pre-stored error correction word list in sequence to generate a corrected voice text comprises the following steps:
comparing the error word at the target position in the error correction word list with the initial voice text to determine a target error word in the initial voice text; wherein, the initial value of the target position is the head position of the error correction word list;
searching a correct word corresponding to a target error word in the error correction word list, and replacing the target error word with the corresponding correct word;
updating the initial voice text into a text formed after replacement, updating the target position into a next table position, and executing the step of comparing the error word at the target position in the error correction word table with the initial voice text to determine the target error word in the initial voice text;
and updating the target position to the tail position of the error correction word list, and then using the voice text formed by updating and replacing as the correction voice text.
8. The method according to claim 7, wherein the step of comparing the error word at the target position in the error correction vocabulary with the initial speech text and determining the target error word in the initial speech text comprises the steps of:
acquiring each error word in the error correction word list, and converting each error word into a pinyin character string;
converting the initial voice text into a pinyin character string to be corrected;
comparing the pinyin character string converted from the error word at the target position in the error correction word list with the pinyin character string to be corrected, to obtain an intersection of a plurality of pinyin characters;
and determining the maximum sub-character string in each pinyin character intersection as the target error word.
9. The speech recognition control method of claim 8, further comprising, after the step of determining the largest substring in each of the pinyin character intersections as the target error word:
recording the position information of the pinyin character corresponding to the target error word in the pinyin character string to be corrected;
the step of searching for the correct word corresponding to the target error word in the error correction word list and replacing the target error word with the corresponding correct word comprises the following steps:
and if the position information recorded at present is different from the position information recorded at the previous time, searching a correct word corresponding to the target error word in the error correction word list, and replacing the target error word with the corresponding correct word.
10. A speech recognition control apparatus, comprising:
the voice text acquisition module is used for acquiring an initial voice text, wherein the initial voice text is generated after voice recognition is carried out on user voice;
the device list acquiring module is used for acquiring a device list in a control range, wherein the device list comprises device words and device physical addresses corresponding to the device words; the equipment words comprise equipment category words and/or equipment nickname words, and the priority of the equipment nickname words is higher than that of the equipment category words;
the word segmentation module is used for carrying out word segmentation processing on the initial voice text and extracting equipment words and control words in the initial voice text;
the analysis module is used for determining effective device words matched with the device words in the initial voice text according to the device list, determining target device words corresponding to the device words in the initial voice text from the effective device words, and determining device physical addresses corresponding to the target device words from the device list; the target equipment word refers to the equipment word with the highest priority in the effective equipment words;
a control instruction generating module, configured to generate a control instruction according to a target device word corresponding to a last device word in the initial voice text, a physical address of the target device, a last control word in the initial voice text, and a control instruction word list; the target device physical address is a device physical address corresponding to a target device word corresponding to the last device word, the control instruction word list is used for representing the relation between the initial voice text and the control instruction, and the control instruction is used for indicating the device corresponding to the last target device word to execute the action corresponding to the control word.
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the speech recognition control method according to any one of claims 1-9 are implemented when the program is executed by the processor.
12. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of the speech recognition control method according to any one of claims 1 to 9.
CN201910931524.5A 2019-09-29 2019-09-29 Speech recognition control method and device, computer equipment and computer storage medium Active CN110767232B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910931524.5A CN110767232B (en) 2019-09-29 2019-09-29 Speech recognition control method and device, computer equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910931524.5A CN110767232B (en) 2019-09-29 2019-09-29 Speech recognition control method and device, computer equipment and computer storage medium

Publications (2)

Publication Number Publication Date
CN110767232A true CN110767232A (en) 2020-02-07
CN110767232B CN110767232B (en) 2022-03-29

Family

ID=69330853

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910931524.5A Active CN110767232B (en) 2019-09-29 2019-09-29 Speech recognition control method and device, computer equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN110767232B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105913847A (en) * 2016-06-01 2016-08-31 北京灵隆科技有限公司 Voice control system, user end device, server and central control unit
CN106019951A (en) * 2015-03-29 2016-10-12 朱保东 Intelligent household system and products thereof
JP2016180916A (en) * 2015-03-25 2016-10-13 日本電信電話株式会社 Voice recognition system, voice recognition method, and program
CN107864075A (en) * 2017-09-30 2018-03-30 深圳市艾特智能科技有限公司 Intelligent home device IP update methods, system, storage medium and computer equipment
CN108121528A (en) * 2017-12-06 2018-06-05 深圳市欧瑞博科技有限公司 Sound control method, device, server and computer readable storage medium
CN108332363A (en) * 2018-04-27 2018-07-27 奥克斯空调股份有限公司 A kind of band voice control function air conditioner and its control method
US10192554B1 (en) * 2018-02-26 2019-01-29 Sorenson Ip Holdings, Llc Transcription of communications using multiple speech recognition systems
CN109412908A (en) * 2018-10-19 2019-03-01 珠海格力电器股份有限公司 A kind of method and apparatus that voice shows controllable device

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110782885A (en) * 2019-09-29 2020-02-11 深圳和而泰家居在线网络科技有限公司 Voice text correction method and device, computer equipment and computer storage medium
CN110782885B (en) * 2019-09-29 2021-11-26 深圳数联天下智能科技有限公司 Voice text correction method and device, computer equipment and computer storage medium
CN113495489A (en) * 2020-04-07 2021-10-12 深圳爱根斯通科技有限公司 Automatic configuration method and device, electronic equipment and storage medium
CN111372110A (en) * 2020-04-13 2020-07-03 李小强 Television control method based on voice recognition
CN113539252A (en) * 2020-04-22 2021-10-22 庄连豪 Barrier-free intelligent voice system and control method thereof
CN112466289A (en) * 2020-12-21 2021-03-09 北京百度网讯科技有限公司 Voice instruction recognition method and device, voice equipment and storage medium
CN112992144A (en) * 2021-04-21 2021-06-18 国网浙江省电力有限公司金华供电公司 Intelligent voice regulation and control method applied to electric power field
CN113314119A (en) * 2021-07-27 2021-08-27 深圳百昱达科技有限公司 Voice recognition intelligent household control method and device
CN117636877A (en) * 2024-01-24 2024-03-01 广东铭太信息科技有限公司 Intelligent system operation method and system based on voice instruction
CN117636877B (en) * 2024-01-24 2024-04-02 广东铭太信息科技有限公司 Intelligent system operation method and system based on voice instruction

Also Published As

Publication number Publication date
CN110767232B (en) 2022-03-29

Similar Documents

Publication Publication Date Title
CN110767232B (en) Speech recognition control method and device, computer equipment and computer storage medium
CN110782885B (en) Voice text correction method and device, computer equipment and computer storage medium
CN107644638B (en) Audio recognition method, device, terminal and computer readable storage medium
US11562736B2 (en) Speech recognition method, electronic device, and computer storage medium
CN106782526B (en) Voice control method and device
CN108183844B (en) Intelligent household appliance voice control method, device and system
JP2021018797A (en) Conversation interaction method, apparatus, computer readable storage medium, and program
CN107544271B (en) Terminal control method, device and computer readable storage medium
US10388277B1 (en) Allocation of local and remote resources for speech processing
CN107909998B (en) Voice instruction processing method and device, computer equipment and storage medium
CN107729433B (en) Audio processing method and device
US9349370B2 (en) Speech recognition terminal device, speech recognition system, and speech recognition method
US10535337B2 (en) Method for correcting false recognition contained in recognition result of speech of user
CN111797632B (en) Information processing method and device and electronic equipment
CN111326154B (en) Voice interaction method and device, storage medium and electronic equipment
CN110992937B (en) Language off-line identification method, terminal and readable storage medium
CN110689881A (en) Speech recognition method, speech recognition device, computer equipment and storage medium
CN112466289A (en) Voice instruction recognition method and device, voice equipment and storage medium
CN113495489A (en) Automatic configuration method and device, electronic equipment and storage medium
CN115002099A (en) Man-machine interactive file processing method and device for realizing IA (Internet of things) based on RPA (resilient packet Access) and AI (Artificial Intelligence)
CN110211576A (en) A kind of methods, devices and systems of speech recognition
US20220399013A1 (en) Response method, terminal, and storage medium
CN111128127A (en) Voice recognition processing method and device
CN116415590A (en) Intention recognition method and device based on multi-round query
CN108304497B (en) Terminal control method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200408

Address after: 1706, Fangda building, No. 011, Keji South 12th Road, high tech Zone, Yuehai street, Nanshan District, Shenzhen City, Guangdong Province

Applicant after: Shenzhen Shuliantianxia Intelligent Technology Co., Ltd

Address before: 518051 Shenzhen Nanshan High-tech South District, Shenzhen City, Guangdong Province, No. 6 South Science and Technology 10 Road, Shenzhen Institute of Space Science and Technology Innovation Building, Block D, 10th Floor, 1003

Applicant before: SHENZHEN H & T HOME ONLINE NETWORK TECHNOLOGY Co.,Ltd.

GR01 Patent grant