CN116416990A - Voice control method, voice control device, electronic equipment and storage medium - Google Patents

Voice control method, voice control device, electronic equipment and storage medium

Info

Publication number
CN116416990A
CN116416990A
Authority
CN
China
Prior art keywords
text
matching
determining
instruction
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310610732.1A
Other languages
Chinese (zh)
Inventor
刘嵘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Zhilian Beijing Technology Co Ltd
Original Assignee
Apollo Zhilian Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Apollo Zhilian Beijing Technology Co Ltd
Priority to CN202310610732.1A
Publication of CN116416990A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26: Speech to text systems
    • G10L 2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The disclosure provides a voice control method, which relates to the technical field of voice processing, in particular to the technical fields of voice recognition, voice interaction and natural language processing. The specific implementation scheme is as follows: in response to receiving the input speech, matching an input text corresponding to the input speech with a first set of instruction texts; determining respective element texts of at least one element on the current page in response to the input text not being successfully matched with the first instruction text set; for each element text, matching the input text with a second instruction text set according to the common text between the input text and the element text to obtain a matching result for the element; determining a target element from the at least one element according to the respective matching result of the at least one element; and performing a control operation for the target element. The disclosure also provides a voice control device, an electronic device and a storage medium.

Description

Voice control method, voice control device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of speech processing, and in particular to the field of speech recognition, speech interaction, and natural language processing. More particularly, the present disclosure provides a voice control method, apparatus, electronic device, and storage medium.
Background
Intelligent automobiles are becoming popular, and most of them include voice assistants that enable users to control the vehicle by voice, such as controlling opening and closing of windows, controlling navigation, controlling various vehicle applications, and so forth.
Disclosure of Invention
The disclosure provides a voice control method, a voice control device, voice control equipment and a storage medium.
According to a first aspect, there is provided a voice control method comprising: in response to receiving the input speech, matching an input text corresponding to the input speech with a first set of instruction texts; determining respective element texts of at least one element on the current page in response to the input text not being successfully matched with the first instruction text set; for each element text, matching the input text with a second instruction text set according to the common text between the input text and the element text to obtain a matching result for the element; determining a target element from the at least one element according to the respective matching result of the at least one element; and performing a control operation for the target element.
According to a second aspect, there is provided a speech control apparatus comprising: the first matching module is used for responding to the received input voice and matching the input text corresponding to the input voice with the first instruction text set; the element text determining module is used for determining respective element texts of at least one element on the current page in response to the fact that the input text is not successfully matched with the first instruction text set; the second matching module is used for matching the input text with a second instruction text set according to the public text between the input text and the element text for each element text to obtain a matching result for the element; the target element determining module is used for determining target elements from at least one element according to the respective matching results of the at least one element; and a control module for performing a control operation for the target element.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method provided in accordance with the present disclosure.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a method provided according to the present disclosure.
According to a fifth aspect, there is provided a computer program product comprising a computer program stored on at least one of a readable storage medium and an electronic device, which, when executed by a processor, implements a method provided according to the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic illustration of a scenario of a speech control method according to one embodiment of the present disclosure;
FIG. 2 is a flow chart of a voice control method according to one embodiment of the present disclosure;
FIG. 3 is a flow chart of a method of determining candidate elements according to one embodiment of the present disclosure;
FIG. 4 is a flow chart of a method of determining candidate elements according to another embodiment of the present disclosure;
FIG. 5 is a block diagram of a voice control apparatus according to one embodiment of the present disclosure;
fig. 6 is a block diagram of an electronic device of a voice control method according to one embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
As intelligent automobiles become increasingly popular, the functionality that the intelligent cabin provides to the user is becoming increasingly rich, and drivers operate the intelligent cabin's central control system during driving more and more often. For example: controlling the navigation application to change the current navigation route; controlling the multimedia application so that the user can switch and request multimedia content according to personal preference; controlling the vehicle itself, such as air conditioning control, window/sunroof/sunshade control, and the like; and controlling the Bluetooth phone to accurately dial a specific contact.
Although most smart vehicles are now equipped with voice assistants through which the central control system can be controlled, their capabilities are quite limited, and it is often difficult to cover all functional operations; almost no voice assistant achieves full coverage of the vehicle's control intents by voice. Many functions can still only be reached by touch, which greatly increases the safety risk.
Therefore, demand for a full-scene voice touch screen has grown; specifically, the voice touch screen replaces the driver's touch operations with voice to operate the central control system. An effective solution for the full-scene voice touch screen is "what you see is what you can say" (see-and-say): any function shown on the screen can be controlled by voice. A specific implementation may scan the elements on a page to obtain page element texts; when the user speaks, the speech is recognized as text, the recognized text is matched against the scanned page element texts, and after a successful match a click on the element's position on the screen is simulated, thereby realizing voice control.
A see-and-say scheme may adopt a fuzzy matching strategy to match the speech recognition result with page elements. Fuzzy matching means that if the length of the common text between the user's recognized text and a page element text is greater than a preset value (for example, 3), the user's speech is considered to hit that page element. In practice, the fuzzy matching strategy still has a serious over-recall problem, so the voice touch screen's recognition result is often not a true hit but a false hit.
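As a minimal illustration of the fuzzy matching rule described above (a sketch, not the disclosure's actual implementation), the following Python snippet takes the common text to be the longest common substring and applies the length threshold; the function names and the default threshold of 3 are assumptions for demonstration:

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous string contained in both a and b."""
    best = ""
    # Classic O(len(a) * len(b)) dynamic programming table:
    # dp[i][j] = length of the common suffix of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > len(best):
                    best = a[i - dp[i][j]:i]
    return best


def fuzzy_hit(input_text: str, element_text: str, threshold: int = 3) -> bool:
    """Fuzzy match: the element is 'hit' when the common text is longer
    than the preset threshold."""
    return len(longest_common_substring(input_text, element_text)) > threshold


# The over-recall problem: a negated or opposite command still hits the element.
print(fuzzy_hit("turn off the silent mode", "silent mode"))  # True, a false hit
```

The last line shows exactly the failure mode discussed below: the command and the element text share a long common substring even though the user's intent is the opposite of what a click would do.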
Table 1 below shows the case of several false hits for fuzzy matching.
TABLE 1
(Table 1 is reproduced as an image in the original publication.)
As shown in table 1, for example, a page element is a text control whose text content is "atmosphere lamp", and clicking the text control opens the atmosphere lamp settings page. The user's speech recognition result is "turn on the atmosphere lamp"; according to the see-and-say fuzzy matching strategy, the recognition result matches the text control successfully, i.e., the user's speech hits the atmosphere lamp control, so a click on the control is simulated and the atmosphere lamp settings page is opened. This does not agree with the user's actual intention of turning on the atmosphere lamp itself.
For another example, a page element is a switch control that is currently off, and its text content is "silent mode". The user's speech recognition result is "turn off the silent mode"; according to the fuzzy matching strategy, the user's speech hits the silent-mode switch control, so a simulated click is performed and the silent mode is instead turned on, contrary to the user's actual intention of turning it off.
For another example, a page element is an application icon whose text content is "XXX" (the name of the application, for example, containing 3 characters). The user's speech recognition result is "do not open XXX"; according to the fuzzy matching strategy, the user's speech hits the application icon, so a simulated click is performed and the application is opened, contrary to the user's real intention of not opening XXX.
Therefore, the over-recall problem of fuzzy matching causes many incorrect operations that are inconsistent with the user's actual intention, resulting in a poor user experience.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other processing of the user's personal information all comply with relevant laws and regulations and do not violate public order and good morals.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
Fig. 1 is a schematic view of a scenario of a voice control method according to one embodiment of the present disclosure.
As shown in fig. 1, the embodiment may be a display screen of an on-vehicle terminal, where a page in the display screen may include a plurality of page elements, and the page elements may include a text control, a switch control, an application icon, and the like.
For example, element 101 is a text control, and the text content of element 101 is a "home page". Clicking on element 101 may display a plurality of application icons. The application icons include, for example, element 104, element 105, and element 106, among others. Element 104 may be an application icon for a video playing software, element 105 may be an application icon for a music playing software, and element 106 may be an application icon for a navigation software.
Element 102 is a text control, the text content of element 102 is "air conditioner", and clicking on element 102 may open a setup page for air conditioner, such as setting an operation mode, setting a temperature, etc.
Element 103 is a switch control, the text content of element 103 is "silent mode", and clicking element 103 may turn the silent mode on or off.
According to the voice control method of embodiments of the present disclosure, the user's touch operation can be replaced by voice: the user's input speech is recognized as an input text, the input text is matched with the text content of each element, and, in response to a successful match between the input text and a target element, a click on the target element's position on the screen is simulated to realize voice control.
It should be noted that fig. 1 illustrates only a scenario in which embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, but it does not mean that embodiments of the present disclosure may not be applied to other devices, systems, environments, or scenarios.
Fig. 2 is a flow chart of a voice control method according to one embodiment of the present disclosure.
As shown in fig. 2, the voice control method 200 includes operations S210 to S250.
In response to receiving the input speech, the input text corresponding to the input speech is matched with the first set of instruction text in operation S210.
For example, an input voice of a user is received, and voice recognition is performed on the input voice to obtain an input text.
The first instruction text set may include vehicle control type instructions, such as turning on the atmosphere lamp, closing the window, turning on the wiper, turning off the interior light, turning on the Bluetooth mode, turning on the HUD (Head Up Display), and the like. Such instructions have a higher priority than see-and-say touch instructions. That is, a user issuing such a vehicle control instruction intends to control a specific device on the vehicle (e.g., a window, wiper, Bluetooth phone, etc.), rather than to touch the screen.
Because pages in the full-scene see-and-say scenario may contain elements such as "atmosphere lamp", "window", and "Bluetooth", a vehicle control instruction could be falsely hit by an on-screen element and thus fail to be executed correctly. A blacklist may therefore be set to intercept vehicle control instructions.
The blacklist may contain the text contents of vehicle control instructions, and may specifically contain the verb texts and noun texts of various vehicle control instructions. The verb texts may include "open", "close", "turn off", and the like. The noun texts may include "window", "atmosphere lamp", "Bluetooth mode", and the like.
After the user's speech is received and before it is matched with the see-and-say page elements, the recognized input text may be matched with the first instruction text set. During matching, the verb-noun combinations in the blacklist are matched one by one; once a blacklist combination is hit, the see-and-say module discards the matching result, and the user's input text is sent to the vehicle control module to execute the corresponding vehicle control instruction.
Using the blacklist to intercept vehicle control instructions prevents them from being falsely hit by see-and-say matching and ensures their correct execution.
In operation S220, respective element texts of at least one element on the current page are determined in response to the input text not matching the first instruction text set successfully.
If the input text does not match the blacklist successfully, the input text may be matched to the page element.
The page elements may include text controls, switch controls, application icons, and the like. A text control includes text content; for example, a control for setting the color of the atmosphere lamp has the text content "atmosphere lamp", and a control for setting the temperature or mode of the air conditioner has the text content "air conditioner". A switch control may include text content and a button or icon indicating on/off; for example, a switch control for turning the silent mode on or off has the text content "silent mode". An application control may include text content and an application icon; for example, for an application control for playing video, the text content may be the name of the video software.
In response to receiving the user's input speech, the elements on the current page may be scanned, obtaining the text content of each element as element text.
In operation S230, for each element text, the input text is matched with the second instruction text set according to the common text between the input text and the element text, so as to obtain a matching result for the element.
For each element text, a common text between the input text and the element text may be determined. The common text refers to the characters contained in both the input text and the element text, and may be the longest common string of the two. For example, if the input text is "open drama" and the element text is "drama", the common text of both is "drama".
If the length of the common text is greater than a threshold (e.g., 3), the fuzzy match is successful. To mitigate the over-recall problem of fuzzy matching, further filtering may then be performed using a preset whitelist.
The whitelist may include the second instruction text set, which contains the verb texts of allowed instructions, i.e., the parts of an instruction text other than the longest common text with the page element, for example, "open", "close", "turn on", "turn off", "switch on", "switch off", and the like.
Therefore, when matching the input text with the whitelist, the common text may be removed from the input text, and the input text is split into at least one sub-text using the position of the common text as the separator; the sub-texts are then matched with the whitelist. If a sub-text is in the whitelist, the match is successful. An element that matches successfully through the whitelist may be taken as a candidate element.
For example, the input text is "do not open XXX, close", and the common text with the element text "XXX" is "XXX". After "XXX" is deleted from the input text, the remaining text is "do not open, close", which can be split into the two sub-texts "do not open" and "close" using the position of the common text "XXX" as the separator. The two sub-texts are matched with the whitelist; for example, the sub-text "close" is in the whitelist, so the match is successful, and the page element corresponding to the element text "XXX" may be determined as a candidate element.
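The split-and-filter step above can be sketched as follows. This is a hypothetical Python illustration: the `WHITELIST` contents, the `\x00` separator symbol, and the function names are all assumptions, not the disclosure's implementation.

```python
WHITELIST = {"open", "close", "turn on", "turn off", "select"}  # assumed verb set


def split_by_common_text(input_text: str, common: str) -> list[str]:
    """Delete the common text and split the remainder at its position(s)."""
    parts = input_text.replace(common, "\x00").split("\x00")
    # Trim leftover spaces and punctuation; drop empty fragments.
    return [p.strip(" ,") for p in parts if p.strip(" ,")]


def is_candidate(input_text: str, common: str) -> bool:
    """The element is a candidate if the input text is nothing but the
    common text (a full match), or if any sub-text is an allowed verb."""
    subs = split_by_common_text(input_text, common)
    if not subs:  # input text coincides with the common text
        return True
    return any(sub in WHITELIST for sub in subs)


# "do not open XXX, close" -> sub-texts "do not open" and "close";
# "close" is whitelisted, so the element whose text is "XXX" is a candidate.
print(is_candidate("do not open XXX, close", "XXX"))  # True
```

Note that the sub-text "do not open" does not hit the whitelist, so had the user only said "do not open XXX", no candidate would result, avoiding the false hit of plain fuzzy matching.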
The whitelist stores the allowed action instructions in the see-and-say scenario. Matching against the whitelist avoids incorrect instruction execution caused by fuzzy matching, thereby improving the user experience.
In operation S240, a target element is determined from the at least one element according to the respective matching result of the at least one element.
In operation S250, a control operation for the target element is performed.
For example, after matching the input text with all elements on the page is completed, at least one candidate element that matches successfully may be obtained, and a target element that most matches the current scene and the user's intent may be determined from at least one element based on the current application scene, the semantics of the input text, and the like.
After the target element is determined, the position of clicking the target element on the current screen can be simulated, and the effect of clicking the target element by voice control is realized.
According to the embodiments of the present disclosure, vehicle control type instructions are intercepted using the first instruction text set (the blacklist), and action instructions not allowed in the see-and-say scenario are filtered using the second instruction text set (the whitelist). Matching against the blacklist and the whitelist mitigates the over-recall problem of fuzzy matching, avoids incorrect instruction execution caused by over-recall, and improves the user experience.
Fig. 3 is a flow chart of a method of determining candidate elements according to one embodiment of the present disclosure.
As shown in fig. 3, the voice control method includes operations S310 to S390.
In operation S310, an input voice is received, and voice recognition is performed on the input voice to obtain an input text.
In operation S320, the input text is matched with the blacklist.
The blacklist contains the higher-priority vehicle control instructions and may include a verb blacklist and a noun blacklist of the vehicle control instructions. During matching, all blacklist verb-noun combinations may be matched one by one.
In operation S330, it is determined whether the input text hits the blacklist. If not, operation S340 is performed, and if so, operation S390 is performed.
Using the blacklist to intercept vehicle control instructions prevents them from being falsely hit by see-and-say matching and ensures their correct execution.
In operation S340, a longest common text between the input text and the page element is determined, and the input text is split according to the longest common text, resulting in at least one sub-text.
For example, the longest common text in the input text may be replaced with a special symbol, and the remaining text other than the special symbol is split using the special symbol as the separator, obtaining at least one sub-text.
In operation S350, it is determined whether there are any more sub-texts that are not matched with the current white list, if yes, operation S360 is performed, otherwise, matching is completed, and the flow is ended.
The non-matching sub-text is matched with the whitelist in operation S360.
In operation S370, it is determined whether the current sub-text hits the white list, and if so, operation S380 is performed, otherwise, operation S350 is returned.
The sub-text may be some verbs such as open, close, etc. If the sub-text hits the white list, it is indicated that the action instruction in the input text is allowed.
In operation S380, the current element is determined as a candidate element.
In operation S390, the input text is transmitted to the vehicle control module to cause the vehicle control module to perform a vehicle control operation.
The present embodiment is a process of matching an input text with a current page element. The current page may contain multiple page elements, so that the input text needs to be matched with the multiple page elements one by one.
Fig. 4 is a flow chart of a method of determining candidate elements according to another embodiment of the present disclosure.
As shown in fig. 4, the voice control method includes operations S401 to S416.
In operation S401, an input voice is received, and voice recognition is performed on the input voice to obtain an input text Q1.
In operation S402, it is determined whether the input text Q1 equals a combination of a blacklist verb followed by a blacklist noun. If so, the input text Q1 hits the blacklist, and operation S416 is performed. Otherwise, operation S403 is performed.
In operation S403, it is determined whether the input text Q1 equals a combination of a blacklist noun followed by a blacklist verb. If so, the input text Q1 hits the blacklist, and operation S416 is performed. Otherwise, operation S404 is performed.
Operations S402 to S403 match the input text Q1 with the blacklist. The blacklist may include a vehicle control instruction set; specifically, it may include a blacklist noun subset (i.e., a noun text subset) and a blacklist verb subset (i.e., a verb text subset). Nouns of the vehicle control instructions may be added to the blacklist noun subset, and verbs of the vehicle control instructions may be added to the blacklist verb subset.
Table 2 shows some of the noun text in the subset of blacklisted nouns.
TABLE 2
(Table 2 is reproduced as an image in the original publication.)
Table 3 shows part of the verb text in the blacklist verb sub-set.
TABLE 3
(Table 3 is reproduced as an image in the original publication.)
The text contents of the blacklist noun subset shown in Table 2 and of the blacklist verb subset shown in Table 3 may be updated in real time.
During blacklist matching, a verb to be matched may first be taken from the blacklist verb subset and combined one by one with the nouns in the blacklist noun subset, obtaining a set of verb-before-noun combination texts, such as "turn on the atmosphere lamp" and "turn on the HUD". The input text Q1 is matched one by one with all the verb-before-noun combination texts. If the input text Q1 hits any verb-before-noun combination text, it may be determined that the input text Q1 hits the blacklist.
If the input text Q1 hits none of the verb-before-noun combination texts, a noun to be matched may be taken from the blacklist noun subset and combined one by one with the verbs in the blacklist verb subset, obtaining a set of noun-before-verb combination texts, such as "atmosphere lamp on" and "HUD on". The input text Q1 is matched one by one with all the noun-before-verb combination texts. If the input text Q1 hits any noun-before-verb combination text, it may be determined that the input text Q1 hits the blacklist.
If the input text Q1 hits neither any verb-before-noun combination text nor any noun-before-verb combination text, it may be determined that the input text Q1 misses the blacklist.
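The combination matching of operations S402 and S403 can be sketched as follows. The verb and noun subsets here are small illustrative assumptions; per Tables 2 and 3, the real subsets are larger and updatable, and the exact text of each combination would follow the source language's word order.

```python
# Illustrative subsets only; the actual blacklist subsets are updated in real time.
BLACKLIST_VERBS = {"turn on", "turn off", "open", "close"}
BLACKLIST_NOUNS = {"atmosphere lamp", "window", "HUD", "bluetooth mode"}


def hits_blacklist(input_text: str) -> bool:
    """Match Q1 against every verb-before-noun combination ("turn on HUD",
    operation S402), then every noun-before-verb combination ("HUD turn on",
    operation S403), one by one."""
    for verb in BLACKLIST_VERBS:
        for noun in BLACKLIST_NOUNS:
            if input_text == f"{verb} {noun}":
                return True
    for noun in BLACKLIST_NOUNS:
        for verb in BLACKLIST_VERBS:
            if input_text == f"{noun} {verb}":
                return True
    return False


print(hits_blacklist("turn on HUD"))   # True: forward to the vehicle control module
print(hits_blacklist("select drama"))  # False: proceed to page element matching
```

Exact equality against each combination (rather than substring matching) is what keeps a vehicle control command like "turn on HUD" from ever reaching the see-and-say element matcher.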
In operation S404, it is determined whether there are page elements that have not yet been matched with the input text Q1. If so, operation S405 is performed. Otherwise, either the matching of the input text Q1 with all page elements has finished, or the current page has no page element to be matched with the input text Q1, and the process ends.
In the event that it is determined that the input text Q1 does not hit the blacklist, the input text Q1 is matched with the page element. Since there may be one or more page elements, it is necessary to match the input text Q1 with at least one page element one by one.
In operation S405, a page element that has not been matched with the input text Q1 is fetched, the text content of the page element is taken as the current element text T1, and the longest common text between the input text Q1 and the current element text T1 is determined.
If the length of the longest common text is greater than a threshold (e.g., 3), it may be determined that the fuzzy match of the input text Q1 with the element text T1 is successful. Next, matching of the whitelist is performed.
In operation S406, the input text Q1 is split into at least one sub-text T2 according to the longest common text.
For example, in response to the length of the longest common text being greater than the threshold, the longest common text in the input text Q1 may be replaced with a special symbol, and the remaining text other than the special symbol is split using the special symbol as the separator, resulting in the sub-text T2. There may be one or more sub-texts T2.
In operation S407, it is determined whether the sub text T2 is empty. If so, operation S415 is performed. Otherwise, operation S408 is performed.
If the sub-text T2 is empty, the input text Q1 is identical to the longest common text; that is, the input text Q1 is completely contained in the element text or identical to it, which is a full match. It may be determined that the input text Q1 hits the current page element.
In operation S408, it is determined whether there is any non-matching sub-text T2. If so, operation S409 is performed. Otherwise, it is indicated that all the sub-texts T2 are matched with the white list, and the operation returns to operation S404 to match with the next page element.
In case it is determined that the sub-text T2 is not empty, the sub-text T2 is matched with the whitelist. Since there may be one or more sub-texts T2, it is necessary to determine whether there are any non-matching sub-texts T2.
In operation S409, the unmatched current sub-text T2 is matched with the selection class whitelist.
The whitelist may include a selection class whitelist (selection class instruction text subset), an open class whitelist (open class instruction text subset), and a close class whitelist (close class instruction text subset). An unmatched sub-text T2 is taken out as the current sub-text T2, and the current sub-text T2 is matched with the selection class whitelist.
The selection class whitelist includes, for example, verbs such as "select", "choose", and the like.
In operation S410, it is determined whether the current sub-text T2 hits the selection class white list. If so, operation S415 is performed, otherwise operation S411 is performed.
For example, if the input text Q1 is "select drama", "drama" is the longest common text matching the page element and "select" is the sub-text T2. Since "select" is in the selection class whitelist, it can be determined that the selection class whitelist is hit, and the current element may be determined as a candidate element as the matching result (operation S415).
In case that the current sub-text T2 does not hit the selection class white list, operation S411 is performed.
In operation S411, the current sub-text T2 is matched with the open class whitelist.
The open class whitelist may include verbs of "open", and the like. The open class action is not limited to applications of switch class elements, and may be applied to any scenario, such as opening an application class element (e.g., XXX application icon), a text control class element (e.g., atmosphere light text control), etc. on a desktop.
In operation S412, it is determined whether the current sub-text T2 hits the open class whitelist. If yes, operation S415 is performed, otherwise operation S413 is performed.
In the case where the sub-text T2 hits the open class whitelist, the matching result must be further determined according to the element type and state (operation S415). For example, where the current element is of the switch type, the current element is determined as a candidate element, as the matching result, in response to the switch state being off. In response to the switch state being on, the current matching is determined to be unsuccessful, the matching result is null, and no candidate element is returned.
For example, if the current element is the "mute mode" switch control and is in the off state, and the sub-text T2 is "on", the sub-text T2 hits the open class whitelist; the matching can be determined to be successful, and the current element is determined as a candidate element and used as the matching result.
For another example, if the current element is the "mute mode" switch control in the on state and the sub-text T2 is "on", the open class whitelist is hit; however, simulating a click on the "mute mode" switch control would erroneously turn "mute mode" off. Therefore, the matching is determined to be unsuccessful and no candidate element is returned, avoiding an erroneous turn-off.
In the case where the sub-text T2 hits the open class whitelist and the current element is not of the switch type, the current matching can be determined to be successful, since the open class instruction supports all element types (e.g., switches, icons), and the current element is determined as a candidate element as the matching result (operation S415).
In the case where the sub-text T2 misses the open class whitelist, operation S413 is performed.
In operation S413, the current sub-text T2 is matched with the closed class whitelist.
The close class whitelist may include verbs of "close", "close down", and the like.
In operation S414, it is determined whether the current sub-text T2 hits the closed class whitelist. If so, operation S415 is performed; otherwise, the process returns to operation S408.
In the case where the sub-text T2 hits the closed class whitelist, the matching result must likewise be further determined according to the switch type and state (operation S415). For example, where the current element is of the switch type, the element is determined as a candidate element, as the matching result, in response to the switch state being on. In response to the switch state being off, the matching is determined to be unsuccessful and no matching result is returned.
For example, if the current element is the "mute mode" switch control and is in the on state, and the sub-text T2 is "off", the sub-text T2 hits the closed class whitelist; the matching can be determined to be successful, and the element is determined as a candidate element and used as the matching result.
For another example, if the current element is the "mute mode" switch control in the off state and the sub-text T2 is "off", the closed class whitelist is hit; however, simulating a click on the "mute mode" switch control would erroneously turn "mute mode" on. Therefore, the matching is determined to be unsuccessful in this case and no matching result is returned, avoiding an erroneous turn-on.
If the sub-text T2 does not hit the closed class whitelist, or the closed class whitelist is hit but the switch state is off, the matching of the current sub-text T2 ends. The process returns to operation S408 to determine whether there is a next sub-text T2 to be matched.
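Operations S409 through S414 amount to a three-way whitelist lookup refined by element type and switch state. The following Python sketch is a minimal illustration under assumed whitelist contents and an assumed function signature; it is not the patented implementation.

```python
# Illustrative whitelist contents; the disclosure only names "select"/"choose",
# "open", and "close" as examples.
SELECT_WHITELIST = {"select", "choose"}
OPEN_WHITELIST = {"open", "turn on"}
CLOSE_WHITELIST = {"close", "turn off"}


def match_sub_text(sub_text, element_type, switch_state):
    """Return True if the sub-text hits a whitelist and the element state allows the action."""
    t2 = sub_text.strip()
    if t2 in SELECT_WHITELIST:
        # Selection verbs apply to any element (operation S410).
        return True
    if t2 in OPEN_WHITELIST:
        # Open-class verbs support all element types; a switch may only be
        # opened when it is off, avoiding an erroneous toggle (operation S412).
        return element_type != "switch" or switch_state == "off"
    if t2 in CLOSE_WHITELIST:
        # Close-class verbs only apply to switches that are currently on (operation S414).
        return element_type == "switch" and switch_state == "on"
    # e.g. "do not" hits no whitelist, so the instruction is intercepted.
    return False
```

A sub-text such as "do not" falls through all three lookups, which is how the whitelist prevents the over-recall cases discussed below.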
In operation S416, the input text Q1 is transmitted to the vehicle control module to cause the vehicle control module to perform a vehicle control operation.
This operation is performed in the case where the input text Q1 successfully matches the blacklist. A successful blacklist match indicates that the instruction corresponding to the input text is a vehicle control instruction: the user intends to control the vehicle rather than operate the screen. The input text is therefore sent to the vehicle control module, which prevents the vehicle control instruction from being wrongly hit by the "visible is speakable" screen matching and ensures that the vehicle control instruction is executed correctly.
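The routing just described, blacklist interception first and element matching only on a miss, can be pictured with the hypothetical sketch below; `route_input` and the `match_element` predicate are illustrative names standing in for the matching flow above.

```python
def route_input(input_text, blacklist, page_elements, match_element):
    """Blacklist hit -> vehicle control; otherwise recall candidate page elements."""
    if any(instruction in input_text for instruction in blacklist):
        # Vehicle-control intent: forward the text to the vehicle control module (S416).
        return ("vehicle", input_text)
    # Screen intent: recall every element the whitelist matching accepts.
    candidates = [e for e in page_elements if match_element(input_text, e)]
    return ("screen", candidates)
```

Checking the blacklist before any element matching is what gives vehicle control instructions priority over screen operations.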
In this embodiment, the input text is matched one by one with at least one page element on the current page, and at least one successfully matched candidate element is determined from the current page, which reduces the risk of missed matches.
In this embodiment, the blacklist is used to intercept vehicle control instructions, and the whitelist is used to recall at least one candidate element, which mitigates the over-recall problem of fuzzy matching.
The hit results of this embodiment, compared with the false-hit cases of fuzzy matching, are shown in Table 4 below.
Table 4
[Table 4 is reproduced as an image in the original publication; its interception results are summarized in the following paragraph.]
As shown in Table 4, the voice instruction "turn on atmosphere lamp" is intercepted by the blacklist. The voice instruction "turn off mute mode" is intercepted because the switch state is already off. The voice instruction "do not open XXX" is intercepted because "do not" is not in the whitelist.
Fig. 5 is a block diagram of a voice control apparatus according to one embodiment of the present disclosure.
As shown in fig. 5, the voice control apparatus 500 includes a first matching module 501, an element text determining module 502, a second matching module 503, a target element determining module 504, and a control module 505.
The first matching module 501 is configured to match, in response to receiving an input voice, an input text corresponding to the input voice with a first instruction text set.
The element text determining module 502 is configured to determine, in response to the input text not matching the first instruction text set successfully, an element text of each of the at least one element on the current page.
The second matching module 503 is configured to match, for each element text, the input text with the second instruction text set according to a common text between the input text and the element text, so as to obtain a matching result for the element.
The target element determining module 504 is configured to determine a target element from the at least one element according to a respective matching result of the at least one element.
The control module 505 is used to perform control operations for the target element.
The second matching module 503 includes a split sub-module and a first matching sub-module.
The splitting sub-module is configured to, for each element text, in response to the length of the common text being greater than a threshold, split the remaining text other than the common text in the input text at intervals of the common text to obtain at least one sub-text.
The first matching sub-module is used for matching the sub-text with the second instruction text set for each sub-text, and determining the element corresponding to the element text as a candidate element as a matching result in response to successful matching of the sub-text with the second instruction text set.
According to an embodiment of the present disclosure, the second set of instruction text includes a selection class instruction text subset, an open class instruction text subset, and a close class instruction text subset. The first matching submodule comprises a matching unit, a first matching result determining unit, a second matching result determining unit and a third matching result determining unit.
The matching unit is used for matching the sub-text with the selected class instruction text subset, the opened class instruction text subset and the closed class instruction text subset respectively aiming at each sub-text.
The first matching result determining unit is used for determining an element corresponding to the element text as a candidate element as a matching result in response to successful matching of the sub-text and the selection class instruction text.
The second matching result determining unit is used for determining a matching result according to the type and the state of the element corresponding to the element text in response to successful matching of the sub-text and the open class instruction text subset.
The third matching result determining unit is used for determining a matching result according to the type and the state of the element corresponding to the element text in response to successful matching of the sub-text and the closing instruction text subset.
The second matching result determination unit includes a first type determination subunit, a first matching result determination subunit, and a second matching result determination subunit.
The first type determination subunit is configured to determine a type of an element corresponding to the element text.
The first matching result determining subunit is configured to determine a matching result according to the switching state of the element in response to the type of the element being a switching type.
The second matching result determination subunit is configured to determine the element as a candidate element as a matching result in response to the type of the element being a non-switching type.
A first matching result determining subunit, configured to determine a switching state of an element; in response to the switch state being off, the element is determined to be a candidate element as a result of the matching.
The third matching result determination unit includes a second type determination subunit and a third matching result determination subunit.
The second type determination subunit is configured to determine a type of an element corresponding to the element text.
The third matching result determining subunit is configured to determine a matching result according to the switching state of the element in response to the type of the element being a switching type.
A third matching result determination subunit, configured to determine a switching state of the element; in response to the switch state being on, the element is determined to be a candidate element as a result of the matching.
The speech control apparatus 500 further comprises a third matching module.
The third matching module is used for determining elements corresponding to the element texts as candidate elements as a matching result in response to the length of the common text being greater than a threshold value and the input text being the same as the common text.
The target element determination module 504 includes a candidate element determination sub-module and a target element determination sub-module.
The candidate element determination submodule is used for determining at least one candidate element from the matching result.
The target element determination submodule is used for determining a target element from at least one candidate element.
The first instruction text set includes a verb text subset and a noun text subset. The first matching module 501 includes a proper noun combination text set determining sub-module and a second matching sub-module.
The proper noun combination text set determining sub-module is configured to determine the proper noun combination text set according to the verb text subset and the noun text subset.
The second matching sub-module is configured to match the input text with the proper noun combination text set.
According to an embodiment of the present disclosure, the proper noun combined text set includes at least one of a combined text in which the verb precedes the noun and a combined text in which the noun precedes the verb.
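A minimal sketch of how the proper noun combined text set might be built from the two subsets follows; joining with a space, the example phrases, and the function names are assumptions for illustration only.

```python
from itertools import product


def build_combined_set(verbs, nouns):
    """Combine every verb and noun in both orders (verb-first and noun-first)."""
    verb_first = {f"{v} {n}" for v, n in product(verbs, nouns)}
    noun_first = {f"{n} {v}" for v, n in product(verbs, nouns)}
    return verb_first | noun_first


def matches_first_set(input_text, combined_set):
    """The input text matches the first instruction text set if it contains any combined text."""
    return any(phrase in input_text for phrase in combined_set)
```

Generating both orderings lets the blacklist catch an instruction regardless of whether the user phrases it verb-first or noun-first.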
The first set of instruction text includes a vehicle control class instruction text, and the second set of instruction text includes an operation instruction associated with an element on the current page, the vehicle control class instruction having a higher priority than the operation instruction associated with the element on the current page.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608 such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the respective methods and processes described above, such as a voice control method. For example, in some embodiments, the voice control method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the voice control method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the voice control method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (27)

1. A voice control method, comprising:
in response to receiving an input voice, matching an input text corresponding to the input voice with a first set of instruction texts;
determining respective element texts of at least one element on the current page in response to the input text not being successfully matched with the first instruction text set;
for each element text, matching the input text with a second instruction text set according to the common text between the input text and the element text to obtain a matching result for the element;
determining a target element from the at least one element according to the respective matching result of the at least one element; and
a control operation for the target element is performed.
2. The method of claim 1, wherein the matching the input text with a second instruction text set according to a common text between the input text and the element text for each element text, to obtain a matching result for the element comprises: for each element text,
splitting, in response to the length of the common text being greater than a threshold value, the remaining text other than the common text in the input text at intervals of the common text to obtain at least one sub-text; and
and matching the sub-text with the second instruction text set aiming at each sub-text, and determining an element corresponding to the element text as a candidate element as the matching result in response to successful matching of the sub-text with the second instruction text set.
3. The method of claim 2, wherein the second instruction text set includes a selection class instruction text subset, an open class instruction text subset, and a close class instruction text subset; the matching the sub-text with the second instruction text set for each sub-text, and determining, in response to successful matching of the sub-text with the second instruction text set, the element corresponding to the element text as a candidate element as the matching result comprises: for each sub-text,
matching the sub-text with the selection class instruction text subset, the open class instruction text subset and the close class instruction text subset respectively;
in response to successful matching of the sub-text and the selection class instruction text subset, determining an element corresponding to the element text as a candidate element as the matching result;
determining, in response to successful matching of the sub-text and the open class instruction text subset, the matching result according to the type and state of the element corresponding to the element text; and
determining, in response to successful matching of the sub-text and the close class instruction text subset, the matching result according to the type and state of the element corresponding to the element text.
4. The method of claim 3, wherein said determining, in response to successful matching of the sub-text with the subset of open class instruction texts, the matching result according to the type and state of the element corresponding to the element text comprises:
determining the type of the element corresponding to the element text;
determining the matching result according to the switching state of the element in response to the type of the element being a switching type; and
and determining the element as a candidate element as the matching result in response to the type of the element being a non-switching type.
5. The method of claim 4, wherein the determining the match result from the switch state of the element in response to the type of the element being a switch type comprises:
determining a switching state of the element; and
and in response to the switch state being off, determining the element as a candidate element as the matching result.
6. The method of claim 3, wherein said determining, in response to successful matching of the sub-text with the subset of closed class instruction texts, the matching result based on a type and a state of an element corresponding to the element text comprises:
determining the type of the element corresponding to the element text; and
and determining the matching result according to the switching state of the element in response to the type of the element being a switching type.
7. The method of claim 6, wherein the determining the match result from the switch state of the element in response to the type of the element being a switch type comprises:
determining a switching state of the element; and
and in response to the switch state being on, determining the element as a candidate element as the matching result.
8. The method of claim 1, further comprising:
for each element text, in response to the length of the common text being greater than a threshold and the input text being the same as the common text, determining an element corresponding to the element text as a candidate element as the matching result.
9. The method of any of claims 2 to 8, wherein the determining a target element from the at least one element according to the respective matching result of the at least one element comprises:
determining at least one candidate element from the matching result; and
the target element is determined from the at least one candidate element.
10. The method of claim 1, wherein the first instruction text set comprises a verb text subset and a noun text subset; the matching, in response to receiving an input voice, input text corresponding to the input voice with a first instruction text set comprises:
determining a proper noun combined text set according to the verb text subset and the noun text subset; and
and matching the input text with the proper noun combination text set.
11. The method of claim 10, wherein the proper noun combined text set includes at least one of a combined text in which the verb precedes the noun and a combined text in which the noun precedes the verb.
12. The method of any of claims 1-11, wherein the first set of instruction text includes a vehicle control class instruction text, the second set of instruction text includes an operation instruction associated with an element on the current page, the vehicle control class instruction having a higher priority than the operation instruction associated with the element on the current page.
13. A voice control apparatus comprising:
the first matching module is used for responding to the received input voice and matching the input text corresponding to the input voice with a first instruction text set;
the element text determining module is used for determining respective element texts of at least one element on the current page in response to the fact that the input text is not successfully matched with the first instruction text set;
the second matching module is used for matching the input text with a second instruction text set according to the common text between the input text and the element text aiming at each element text to obtain a matching result aiming at the element;
the target element determining module is used for determining target elements from the at least one element according to the respective matching results of the at least one element; and
and the control module is used for executing control operation aiming at the target element.
14. The apparatus of claim 13, wherein the second matching module comprises:
the splitting module is used for responding to the fact that the length of the public text is larger than a threshold value aiming at each element text, and splitting the rest texts except the public text in the input text by taking the public text as an interval to obtain at least one sub-text; and
The first matching sub-module is used for matching the sub-text with the second instruction text set aiming at each sub-text, and determining an element corresponding to the element text as a candidate element as the matching result in response to successful matching of the sub-text with the second instruction text set.
15. The apparatus of claim 14, wherein the second instruction text set includes a selection class instruction text subset, an open class instruction text subset, and a close class instruction text subset; the first matching sub-module includes:
the matching unit is used for matching, for each sub-text, the sub-text with the selection class instruction text subset, the open class instruction text subset and the close class instruction text subset respectively;
a first matching result determining unit, configured to determine, as a candidate element, an element corresponding to the element text in response to successful matching of the sub-text and the selection class instruction text, as the matching result;
a second matching result determining unit, configured to determine, in response to successful matching of the sub-text and the open instruction text subset, the matching result according to a type and a state of an element corresponding to the element text; and
And the third matching result determining unit is used for determining the matching result according to the type and the state of the element corresponding to the element text in response to successful matching of the sub-text and the closing instruction text subset.
16. The apparatus of claim 15, wherein the second match result determination unit comprises:
a first type determining subunit, configured to determine a type of an element corresponding to the element text;
a first matching result determining subunit, configured to determine, in response to the type of the element being a switch type, the matching result according to a switch state of the element; and
and the second matching result determining subunit is used for determining, in response to the type of the element being a non-switch type, the element as a candidate element as the matching result.
17. The apparatus of claim 16, wherein the first matching result determining subunit is configured to determine a switch state of the element; and in response to the switch state being off, determine the element as a candidate element as the matching result.
18. The apparatus of claim 15, wherein the third match result determination unit comprises:
A second type determining subunit, configured to determine a type of an element corresponding to the element text; and
and the third matching result determining subunit is used for determining the matching result according to the switch state of the element in response to the type of the element being a switch type.
19. The apparatus of claim 18, wherein the third matching result determining subunit is configured to determine a switch state of the element; and in response to the switch state being on, determine the element as a candidate element as the matching result.
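Claims 16 to 19 describe how "open" and "close" instructions interact with an element's type and switch state. A minimal Python sketch of that branching, with the caveat that the claims do not specify the non-switch branch under a "close" instruction (treating it as no match is an assumption here):

```python
def open_instruction_result(element_type: str, switch_state: str) -> bool:
    """Claims 16-17: under an 'open' instruction, a non-switch element is
    always a candidate; a switch element is a candidate only when it is off
    (opening an already-on switch would be a no-op)."""
    if element_type != "switch":
        return True
    return switch_state == "off"

def close_instruction_result(element_type: str, switch_state: str) -> bool:
    """Claims 18-19: under a 'close' instruction, a switch element is a
    candidate only when it is currently on. The non-switch branch is not
    described in the claims; this sketch assumes no match."""
    if element_type != "switch":
        return False
    return switch_state == "on"
```

Filtering on the current switch state prevents the voice assistant from offering, say, "turn on the air conditioner" as a candidate when the air conditioner is already on.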
20. The apparatus of claim 12, further comprising: a third matching module, configured to determine, for each element text, in response to the length of the common text being greater than a threshold value and the input text being identical to the common text, the element corresponding to the element text as a candidate element as the matching result.
21. The apparatus of any one of claims 14 to 20, wherein the target element determination module comprises:
a candidate element determination submodule for determining at least one candidate element from the matching result; and
a target element determination sub-module for determining the target element from the at least one candidate element.
22. The apparatus of claim 13, wherein the first instruction text set comprises a verb text subset and a noun text subset; the first matching module includes:
a proper noun combined text set determining sub-module, configured to determine a proper noun combined text set according to the verb text subset and the noun text subset; and
and the second matching sub-module is used for matching the input text with the proper noun combination text set.
23. The apparatus of claim 22, wherein the proper noun combined text set comprises at least one of a combined text in which a verb precedes a noun, and a combined text in which a noun precedes a verb.
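The combined-text construction of claims 22-23 can be illustrated in Python. Direct concatenation mirrors Chinese voice commands, which have no word spacing; the names and the exact-match rule are assumptions of this sketch, not the patent's method:

```python
from itertools import product

def build_combined_text_set(verb_subset, noun_subset):
    """Claim 23: combine every verb with every noun in both orders,
    i.e. verb-before-noun and noun-before-verb."""
    pairs = list(product(verb_subset, noun_subset))
    return {v + n for v, n in pairs} | {n + v for v, n in pairs}

def match_combined(input_text, verb_subset, noun_subset):
    """Claim 22: match the input text against the combined text set."""
    return input_text in build_combined_text_set(verb_subset, noun_subset)
```

Generating both orderings lets the matcher accept either phrasing of the same command (e.g. "open map" and "map open") without listing every variant by hand.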
24. The apparatus of any one of claims 13 to 23, the first instruction text set comprising vehicle control class instruction text, the second instruction text set comprising operation instructions associated with elements on the current page, the vehicle control class instruction text having a higher priority than the operation instructions associated with elements on the current page.
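The priority rule of claim 24 amounts to checking the vehicle control instruction set before the page-element instruction set. A hedged sketch (the labels and dispatch shape are illustrative, not the patent's implementation):

```python
def dispatch(input_text, vehicle_control_texts, page_element_texts):
    """Claim 24: vehicle control class instruction texts are checked first,
    so a text present in both sets resolves as a vehicle control command."""
    if input_text in vehicle_control_texts:
        return "vehicle_control"
    if input_text in page_element_texts:
        return "page_element"
    return None
```

Checking the higher-priority set first ensures that a safety-relevant vehicle command is never shadowed by a same-named control on the current page.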
25. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 12.
26. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1 to 12.
27. A computer program product, comprising a computer program stored on at least one of a readable storage medium and an electronic device, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 12.
CN202310610732.1A 2023-05-26 2023-05-26 Voice control method, voice control device, electronic equipment and storage medium Pending CN116416990A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310610732.1A CN116416990A (en) 2023-05-26 2023-05-26 Voice control method, voice control device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310610732.1A CN116416990A (en) 2023-05-26 2023-05-26 Voice control method, voice control device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116416990A (en) 2023-07-11

Family

ID=87057980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310610732.1A Pending CN116416990A (en) 2023-05-26 2023-05-26 Voice control method, voice control device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116416990A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117079652A (en) * 2023-10-16 2023-11-17 明度智云(浙江)科技有限公司 SCADA system voice control method, SCADA system voice control device and SCADA system server
CN117079652B (en) * 2023-10-16 2024-01-30 明度智云(浙江)科技有限公司 SCADA system voice control method, SCADA system voice control device and SCADA system server

Similar Documents

Publication Publication Date Title
US10783364B2 (en) Method, apparatus and device for waking up voice interaction function based on gesture, and computer readable medium
US10789308B2 (en) Method and apparatus for searching application and mobile terminal
US11417331B2 (en) Method and device for controlling terminal, and computer readable storage medium
WO2022017525A1 (en) Page display method and apparatus, and electronic device
WO2022001899A1 (en) Application management method and apparatus, and electronic device
CN103955393A (en) Method and device for starting application program
US20190057072A1 (en) Method, device and electronic equipment for switching name of desktop icon folder
CN116416990A (en) Voice control method, voice control device, electronic equipment and storage medium
US20210073005A1 (en) Method, apparatus, device and storage medium for starting program
US20160357577A1 (en) Method and device for displaying the execution status of an application
WO2021227922A1 (en) Communication object adding method and apparatus, and electronic device
EP3121691B1 (en) Method and terminal for inputting text
KR20210088464A (en) Method and apparatus for processing voice interaction, electronic equipment, storage medium, and computer program product
JP2022028667A (en) Method for updating user image recognition model, device, electronic apparatus, computer-readable recording medium, and computer program
CN106095303B (en) Application program operation method and device
CN114764363B (en) Prompting method, prompting device and computer storage medium
CN106603851A (en) Communication shortcut realizing method and electronic equipment
CN113448668B (en) Method and device for skipping popup window and electronic equipment
CN115220922A (en) Vehicle application program running method and device and vehicle
CN110780749A (en) Character string error correction method and device
CN112306702B (en) Data sharing method and device, electronic equipment and storage medium
CN112437003A (en) Display method and device and electronic equipment
US20180075164A1 (en) Multi-character string search engine for in-vehicle information system
CN116913272A (en) Voice instruction matching method, device, equipment, vehicle and storage medium
CN116016578B (en) Intelligent voice guiding method based on equipment state and user behavior

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination