CN116913272A - Voice instruction matching method, device, equipment, vehicle and storage medium - Google Patents

Voice instruction matching method, device, equipment, vehicle and storage medium Download PDF

Info

Publication number
CN116913272A
Authority
CN
China
Prior art keywords
matched
current field
field
voice
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310769078.9A
Other languages
Chinese (zh)
Inventor
欧阳能钧
华鲸州
刘嵘
刘佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Intelligent Connectivity Beijing Technology Co Ltd
Original Assignee
Apollo Intelligent Connectivity Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Apollo Intelligent Connectivity Beijing Technology Co Ltd filed Critical Apollo Intelligent Connectivity Beijing Technology Co Ltd
Priority to CN202310769078.9A
Publication of CN116913272A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26: Speech to text systems
    • G10L15/28: Constructional details of speech recognition systems
    • G10L15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The disclosure provides a voice instruction matching method, apparatus, device, vehicle and storage medium, relating to the technical field of data processing and in particular to the fields of speech recognition and voice interaction. The scheme is implemented as follows: a current field to be matched is acquired, the current field to be matched being a field containing at least one word of a voice instruction; the current field to be matched is then matched, and if the match succeeds, the next word after the current field to be matched in the voice instruction is determined as the new current field to be matched. When a voice instruction is matched, multiple fields to be matched within the same instruction can each be matched: once a field is matched successfully, it is ignored in the next round, and a new round of matching starts directly from the words that follow it. A single voice instruction can therefore be matched and responded to multiple times, the match success rate is high, and user experience is improved.

Description

Voice instruction matching method, device, equipment, vehicle and storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, in particular to speech recognition and voice interaction, and more specifically to a voice instruction matching method, apparatus, device, vehicle, and storage medium.
Background
Currently, with the popularization of intelligent automobiles, the on-board voice assistant has evolved from an early voice remote control into a true assistant for the car owner, allowing functions such as navigation and song requests to be initiated conveniently by voice. Meanwhile, the in-vehicle application ecosystem is gradually being enriched, but individual applications remain isolated information islands, and the cost for the voice assistant to integrate with this ecosystem is high. To solve this problem, common industry practice is to capture the page elements of an application and register them with the voice assistant; once the voice input matches a registered word, an action engine simulates a click or slide near the position of the corresponding word. This voice interaction mode is called "what you see is what you can say". How to improve its performance has gradually become a research hotspot.
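The capture-and-register flow described above might be sketched as follows. This is an illustrative sketch only; the class and method names (`SpeakableRegistry`, `register`, `trigger`) are assumptions for the example, not part of the disclosure:

```python
# Hypothetical sketch of a "what you see is what you can say" registry:
# page-element labels are registered with the voice assistant, and a matching
# spoken word triggers the corresponding simulated click via an action engine.
class SpeakableRegistry:
    def __init__(self):
        self.actions = {}  # element label -> click/slide callback

    def register(self, label, action):
        """Register a captured page element's label with the assistant."""
        self.actions[label] = action

    def trigger(self, spoken_word):
        """Simulate the element's click if the spoken word matches a label."""
        action = self.actions.get(spoken_word)
        if action is None:
            return False   # no registered element matches this word
        action()
        return True
```

The matching methods in the embodiments below decide which spoken fields to feed into such a trigger step.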
Disclosure of Invention
The disclosure provides a voice instruction matching method, apparatus, device, vehicle, and storage medium.
According to an aspect of the present disclosure, there is provided a voice instruction matching method, including: acquiring a current field to be matched, wherein the current field to be matched is a field containing at least one word in a voice instruction; and matching the current field to be matched, and determining the next text of the current field to be matched in the voice instruction as a new current field to be matched under the condition that the current field to be matched is successfully matched.
According to another aspect of the present disclosure, there is provided a voice instruction matching apparatus including: the acquisition unit is used for acquiring a current field to be matched, wherein the current field to be matched is a field containing at least one word in the voice instruction; the matching unit is used for matching the current field to be matched, and determining the next text of the current field to be matched in the voice instruction as a new current field to be matched under the condition that the current field to be matched is successfully matched.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a vehicle including the electronic device described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the embodiments of the present disclosure.
According to the voice instruction matching method, apparatus, device, vehicle and storage medium of the disclosure, a current field to be matched is acquired, the current field to be matched being a field containing at least one word of a voice instruction; the current field to be matched is then matched, and if the match succeeds, the next word after the current field to be matched in the voice instruction is determined as the new current field to be matched. When a voice instruction is matched, multiple fields to be matched within the same instruction can each be matched: once a field is matched successfully, it is ignored in the next round, and a new round of matching starts directly from the words that follow it. A single voice instruction can thus be matched multiple times, and multiple operation commands can be responded to according to the results of those matches, i.e., multiple response operations are achieved, the match success rate is higher, and user experience is improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of a system to which a voice command matching method according to an embodiment of the present disclosure is applied;
FIG. 2 is a flow chart of a voice instruction matching method provided in accordance with an embodiment of the present disclosure;
FIG. 3 is a flow chart of a voice instruction matching method provided in accordance with yet another embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a voice command matching method provided according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a voice command matching method provided in accordance with yet another embodiment of the present disclosure;
FIG. 6 is a block diagram of a voice command matching apparatus provided in accordance with an embodiment of the present disclosure;
fig. 7 is a block diagram of an electronic device for implementing a voice instruction matching method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings. Various details of the embodiments are included to facilitate understanding and should be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope of the present disclosure. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
The embodiments of the disclosure provide a voice instruction matching method, apparatus, device, vehicle, and storage medium. Specifically, the voice instruction matching method of the embodiments may be executed by an electronic device, which may be a terminal or a server. The terminal may be a smartphone, tablet computer, notebook computer, intelligent voice interaction device, smart home appliance, wearable smart device, aircraft, intelligent vehicle-mounted terminal, or other device, and may run a client such as an audio client, video client, browser client, instant messaging client, or applet. The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), big data, and artificial intelligence platforms.
In the related art, when recognizing the user's speech in the "what you see is what you can say" mode, the speech engine must detect the speech tail point before sending the ASR (Automatic Speech Recognition) text to the execution module. Therefore, when the user continuously names several page elements on the screen, only one of them is matched, i.e., one interaction can respond to only one operation instruction within the voice instruction, or possibly none is matched at all and the interaction cannot be completed. As a result, the user has to adapt to the voice recognition module: recite one word, pause, wait for it to match successfully, and then recite the next word, which makes for a poor user experience.
In addition, although tail-point detection in the related art is broadly applicable, it is limited by the speed of the speech recognition terminal itself because it must wait for the speech tail point; each interaction usually takes 2 to 3 seconds, so voice interaction is slow and inefficient.
To solve at least one of the above problems, the present disclosure provides a voice instruction matching method, apparatus, device, vehicle, and storage medium: a current field to be matched is acquired, the current field to be matched being a field containing at least one word of a voice instruction; the current field to be matched is then matched, and if the match succeeds, the next word after the current field to be matched in the voice instruction is determined as the new current field to be matched. When a voice instruction is matched, multiple fields to be matched within the same instruction can each be matched: once a field is matched successfully, it is ignored in the next round, and a new round of matching starts directly from the words that follow it. A single voice instruction can thus be matched multiple times and multiple operation commands responded to according to the results, i.e., multiple response operations are achieved, the match success rate is higher, and user experience is improved.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic structural diagram of a system to which a voice instruction matching method according to an embodiment of the present disclosure is applied. Referring to fig. 1, the system includes a terminal 110, a server 120, and the like; the terminal 110 and the server 120 are connected through a network, for example, a wired or wireless network connection.
The terminal 110 may be used to display a graphical user interface and to interact with the user through it: for example, the terminal downloads, installs, and runs a corresponding client; invokes and runs a corresponding applet; or presents a corresponding graphical user interface through a website. The terminal 110 may be provided with multiple controls; the user may read out a control's name in a voice instruction, and the terminal 110 clicks or selects the corresponding control according to the instruction. In the embodiment of the present disclosure, the terminal 110 may obtain a current field to be matched, where the current field to be matched is a field containing at least one word of a voice instruction; the terminal matches the current field to be matched, determines the next word after it in the voice instruction as the new current field to be matched if the match succeeds, and performs the corresponding response operation, such as clicking or selecting the corresponding control. The server 120 may recognize the voice instruction, obtain the current field to be matched, and send it to the terminal 110.
Although recognition of the voice instruction by the server 120 is described here as an example, the recognition may also be performed by the terminal 110. The application involved may be a desktop application, a mobile application, an applet embedded in another application, or the like.
It should be noted that the above application scenario is only shown for the convenience of understanding the spirit and principles of the present disclosure, and embodiments of the present disclosure are not limited in any way in this respect. Rather, embodiments of the present disclosure may be applied to any scenario where applicable.
The following is a detailed description. It should be noted that the following description order of embodiments is not a limitation of the priority order of embodiments.
FIG. 2 is a flow chart of a voice instruction matching method provided in accordance with an embodiment of the present disclosure; referring to fig. 2, an embodiment of the disclosure provides a voice command matching method, which includes the following steps S201 to S202.
Step S201, a current field to be matched is obtained, wherein the current field to be matched is a field containing at least one text in a voice instruction.
Step S202, matching the current field to be matched, and determining the next text of the current field to be matched in the voice instruction as a new current field to be matched under the condition that the current field to be matched is successfully matched.
The voice instruction may be an instruction spoken continuously by the user, and it may be detected as a whole or in real time. Take the complete voice instruction "common skylight lamplight atmosphere lamp" as an example. The instruction may be the complete utterance acquired in one pass after waiting for the speech tail point, namely "common skylight lamplight atmosphere lamp". Alternatively, the instruction may be acquired in real time each time a word is detected; in that case the acquired instruction grows over time, e.g., "normal", "common", "common day", "common skylight", "common skylight lamp", "common skylight light", and so on. That is, the voice instruction is updated in real time.
The field to be matched may be a field of one or more consecutive words in the voice instruction. The voice instruction can comprise a plurality of different fields to be matched, and the field to be matched which is required to be matched currently is the current field to be matched.
The method for obtaining the field to be matched currently may be described in the following embodiments, which are not described here.
After the current field to be matched is obtained, it can be matched. Take "common" as the current field to be matched as an example: if the terminal's current display interface has a "common" control (i.e., a control named "common"), then during matching "common" is compared against the control names in the current display interface. Since the current field to be matched, "common", is consistent with the name of the "common" control, the match is successful.
At this time, the word after "common" in the voice instruction can be used as the new current field to be matched, and a new round of matching performed on it. Taking the complete voice instruction "common skylight lamplight atmosphere lamp" as an example, after "common" is matched successfully, the "day" after "common" can be used as the new current field to be matched, and the new current field "day" is then matched. In other words, successfully matched fields are ignored in the new round of matching, which avoids match failures caused by several words being joined together (e.g., "common day") and improves the success rate of multiple rounds of matching within one voice instruction.
It can be understood that a successfully matched field can be used as an operation instruction to operate the terminal, and that the new current field to be matched and the current field to be matched are different fields within one voice instruction. In this embodiment, since multiple fields to be matched can be matched within one voice instruction, and since the next round of matching ignores the previously matched field once the current field is matched successfully, the influence of preceding fields on subsequent rounds is reduced. One voice instruction can thus be matched multiple times with an improved success rate, so the user can trigger multiple response operations with a single voice instruction, improving user experience.
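As a rough illustration of steps S201 and S202 (together with the failure handling detailed in later embodiments), the matching loop might look like the following sketch; the function name, word-level granularity, and the use of Python are assumptions for the example, not the disclosure's implementation. It reuses the Chinese instruction behind the text's translation, 常用天窗灯光氛围灯 ("common skylight lamplight atmosphere lamp"):

```python
def match_fields(words, control_names):
    """Match growing fields of `words` against on-screen control names.
    On success the next field starts after the matched one (step S202);
    on failure the field keeps its start word and grows by one word."""
    matched = []   # successfully matched fields (responded operation commands)
    start = 0      # index of the current field's start word
    for end in range(len(words)):
        field = "".join(words[start:end + 1])  # current field to be matched
        if field in control_names:
            matched.append(field)              # respond to this field
            start = end + 1                    # ignore it in the next round
    return matched

# The text's example: "常用天窗灯光氛围灯" with controls named
# 常用 (common), 天窗 (skylight), 灯光 (lamplight), 氛围灯 (atmosphere lamp).
print(match_fields(list("常用天窗灯光氛围灯"),
                   {"常用", "天窗", "灯光", "氛围灯"}))
# -> ['常用', '天窗', '灯光', '氛围灯']
```

All four fields of the single instruction are matched, each yielding its own response operation.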
FIG. 3 is a flow chart of a voice instruction matching method provided in accordance with yet another embodiment of the present disclosure; referring to fig. 3, in some embodiments, a voice command matching method 300 is further provided, which includes the following steps S301 to S304.
Step S301, a current field to be matched is obtained, wherein the current field to be matched is a field containing at least one text in a voice instruction.
Step S302, matching the current field to be matched, and determining the next text of the current field to be matched in the voice instruction as a new current field to be matched under the condition that the current field to be matched is successfully matched.
Step S303, under the condition that the matching of the current field to be matched fails, determining the initial character of the current field to be matched as the initial character of the new current field to be matched, and determining the next character of the current field to be matched in the voice instruction as the end character of the new current field to be matched.
Step S304, determining a new current field to be matched based on the initial text of the new current field to be matched and the end text of the new current field to be matched.
The implementation manner of step S301 and step S302 is the same as that of step S201 and step S202, and specific reference may be made to the above-mentioned embodiments.
In step S303, when the matching of the current field to be matched fails, the start word of the current field is taken as the new start word, and the next word after the current field is taken as the new end word. The field from the new start word to the new end word is then taken from the voice instruction as the new current field to be matched, and the next round of matching is performed on it. Matching of the fields in the voice instruction continues in this way until the tail point of the voice instruction is detected.
Take the complete voice instruction "common skylight lamplight atmosphere lamp" as an example, where the terminal's current display interface has a "common" control (i.e., a control named "common"). If the current field to be matched is "normal", then during matching it cannot be matched against any control name in the current display interface, which means the matching of the current field fails.
Then the start word of the current field to be matched (i.e., "normal") is taken as the new start word, the next word "use" after the current field "normal" is taken as the new end word, and the new current field to be matched from "normal" to "use", i.e., "common", is determined from the voice instruction. The next round of matching is then performed with "common".
It can be understood that when the current field to be matched has only one word, the current field to be matched, its start word, and its end word are all the same.
In this embodiment, when the matching of the current field to be matched fails, the current field is retained and combined with its next word to generate the new current field to be matched. In this way no word of the voice instruction is omitted, one voice instruction can include multiple operation commands, multiple response operations are achieved, voice efficiency is improved, and user experience is improved.
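A single round of this update (match, then either advance past the matched field or extend it by one word) can be written as a small state transition. This is a hedged sketch with assumed names, not code from the disclosure; the Chinese characters stand in for the translated words "normal"/"use"/"day"/"window" used in the example above:

```python
def update_field(words, start, end, matched_ok):
    """One round of steps S302-S304. `start`/`end` index the current field's
    start and end words. On success the new field begins after the old one;
    on failure it keeps its start word and extends to the next word."""
    if matched_ok:
        start = end + 1   # S302: the matched field is ignored next round
    end = end + 1         # the next word becomes the new end word (S303)
    return "".join(words[start:end + 1]), start, end  # S304: the new field

# Walking the example "common skylight" rendered as 常用天窗:
w = list("常用天窗")
field, s, e = update_field(w, 0, 0, matched_ok=False)  # "常" (normal) failed
print(field)   # -> 常用 ("common")
field, s, e = update_field(w, s, e, matched_ok=True)   # "常用" matched
print(field)   # -> 天 ("day")
```

Iterating this transition over each recognized word reproduces the round-by-round behavior described in the embodiment.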
In some embodiments, obtaining the current field to be matched in step S201 may include: acquiring the current word of the voice instruction in real time; and determining the current field to be matched corresponding to the current word, where the end word of the current field is the current word, and the start word of the current field is either the first word of the voice instruction or the word following the last successfully matched field in the voice instruction.
In this embodiment, the voice instruction may be updated in real time in time order: while the user speaks, the received voice is recognized in real time, word by word, and each newly recognized word of the voice instruction becomes the current word.
It will be appreciated that the current field to be matched may include one or more words, with the last word, i.e., the end word, being the current word. The first word of the current field, i.e., the start word, is chosen according to the situation: it may be the first word of the voice instruction, or the word following the last successfully matched field.
The selection of the initial text of the current field to be matched can be explained as follows.
In some embodiments, the method 200 may further include: if a last successfully matched field exists before the current word, determining the start word of the current field to be matched as the word following that last successfully matched field in the voice instruction; or, if no last successfully matched field exists before the current word, determining the start word of the current field to be matched as the first word of the voice instruction.
Take the complete voice instruction "common skylight" as an example, where the terminal's current display interface has multiple controls, including a "common" control (named "common") and a "skylight" control (named "skylight").
When the voice instruction is acquired, the first current word obtained in time order is "normal". Since "normal" is the first word of the voice instruction, no word precedes it and no last successfully matched field exists, so the start word of the current field to be matched is the first word of the instruction, "normal". Since the end word of the current field is also the current word "normal", the current field to be matched is "normal"; that is, its start word and end word are the same.
"Normal" is then matched. Because "normal" matches no control on the current display interface, the matching fails. In this case the start word "normal" of the current field becomes the start word of the new current field. Meanwhile a new current word (the next word after the current field), "use", is acquired from the voice instruction as the end word of the new current field, so the new current field to be matched is "common". Moreover, since no last successfully matched field exists before the new current word "use", the start word of the new current field "common" is still the first word of the voice instruction.
"Common" is then matched as the current field. Because "common" matches the "common" control on the current display interface, the matching succeeds. At this point a new current word (the next word after the current field), "day", is acquired. The matched field "common" can then be ignored, and its next word "day" becomes the new current field to be matched. That is, because the last successfully matched field "common" precedes the current word "day", the start word of the new current field is the word "day" following "common", and its end word is the new current word "day"; the start word and end word are the same, so the new current field to be matched is "day".
"Day" is then matched as the current field. Because "day" matches no control on the current display interface, the matching fails, so the start word "day" of the current field becomes the start word of the new current field. Meanwhile a new current word (the next word after the current field), "window", is acquired from the voice instruction as the end word of the new current field, so the new current field to be matched is "skylight". And since the last successfully matched field "common" exists before the new current word "window", the start word of the new current field "skylight" is still the word "day" following the last successfully matched field "common".
Steps S301 to S304 are repeated until the speech tail point is detected, i.e., until the voice instruction ends.
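The real-time flow walked through above (acquire the current word, try the current field, start a new field after a success, extend the field after a failure, repeat until the tail point) might be sketched as a small stateful matcher; the names and structure are illustrative assumptions, and the Chinese characters 常用天窗 correspond to the example's "common skylight":

```python
class StreamingMatcher:
    """Receives the recognized instruction one word at a time, without
    waiting for the speech tail point (a sketch of Fig. 3's flow)."""
    def __init__(self, control_names):
        self.controls = set(control_names)
        self.field = ""    # current field to be matched

    def feed(self, word):
        """The current word becomes the field's end word. Returns the matched
        field on success (the next field then starts after it), else None."""
        self.field += word
        if self.field in self.controls:
            matched, self.field = self.field, ""
            return matched
        return None        # failure: keep the start word, await the next word

# Replaying the worked example "常用天窗" ("common skylight"):
m = StreamingMatcher({"常用", "天窗"})
print([m.feed(ch) for ch in "常用天窗"])
# -> [None, '常用', None, '天窗']
```

Each word is matched the moment it is recognized, so both controls are triggered within one continuously spoken instruction.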
By determining the start word of the current field to be matched according to whether the last match succeeded, a corresponding current field can be generated for every current word of the voice instruction, so no word of the instruction is omitted. One voice instruction can thus contain multiple operation commands, multiple response operations are achieved, voice efficiency is improved, and user experience is improved.
According to this embodiment, each current character of the voice instruction can be recognized in real time, and matching is performed in real time based on the field to be matched corresponding to that character, without waiting for the tail point of the voice instruction. That is, while the user's voice instruction is still being acquired, matching of the fields to be matched proceeds simultaneously, so the voice response is fast and efficient. The user does not need to recite one word, pause, wait for it to match successfully, and then recite the next word; user experience is improved.
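The real-time loop described above can be sketched in a few lines. This is a minimal illustrative sketch, not the patented implementation; the function name `match_stream` and the control set are assumptions made for the example (the query is written in the original Chinese characters so that each character arrives as one "current character"):

```python
def match_stream(chars, control_names):
    """Consume recognized characters one at a time and return the hit fields.

    The current field to be matched runs from a start index to the newest
    character: on a successful match the start index jumps past the field;
    on a failed match only the ending character grows.
    """
    hits = []
    start = 0    # index of the starting character of the current field
    buffer = ""  # characters recognized so far
    for ch in chars:                 # each new "current character" arrives in real time
        buffer += ch
        field = buffer[start:]       # current field to be matched
        if field in control_names:   # matching succeeds
            hits.append(field)
            start = len(buffer)      # next field starts at the next character
        # on failure, start stays put and the next character extends the field
    return hits


controls = {"常用", "天窗", "灯光", "氛围灯"}  # commonly used / skylight / light / atmosphere lamp
print(match_stream("常用天窗灯光氛围灯", controls))
# prints ['常用', '天窗', '灯光', '氛围灯']
```

Every character of the query ends up inside exactly one field to be matched, which is the "no character is omitted" property claimed above.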
In some embodiments, the current field to be matched has a target mark for indicating the starting character of the current field to be matched. In step S302, when the current field to be matched is successfully matched, determining the next character of the current field to be matched in the voice instruction as the new current field to be matched includes: in the case that the current field to be matched is successfully matched, updating the position of the target mark so that the updated target mark indicates the character after the current field to be matched; and determining the character indicated by the updated target mark as the new current field to be matched.
It will be appreciated that the starting character of the field to be matched may be determined by the indication of the target mark. For example, when the current field to be matched includes a plurality of characters, the target mark indicates its starting character; when the field contains only one character, that character is itself the starting character, and the target mark still indicates it.
The target mark may be a movable pointer, a changeable numerical mark, or the like. The pointer may point to the starting character of the current field to be matched, or may indicate a position below or before the starting character, i.e. the position between two adjacent characters.
For example, if each character in the voice instruction is sequentially stored in an array, the target mark may be the index of a character in the array: a target mark of 1 indicates that the starting character of the current field to be matched is the first character in the array, and a target mark of 3 indicates that it is the third character in the array (the voice instruction).
In addition, the initial position of the target mark, i.e. the position it indicates before any field to be matched has been determined, may be the first character in the instruction.
After the current field to be matched is successfully matched, the target mark can be updated so that it changes from indicating the starting character of the current field to indicating the character after that field. When the new current field to be matched is determined, the character indicated by the updated target mark is taken as the new field. Because, immediately after a successful match, the new field contains only one character, that character is also its starting character. The target mark therefore always indicates the starting character of the current field to be matched, and can be used to determine that starting character each time a current field to be matched is determined.
In addition, in the case of acquiring the current text in the voice instruction in real time, the new current field to be matched may be determined as a field between the text indicated by the updated target mark and the new current text.
In this embodiment, the starting character of the field to be matched is determined by the target mark: after each successful match of the current field to be matched, the starting character of the next round's field (the new current field to be matched) can be determined simply by moving the target mark. The starting character is, each time, the character indicated by the target mark. Because the target mark serves only as an indicator, the content of each starting character does not need to be recorded, so memory occupation is low, implementation is convenient, and response to the voice instruction is quicker.
In some embodiments, in step S303, in the case that the matching of the current field to be matched fails, determining the initial text of the current field to be matched as the initial text of the new current field to be matched includes: and under the condition that the matching of the current field to be matched fails, keeping the position of the target mark unchanged, and determining the character indicated by the target mark as the initial character of the new current field to be matched.
It can be appreciated that, in the case that matching of the current field to be matched fails, the position of the target mark may remain unchanged, i.e. the character it indicates is unchanged. The starting character of the new current field to be matched is therefore still the character indicated by the target mark, which is the same as the starting character of the current field to be matched.
In this embodiment, the initial text of each field to be matched in the voice command can be determined by the position of the target mark, so that the implementation is more convenient, the occupied memory is less, and the quick response operation to the voice command is more facilitated.
FIG. 4 is a schematic diagram of a voice command matching method provided according to an embodiment of the present disclosure; referring to fig. 4, taking a voice command of "common skylight light atmosphere lamp" as an example, the current display interface may have multiple controls such as a common control, a skylight control, a light control, and an atmosphere lamp control.
The target mark may be a pointer, which may be indicated at the position between two adjacent characters, or before the first character; after a successful match, it is indicated after the last character of the matched field.
The initial position of the pointer is s0. The first current field to be matched of the voice instruction is "common"; because it cannot be successfully matched, the pointer position remains s0.
The second current field to be matched is the field "commonly used" from s0 to "used", since "commonly used" matches the name of the commonly used control, at which point the pointer moves from s0 to s2.
The third current field to be matched is the field "day" from s2 to "day", since it cannot be successfully matched, the pointer position is still s2.
The fourth currently to-be-matched field is the field "skylight" between s2 and "window", since "skylight" matches the name of the skylight control, the pointer is moved from s2 to s4.
Similarly, the fifth current field to be matched is "lamp", and the pointer position is unchanged and remains s4. The sixth current field to be matched is "light", and the pointer position is moved from s4 to s6. The seventh current field to be matched is "atmosphere", and the pointer position is unchanged and still s6. The eighth current field to be matched is "atmosphere" and the pointer position is unchanged and still s6. The ninth field to be currently matched is "atmosphere light", and the pointer position is moved from s6 to s9.
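The pointer movement in the Fig. 4 walkthrough can be replayed with a short sketch (illustrative only; the function name and data are assumptions, with the query written in the original Chinese characters so that each character corresponds to one position s0…s9):

```python
def pointer_trace(query, control_names):
    """Return the pointer (target mark) position after each received character."""
    mark = 0        # initial position s0, before the first character
    positions = []
    for end in range(1, len(query) + 1):
        field = query[mark:end]     # field between the mark and the newest character
        if field in control_names:  # successful match: move the mark
            mark = end              # the mark now sits after the matched field
        positions.append(mark)      # position after this character is processed
    return positions


controls = {"常用", "天窗", "灯光", "氛围灯"}
print(pointer_trace("常用天窗灯光氛围灯", controls))
# prints [0, 2, 2, 4, 4, 6, 6, 6, 9], i.e. s0, s2, s2, s4, s4, s6, s6, s6, s9
```

The printed positions reproduce the walkthrough above: the pointer moves only on the second, fourth, sixth, and ninth characters, when a control name is hit.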
In other embodiments, in the case that the matching of the current field to be matched is successful in step S302, determining the next text of the current field to be matched in the voice instruction as the new current field to be matched includes: deleting the current field to be matched from the voice character set under the condition that the current field to be matched is successfully matched, so as to update the voice character set, wherein the voice character set is a set for recording characters in voice instructions; and determining the first word of the updated phonetic word set as a new current field to be matched.
After recognition, the voice instruction can be stored in a voice character set, which is updated in real time along with the voice instruction: as the voice instruction is converted into text, each character is stored in the set as soon as it is recognized. The voice character set may take the form of an array, a stack, or the like.
It can be appreciated that when the current field to be matched is successfully matched, the current field to be matched can be deleted from the phonetic text set, so that the first text in the phonetic text set is the new current field to be matched. And under the condition of successful matching, only one character exists in the new current field to be matched, and the first character in the voice character set is also the initial character of the new current field to be matched.
In this embodiment, when the current field to be matched is successfully matched, it can be deleted from the voice character set, so that after each successful match the first character in the voice character set is always the start of the next (new) current field to be matched. This saves memory space and makes quick response to the voice instruction easier to realize.
In some embodiments, in step S303, in the case that the matching of the current field to be matched fails, determining the initial text of the current field to be matched as the initial text of the new current field to be matched, and determining the next text of the current field to be matched in the voice instruction as the end text of the new current field to be matched includes: under the condition that the matching of the current field to be matched fails, keeping the first character of the voice character set unchanged, and determining the first character of the voice character set as the initial character of the new current field to be matched; and determining the next character of the current field to be matched in the voice character set as the ending character of the new current field to be matched.
It can be appreciated that in the case of failure in matching the current field to be matched, the first word in the phonetic word set may remain unchanged, and the first word in the phonetic word set is the starting word of the new current field to be matched.
Because the voice command can be stored in the voice word set, the next word of the current field to be matched in the voice command is the next word of the current field to be matched in the voice word set.
Therefore, whether the last matching succeeded or failed, the first character in the voice character set is the starting character of the current field to be matched. Each time the starting character needs to be determined, the first character of the set can simply be taken, which is simpler. Moreover, characters in the set are deleted as matching succeeds, so the voice character set occupies little memory, which facilitates quick operational response to voice.
FIG. 5 is a schematic diagram of a voice command matching method provided in accordance with yet another embodiment of the present disclosure; referring to fig. 5, taking a voice command of "common skylight light atmosphere lamp" as an example, the current display interface may have multiple controls such as a common control, a skylight control, a light control, and an atmosphere lamp control. Taking the example that the voice text set is updated in real time according to the voice command, in fig. 5, the text sets in the voice text set are updated according to the time sequence sequentially from top to bottom.
The voice character set receives the first character "common" of the voice instruction; the set is then "common", and the current field to be matched is "common". Because this field cannot be successfully matched, the first character of the set remains "common".
The voice character set receives the second character "used" of the voice instruction; the set is then "commonly used", and the current field to be matched is the field "commonly used" between the first character of the set and "used".
The voice character set receives the third character "day" of the voice instruction, and "commonly used" has been deleted, so the set is now "day" and the current field to be matched is "day". Because this field cannot be successfully matched, the first character "day" of the set remains unchanged.
The voice character set receives the fourth character "window" of the voice instruction; because "day" was unchanged, the set is now "skylight", and the current field to be matched is the field "skylight" between the first character "day" of the set and "window". Because it can be successfully matched, "skylight" is deleted from the voice character set.
Similarly, the phonetic text set receives the fifth text "light", and the phonetic text set is "light" at this time, and the "light" is used as the current field to be matched, and the first text "light" of the phonetic text set remains unchanged when the matching fails.
The phonetic text set receives the sixth text light, the phonetic text set is the light at this time, the light is used as the field to be matched currently, the matching is successful, and then the light is deleted from the phonetic text set.
The voice word set receives a seventh word atmosphere, the voice word set is the atmosphere at this time, the atmosphere is used as a field to be matched currently, the matching is failed, and the first word atmosphere of the voice word set is unchanged.
The voice character set receives the eighth character 'surrounding', the voice character set is 'atmosphere', the 'atmosphere' is used as a field to be matched currently, the matching fails, and the first character 'atmosphere' of the voice character set is unchanged.
The voice text set receives a ninth text "lamp", the voice text set is an "atmosphere lamp" at this time, the "atmosphere lamp" is used as a field to be matched currently, matching is successful, and then the "atmosphere lamp" is deleted from the voice text set.
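The set-based variant walked through above can be sketched as follows (an illustrative sketch, not the patent's actual code; a deque standing in for the voice character set is an assumption):

```python
from collections import deque


def match_with_set(chars, control_names):
    """Voice character set variant: a matched field is deleted from the set,
    so its first character is always the start of the next field to match."""
    word_set = deque()  # the voice character set, updated in real time
    hits = []
    for ch in chars:
        word_set.append(ch)         # store each character as it is recognized
        field = "".join(word_set)   # field from the first character to the newest
        if field in control_names:  # successful match
            hits.append(field)
            word_set.clear()        # delete the matched field from the set
        # on failure, the first character of the set stays unchanged
    return hits


controls = {"常用", "天窗", "灯光", "氛围灯"}
print(match_with_set("常用天窗灯光氛围灯", controls))
# prints ['常用', '天窗', '灯光', '氛围灯']
```

Because the matched prefix is always deleted, the field to test is simply the entire current contents of the set; no start index needs to be stored at all.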
In some embodiments, matching the current field to be matched in step S202, and, in the case that the current field to be matched is successfully matched, determining the next character of the current field to be matched in the voice instruction as the new current field to be matched, may include the following steps: matching the current field to be matched against the names of a plurality of operation controls in the display interface; in the case that the current field to be matched successfully matches the name of a target operation control among the plurality of operation controls, performing a selection operation on the target operation control; and determining the next character of the current field to be matched in the voice instruction as the new current field to be matched.
It can be understood that the display interface may be a current display interface of the terminal, where the current display interface may include a plurality of operation controls, and a user may simulate touch control or mouse operation through voice to operate the corresponding operation controls, that is, the user may implement the selection operation of the operation controls by reciting names of the corresponding operation controls.
For the current display interface, the names of all its operation controls can be acquired first. The names can be displayed as text on the corresponding operation controls, so that the user can conveniently read them out. The target operation control is the operation control, among the plurality of operation controls, whose name can be successfully matched with the current field to be matched.
For example, if the current field to be matched is "commonly used", the matched target operation control is the commonly used control. After the current field to be matched "commonly used" is obtained, the names of all operation controls of the current display interface are traversed for matching; at this point the current field to be matched successfully matches the commonly used control. After the match succeeds, the terminal can click the corresponding commonly used control and open the commonly used menu page, realizing the effect of "what you see is what you can say".
It will be appreciated that matching may refer to the current field to be matched "commonly" being in complete agreement with the name of the commonly used control, and in other embodiments, matching may also refer to the current field to be matched containing the name of the target operation control as well as other operation words. For example, the current field to be matched is "click commonly used", i.e., includes both the name "commonly used" of the target operation control and the operation word "click" of the execution operation of the target operation control.
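The looser matching described here, where a field may contain an operation word plus a control name, might be sketched as follows (illustrative only; the operation words, the "select" default, and the helper name `parse_field` are assumptions, not taken from the patent):

```python
def parse_field(field, control_names, operation_words=("click", "open")):
    """Return (operation, control name) if the field hits a control, else None.

    A bare control name counts as a selection; an operation word followed by
    a control name (e.g. "click commonly used") also counts as a hit.
    """
    if field in control_names:
        return ("select", field)            # exact name match: default selection
    for op in operation_words:
        if field.startswith(op):
            name = field[len(op):].strip()  # strip the operation word
            if name in control_names:
                return (op, name)
    return None


controls = {"commonly used", "skylight", "light", "atmosphere lamp"}
print(parse_field("click commonly used", controls))  # prints ('click', 'commonly used')
print(parse_field("day", controls))                  # prints None
```

A field that parses to `None` is simply a failed match, and the incremental loop keeps extending it with the next character as described earlier.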
In some embodiments, the operation of the target operation control and the detection and matching of the voice instruction can be processed by different threads, so that the processing and voice response speed can be improved, and the user experience is further improved.
According to the embodiment, the selection operation is performed on the target operation control corresponding to the current field to be matched successfully, so that a user can quickly realize various operations without manually operating the terminal, and convenience and use feeling of the terminal are improved.
In some embodiments, the display interface is a display interface of a vehicle-mounted terminal. It can be appreciated that the method 200 can be applied to a vehicle-mounted terminal: because the vehicle-mounted terminal is fixed in position and cannot be moved, it is inconvenient for the user to operate it manually while driving, so voice operation is especially useful in this scenario.
In one embodiment, as shown in fig. 4, the voice instruction matching method can achieve extremely fast matching. The following are the ASR temporary on-screen results at successive times:
"common"
"commonly used"
"commonly used day"
"commonly used skylight"
"commonly used skylight lamp"
"commonly used skylight light"
"commonly used skylight light atmosphere"
"commonly used skylight light atmosphere"
"commonly used skylight light atmosphere lamp"
The voice instruction matching method comprises the following specific implementation steps:
Suppose there are three buttons (controls) on the display interface, namely commonly used, skylight, and light. If the user's voice instruction switches back and forth among the three buttons quickly, the VAD (Voice Activity Detection) is not cut off, and the utterances are stuck together into a single Query (query sentence), for example a continuously input Query such as "commonly used skylight light skylight commonly used".
Without the extremely fast matching strategy, the strategies in the related art would all fail to hit, i.e. the results "commonly used", "skylight", and "light" could not be hit.
The matching rule of the voice instruction provided in this embodiment is different: it can match intermediate results, and each time a match hits (succeeds), the hit field (the current field to be matched) is deleted.
For example, when the single character "common" (the current field to be matched) is received, a simple and quick full-match scan is performed over the registered interface controls; because there is no [common] button on the interface, the intermediate result gives up the hit (matching fails).
When the two characters "commonly used" (the current field to be matched) are received, they match the [commonly used] button on the interface, and a click on the [commonly used] button is simulated immediately.
Then a virtual index (the target mark) may be set to s2, indicating that the first two characters of the Query have been consumed. The next ASR intermediate result is "commonly used day" (the voice instruction so far), and the remainder after subtracting the virtual index is "day" (i.e., the characters before the target mark are ignored, and the current field to be matched is determined to be "day", starting from the character after the target mark). When the ASR intermediate result is "commonly used skylight", the remainder after subtracting the virtual index is "skylight", which matches the [skylight] button, so the [skylight] button is clicked immediately.
The virtual index is then set to s4, and so on.
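The virtual-index steps above can be sketched directly over the successive ASR intermediate results (illustrative only; the function name is an assumption, and the query uses the original Chinese characters so that the index positions s2, s4, s6, s9 line up with single characters):

```python
def match_intermediates(results, control_names):
    """Scan each ASR intermediate result; the virtual index counts the
    characters already consumed by earlier hits."""
    virtual_index = 0
    clicks = []
    for text in results:              # each temporary on-screen result
        field = text[virtual_index:]  # remainder after subtracting the index
        if field in control_names:
            clicks.append(field)         # simulate clicking the matched button
            virtual_index = len(text)    # e.g. s2 after "常用", s4 after "天窗"
    return clicks


# the nine intermediate results for "常用天窗灯光氛围灯"
query = "常用天窗灯光氛围灯"
results = [query[:i] for i in range(1, len(query) + 1)]
print(match_intermediates(results, {"常用", "天窗", "灯光", "氛围灯"}))
# prints ['常用', '天窗', '灯光', '氛围灯']
```

Each iteration is a single substring slice plus a set lookup, which is consistent with the sub-10 ms single-match figure claimed below.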
In this embodiment, multiple results are matched through one voice interaction; that is, one voice instruction can be matched to multiple page elements (controls), and multiple response operations are performed without waiting for multiple voice recognition instructions. The speed is high: with the extremely fast matching strategy described above, a single match completes within 10 ms.
In addition, through quick lookup and multiple responses during voice recognition, whatever can be seen can be said (matched), so that a vehicle owner using the vehicle-mounted voice assistant can control the head unit very responsively; voice controls the head unit as nimbly as a hand.
It can be appreciated that, in the related art, the speech engine needs to detect the tail point of the speech before returning the ASR text, so if the user continuously speaks several on-screen elements, only one of them is hit, or even none. The user has to cooperate with the voice VAD: recite one word, pause, then recite the next, which is a very poor experience.
In this embodiment, matching is completed on the temporary on-screen results before the speech engine detects the tail point, so there is no need to wait for the voice tail point; hit execution is very fast, and the method is simple and easy to implement.
FIG. 6 is a block diagram of a voice command matching apparatus provided in accordance with an embodiment of the present disclosure; referring to fig. 6, an embodiment of the disclosure further provides a voice command matching apparatus 600, which includes the following units.
The obtaining unit 601 is configured to obtain a current field to be matched, where the current field to be matched is a field containing at least one text in the voice instruction.
The matching unit 602 is configured to match the current field to be matched, and determine a next text of the current field to be matched in the voice instruction as a new current field to be matched if the current field to be matched is successfully matched.
In some embodiments, the matching unit 602 is further configured to: under the condition that the matching of the current field to be matched fails, determining the initial character of the current field to be matched as the initial character of the new current field to be matched, and determining the next character of the current field to be matched in the voice instruction as the ending character of the new current field to be matched; and determining the new current field to be matched based on the initial text of the new current field to be matched and the end text of the new current field to be matched.
In some embodiments, the current field to be matched has a target tag therein for indicating a start word of the current field to be matched; the matching unit 602 is further configured to: under the condition that the current field to be matched is successfully matched, updating the position of the target mark so that the updated target mark is used for indicating the next text of the current field to be matched; and determining the text indicated by the updated target mark as a new current field to be matched.
In some embodiments, the matching unit 602 is further configured to: and under the condition that the matching of the current field to be matched fails, keeping the position of the target mark unchanged, and determining the character indicated by the target mark as the initial character of the new current field to be matched.
In some embodiments, the matching unit 602 is further configured to: deleting the current field to be matched from the voice character set under the condition that the current field to be matched is successfully matched, so as to update the voice character set, wherein the voice character set is a set for recording characters in voice instructions; and determining the first word of the updated phonetic word set as a new current field to be matched.
In some embodiments, the matching unit 602 is further configured to: under the condition that the matching of the current field to be matched fails, keeping the first character of the voice character set unchanged, and determining the first character of the voice character set as the initial character of the new current field to be matched; and determining the next character of the current field to be matched in the voice character set as the ending character of the new current field to be matched.
In some embodiments, the obtaining unit 601 is further configured to: acquiring current characters in the voice instruction in real time; determining a current field to be matched corresponding to the current text, wherein the ending text of the current field to be matched is the current text, and the starting text of the current field to be matched is the first text in the voice instruction or the next text of the last successfully matched field to be matched in the voice instruction.
In some embodiments, the matching unit 602 is further configured to: matching the current field to be matched with names of a plurality of operation controls in the display interface; under the condition that the current field to be matched is successfully matched with the names of the target operation controls in the plurality of operation controls, selecting the target operation controls; and determining the next text of the current field to be matched in the voice instruction as a new current field to be matched.
In some embodiments, the display interface is a display interface of an in-vehicle terminal.
For descriptions of specific functions and examples of each module and sub-module of the apparatus in the embodiments of the present disclosure, reference may be made to the related descriptions of corresponding steps in the foregoing method embodiments, which are not repeated herein.
In the technical scheme of the disclosure, the acquisition, storage, and application of the user's personal information involved all conform to the provisions of relevant laws and regulations, and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a vehicle, a readable storage medium and a computer program product.
An embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the embodiments described above.
The disclosed embodiments provide a vehicle including: the electronic device of the above embodiment. The vehicle can be a vehicle such as an automobile, a truck or a rail vehicle.
The disclosed embodiments provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any of the above embodiments.
The disclosed embodiments provide a computer program product comprising a computer program which, when executed by a processor, implements the method of any of the embodiments described above.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the apparatus 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, such as a voice instruction matching method. For example, in some embodiments, the voice instruction matching method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When the computer program is loaded into RAM 703 and executed by computing unit 701, one or more steps of the voice instruction matching method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the voice instruction matching method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps of the various flows shown above may be reordered, added, or deleted. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solutions disclosed herein can be achieved; no limitation is imposed here.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions, improvements, etc. that are within the principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (22)

1. A voice command matching method, comprising:
acquiring a current field to be matched, wherein the current field to be matched is a field containing at least one word in a voice instruction;
and matching the current field to be matched, and determining the next text of the current field to be matched in the voice instruction as a new current field to be matched under the condition that the current field to be matched is successfully matched.
2. The method of claim 1, further comprising:
Under the condition that the matching of the current field to be matched fails, determining the initial character of the current field to be matched as the initial character of the new current field to be matched, and determining the next character of the current field to be matched in the voice instruction as the ending character of the new current field to be matched;
and determining the new current field to be matched based on the initial text of the new current field to be matched and the ending text of the new current field to be matched.
3. The method of claim 2, wherein the current field to be matched has a target mark indicating a start word of the current field to be matched;
under the condition that the current field to be matched is successfully matched, determining the next text of the current field to be matched in the voice instruction as a new current field to be matched, wherein the method comprises the following steps:
under the condition that the current field to be matched is successfully matched, updating the position of the target mark so that the updated target mark is used for indicating the next text of the current field to be matched;
and determining the updated text indicated by the target mark as the new current field to be matched.
4. The method of claim 3, wherein determining the starting text of the current field to be matched as the starting text of the new current field to be matched in the case that matching of the current field to be matched fails, comprises:
and under the condition that the matching of the current field to be matched fails, keeping the position of the target mark unchanged, and determining the character indicated by the target mark as the initial character of the new current field to be matched.
5. The method of claim 2, wherein determining a next text of the current field to be matched in the voice instruction as a new current field to be matched in the case that the current field to be matched is successfully matched, comprises:
deleting the current field to be matched from a voice character set under the condition that the current field to be matched is successfully matched, so as to update the voice character set, wherein the voice character set is a set for recording characters in the voice instruction;
and determining the first word of the updated voice word set as the new current field to be matched.
6. The method of claim 5, wherein determining the starting word of the current field to be matched as the starting word of the new current field to be matched, and determining the next word of the current field to be matched in the voice instruction as the ending word of the new current field to be matched, in the case that matching of the current field to be matched fails, comprises:
Under the condition that the matching of the current field to be matched fails, keeping the first character of the voice character set unchanged, and determining the first character of the voice character set as the initial character of the new current field to be matched;
and determining the next text of the current field to be matched in the voice text set as the ending text of the new current field to be matched.
7. The method according to any one of claims 1-6, wherein obtaining a current field to be matched comprises:
acquiring current characters in the voice instruction in real time;
and determining the current field to be matched corresponding to the current character, wherein the ending character of the current field to be matched is the current character, and the starting character of the current field to be matched is the first character in the voice instruction or the next character of the last successfully matched field to be matched in the voice instruction.
8. The method of any one of claims 1-6, wherein matching the current field to be matched and, in the case that the current field to be matched is successfully matched, determining a next text of the current field to be matched in the voice instruction as a new current field to be matched, comprises:
Matching the current field to be matched with names of a plurality of operation controls in a display interface;
under the condition that the current field to be matched is successfully matched with the name of a target operation control among the plurality of operation controls, selecting the target operation control;
and determining the next text of the current field to be matched in the voice instruction as a new current field to be matched.
9. The method of claim 8, wherein the display interface is a display interface of an in-vehicle terminal.
10. A voice command matching apparatus comprising:
an acquisition unit configured to acquire a current field to be matched, wherein the current field to be matched is a field containing at least one word in a voice instruction;
a matching unit configured to match the current field to be matched and, in the case that the current field to be matched is successfully matched, determine the next text of the current field to be matched in the voice instruction as a new current field to be matched.
11. The apparatus of claim 10, wherein the matching unit is further configured to:
under the condition that the matching of the current field to be matched fails, determining the initial character of the current field to be matched as the initial character of the new current field to be matched, and determining the next character of the current field to be matched in the voice instruction as the ending character of the new current field to be matched;
and determining the new current field to be matched based on the initial text of the new current field to be matched and the ending text of the new current field to be matched.
12. The apparatus of claim 11, wherein the current field to be matched has a target mark indicating a start text of the current field to be matched;
the matching unit is further configured to:
under the condition that the current field to be matched is successfully matched, updating the position of the target mark so that the updated target mark is used for indicating the next text of the current field to be matched;
and determining the updated text indicated by the target mark as the new current field to be matched.
13. The apparatus of claim 12, wherein the matching unit is further configured to:
and under the condition that the matching of the current field to be matched fails, keeping the position of the target mark unchanged, and determining the character indicated by the target mark as the initial character of the new current field to be matched.
14. The apparatus of claim 11, wherein the matching unit is further configured to:
deleting the current field to be matched from a voice character set under the condition that the current field to be matched is successfully matched, so as to update the voice character set, wherein the voice character set is a set for recording characters in the voice instruction;
and determining the first word of the updated voice word set as the new current field to be matched.
15. The apparatus of claim 14, wherein the matching unit is further configured to: under the condition that the matching of the current field to be matched fails, keeping the first character of the voice character set unchanged, and determining the first character of the voice character set as the initial character of the new current field to be matched; and determining the next text of the current field to be matched in the voice text set as the ending text of the new current field to be matched.
16. The apparatus according to any one of claims 10-15, wherein the acquisition unit is further configured to:
acquiring current characters in the voice instruction in real time;
and determining the current field to be matched corresponding to the current text, wherein the ending text of the current field to be matched is the current text, and the starting text of the current field to be matched is the first text in the voice instruction or the next text of the last successfully matched field to be matched in the voice instruction.
17. The apparatus of any one of claims 10-15, wherein the matching unit is further configured to:
Matching the current field to be matched with names of a plurality of operation controls in a display interface;
under the condition that the current field to be matched is successfully matched with the name of a target operation control among the plurality of operation controls, selecting the target operation control;
and determining the next text of the current field to be matched in the voice instruction as a new current field to be matched.
18. The apparatus of claim 17, wherein the display interface is a display interface of an in-vehicle terminal.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A vehicle, comprising: the electronic device of claim 19.
21. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-9.
22. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-9.
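Read as an algorithm, the matching procedure recited in claims 1-4 amounts to a single pointer (the "target mark") sweeping the recognized text: each newly received character extends the current field, a successful match advances the mark past the matched field, and a failed match leaves the mark in place so the field grows by one character. The sketch below is an illustrative reconstruction under those assumptions, not the patent's implementation; the function name, the use of a Python set of control names, and the example utterance are all hypothetical.

```python
def match_voice_instruction(instruction, control_names):
    """Incrementally match fields of a voice instruction against control names.

    Illustrative sketch of claims 1-4: `start` plays the role of the target
    mark indicating the start text of the current field to be matched.
    """
    matched = []  # names of target operation controls selected so far
    start = 0     # the "target mark": index of the current field's start text
    for end in range(1, len(instruction) + 1):
        field = instruction[start:end]  # current field to be matched
        if field in control_names:      # matching succeeded
            matched.append(field)
            start = end                 # mark moves to the next text
        # on failure: mark stays put, so the field extends next iteration
    return matched

# Usage: two control names recovered from one continuous utterance.
controls = {"open", "navigation", "music"}
print(match_voice_instruction("opennavigation", controls))  # → ['open', 'navigation']
```

Claims 5-6 describe an equivalent variant that, instead of advancing a mark, deletes each successfully matched field from a mutable "voice text set" and restarts from the set's first character.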
CN202310769078.9A 2023-06-27 2023-06-27 Voice instruction matching method, device, equipment, vehicle and storage medium Pending CN116913272A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310769078.9A CN116913272A (en) 2023-06-27 2023-06-27 Voice instruction matching method, device, equipment, vehicle and storage medium


Publications (1)

Publication Number Publication Date
CN116913272A true CN116913272A (en) 2023-10-20

Family

ID=88363912


Country Status (1)

Country Link
CN (1) CN116913272A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination