CN115512701A - Voice instruction registration method and related device


Info

Publication number
CN115512701A
Authority
CN
China
Prior art keywords
voice
voice instruction
keyword
user
vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211096411.6A
Other languages
Chinese (zh)
Inventor
周力为
Current Assignee
Pateo Connect Nanjing Co Ltd
Original Assignee
Pateo Connect Nanjing Co Ltd
Priority date
Filing date
Publication date
Application filed by Pateo Connect Nanjing Co Ltd
Priority: CN202211096411.6A
Publication: CN115512701A
Legal status: Pending

Classifications

    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L 2015/223: Execution procedure of a spoken command
    • B60R 16/0231: Circuits relating to the driving or the functioning of the vehicle
    • B60R 16/0373: Voice control
    • B60W 50/08: Interaction between the driver and the control system

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Mechanical Engineering (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Transportation (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An embodiment of the application discloses a voice instruction registration method applied to a vehicle, comprising the following step: in response to receiving a voice keyword and a semantic intent of a first voice instruction from a cloud server, associating the voice keyword with the semantic intent and storing them locally, where the first voice instruction is a voice instruction issued by a user after speaking a preset wake-up word. The method and apparatus help improve the accuracy of voice control.

Description

Voice instruction registration method and related device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and a related apparatus for registering a voice instruction.
Background
With the development of technology, voice control has become widely used: a user can conveniently control a device to perform specific operations through voice instructions. Current voice control approaches fall mainly into two categories. The first wakes up a voice assistant and then sends the subsequent voice instruction to a cloud server for recognition, executing the corresponding action according to the recognition result; it recognizes well, but executes slowly and depends on a network connection. The second uses a local offline database with preset keywords and corresponding actions, executing an action whenever its keyword is detected; however, preset keywords are difficult to adapt to each user's speaking habits.
Disclosure of Invention
The embodiments of the present application provide a voice instruction registration method and a related apparatus, aiming to improve the accuracy of voice control.
In a first aspect, an embodiment of the present application provides a method for registering a voice instruction, where the method is applied to a vehicle, and includes:
in response to receiving a voice keyword and a semantic intent of a first voice instruction from a cloud server, associating the voice keyword with the semantic intent and storing them locally, where the first voice instruction is a voice instruction issued by the user after speaking a preset wake-up word.
In a second aspect, an embodiment of the present application provides a voice instruction registration apparatus, which is applied to a vehicle, and includes:
a storage unit configured to, in response to receiving a voice keyword and a semantic intent of a first voice instruction from a cloud server, associate the voice keyword with the semantic intent and store them locally, where the first voice instruction is a voice instruction issued by the user after speaking a preset wake-up word.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, where the programs include instructions for performing the steps in the first aspect of the embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides a computer storage medium storing a computer program for electronic data exchange, where the computer program causes a computer to perform part or all of the steps described in the first aspect of the embodiments of the present application.
It can be seen that, in this embodiment, for a first voice instruction issued by the user after speaking a preset wake-up word, the vehicle can, in response to receiving the voice keyword and semantic intent of the first voice instruction from the cloud, associate them and store them locally. The vehicle thereby locally stores voice keywords and semantic intents that match the user's habits, improving the accuracy of voice control.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1A is a schematic structural diagram of a voice command registration system according to an embodiment of the present application;
fig. 1B is a diagram illustrating an exemplary composition of an electronic device according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a voice instruction registration method according to an embodiment of the present application;
fig. 3A is a block diagram illustrating functional units of a voice command registration apparatus according to an embodiment of the present disclosure;
fig. 3B is a block diagram illustrating functional units of another apparatus for registering voice commands according to an embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the foregoing drawings are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The embodiments of the present application will be described below with reference to the accompanying drawings.
The technical solutions of the present application can be applied to the voice instruction registration system 10 shown in fig. 1A, where the voice instruction registration system 10 includes a vehicle 100 and a cloud server 200, and the vehicle 100 is in communication connection with the cloud server 200.
In a specific implementation, a user may issue a first voice instruction to the vehicle 100 after speaking a preset wake-up word. The vehicle 100 receives the first voice instruction and sends it to the cloud server 200; the cloud server 200 parses the first voice instruction and returns its voice keyword and semantic intent to the vehicle 100. In response to receiving the voice keyword and semantic intent of the first voice instruction from the cloud server 200, the vehicle 100 can associate them and store them locally on the vehicle 100, completing registration of the first voice instruction.
The vehicle 100 may be provided with an electronic device 101, and the action performed by the vehicle 100 in this application may be specifically performed by the electronic device 101.
The composition of the electronic device 101 in this application may be as shown in fig. 1B. The electronic device 101 may include a processor 110, a memory 120, a communication interface 130, and one or more programs 121, where the one or more programs 121 are stored in the memory 120 and configured to be executed by the processor 110, and include instructions for executing any step of the method embodiments below.
The communication interface 130 is used to support communication between the electronic device 101 and other devices. The processor 110 may be, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or perform the various illustrative logical blocks, units, and circuits described in connection with the disclosure of the embodiments of the application. The processor may also be a combination of devices implementing computing functions, e.g., a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The memory 120 may be volatile memory, non-volatile memory, or both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be random access memory (RAM), used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
In a specific implementation, the processor 110 is configured to perform any of the steps performed by the vehicle in the method embodiments described below and, when data transmission such as sending is required, may invoke the communication interface 130 to perform the corresponding operation.
It should be noted that the structural diagram of the electronic device 101 is merely an example; it may include more or fewer components, which is not limited herein.
Referring to fig. 2, fig. 2 is a flowchart illustrating a voice command registration method according to an embodiment of the present application, where the method can be applied to the vehicle 100 shown in fig. 1A, and as shown in fig. 2, the voice command registration method includes:
step 201, in response to receiving a voice keyword and a semantic intention of a first voice instruction from a cloud, associating the voice keyword and the semantic intention of the first voice instruction, and storing the voice keyword and the semantic intention to the local.
The first voice instruction is a voice instruction issued by the user after speaking a preset wake-up word.
In a specific implementation, the vehicle 100 locally stores a preset wake-up word. After detecting that the user has spoken the preset wake-up word, if the vehicle receives a voice instruction issued by the user (i.e., a first voice instruction), it may send the first voice instruction to the cloud (e.g., the cloud server 200 shown in fig. 1A) as an audio stream. The cloud parses the audio stream to obtain the voice keyword and semantic intent of the first voice instruction and returns them to the vehicle. After receiving them, the vehicle may store the voice keyword and semantic intent of the first voice instruction locally in association with each other, completing the voice registration of the first voice instruction. Subsequently, when the user does not use the preset wake-up word, or when the vehicle is in an offline network state, the vehicle need not interact with the cloud: based on the locally registered voice keywords that match the user's speech habits and their associated semantic intents, it can provide voice services accurately and quickly through local parsing and lookup.
The voice keyword of the first voice instruction may be the text obtained when the cloud parses the audio stream of the first voice instruction; the semantic intent may be a semantic structure, likewise parsed by the cloud, expressed in a form the vehicle can recognize and execute control operations from. For example, if the user speaks the preset wake-up word and then a first voice instruction "open the front window", the cloud may parse the audio stream into the text "open the front window", which serves as the voice keyword of the first voice instruction, while the semantic intent parsed from the audio stream may be {"target": "window", "operation": "open", "value": "front_left"}.
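As an illustrative sketch only (the class and method names here are assumptions, not the patent's implementation), the association-and-store step could look like this in Python:

```python
class LocalInstructionRegistry:
    """Vehicle-local store associating voice keywords with semantic intents."""

    def __init__(self):
        self._store = {}  # keyword text -> semantic intent dict

    def register(self, keyword: str, intent: dict) -> None:
        # Associate the keyword with its semantic intent and keep both locally,
        # so later lookups need no cloud round-trip.
        self._store[keyword] = dict(intent)

    def lookup(self, keyword: str):
        # Return the associated semantic intent, or None if unregistered.
        return self._store.get(keyword)


registry = LocalInstructionRegistry()
registry.register("open the front window",
                  {"target": "window", "operation": "open", "value": "front_left"})
```

A subsequent local query for the same keyword then yields the stored intent without contacting the cloud.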
It can be seen that, in this embodiment, for a first voice instruction issued by the user after speaking a preset wake-up word, the vehicle can respond to the voice keyword and semantic intent received from the cloud by associating them and storing them locally, so that voice keywords and semantic intents matching the user's habits are stored on the vehicle and the accuracy of voice control is improved.
In one possible example, before associating the voice keyword of the first voice instruction with the semantic intent, the method further includes: determining whether the voice keyword satisfies a preset rule, and if so, associating the voice keyword of the first voice instruction with the semantic intent.
In the embodiments of the application, the voice service the vehicle provides after the user speaks the preset wake-up word is referred to as the wake-up voice service, and the voice service provided when the user has not spoken the preset wake-up word, or when the vehicle is in an offline network state, is referred to as the wake-free voice service.
In a specific implementation, it is considered that a user may speak many voice instructions while using the wake-up voice service, not all of which will be reused in the wake-free voice service; the user may even send the cloud audio streams that are not voice instructions at all (for example, chatting while riding in the car). If every voice keyword and semantic intent returned by the cloud during the wake-up voice service were stored, the vehicle would accumulate a large amount of useless local data, wasting storage space. Moreover, with too many locally registered voice instructions, the user could easily trigger a false wake-up by speaking a locally stored voice keyword in a wake-free scenario, harming the user experience. By setting a preset rule so that the vehicle stores only voice keywords and semantic intents that satisfy it, storage space is saved and false wake-ups are reduced.
As can be seen, in this example, after receiving the voice keyword and semantic intent of the first voice instruction from the cloud, the vehicle associates and stores them only when the voice keyword is determined to satisfy the preset rule, which helps save vehicle storage space and reduce false wake-ups.
In one possible example, determining whether the voice keyword satisfies a preset rule includes: determining whether the voice keyword has an explicit operation attribute whose operation object is a controllable in-vehicle component.
For voice keywords with an explicit operation attribute, such as "open the front window", "turn on the air conditioner", or "open the trunk", the vehicle can accurately control in-vehicle components according to the corresponding semantic intents. By contrast, for encyclopedia-style questions such as "Who is Zhang San?" or chat content such as "I'm hungry", the vehicle cannot control any in-vehicle component according to the corresponding semantic intent; such keywords have no explicit operation attribute.
As can be seen, in this example, the voice keyword and semantic intent are stored in association only when the voice keyword has an explicit operation attribute whose operation object is a controllable in-vehicle component, which helps save vehicle storage space and reduce false wake-ups.
In one possible example, determining whether the voice keyword satisfies a preset rule includes: determining whether the voice keyword has an explicit operation attribute, whose operation object is a controllable in-vehicle component, and contains no variable element.
The variable elements include, but are not limited to, names (e.g., of people, places, or books), temperatures, times, and similar information.
For example, although the vehicle can control in-vehicle components according to a voice keyword such as "play songs by singer A", the vehicle does not save this keyword because it contains a variable element ("singer A", a singer's name). In practice, the content corresponding to a variable element can be enormous; there may be thousands of singer names, for instance. Suppose a user who has never heard singer A receives a friend's recommendation, uses the wake-up voice service to say "play songs by singer A", and after listening decides the style is not to their taste; the user will never speak that instruction again in the wake-free voice service. If the vehicle stored a voice keyword for every singer, it could store many keywords the user will never use; with so many locally registered instructions, false wake-ups become likely in wake-free scenarios, harming the user experience, wasting vehicle storage space, and slowing local queries during voice control. By storing only voice keywords that have an explicit operation attribute and contain no variable element, the vehicle avoids such problems.
In addition, in other embodiments, it is considered that some voice keywords with variable elements may be ones the user commonly uses; for example, a user who particularly likes singer B may often say the control instruction "play songs by singer B". The cloud may therefore record each voice keyword it parses and the number of times it has been parsed, and transmit that count to the vehicle along with the voice keyword and semantic intent. For a voice keyword that has an explicit operation attribute and contains a variable element, the vehicle may judge that it satisfies the preset rule when its parse count exceeds a preset number. This registers the variable-element keywords the user commonly uses, better matching the user's habits without greatly increasing the number of locally stored voice keywords.
Therefore, in this example, the voice keyword and semantic intent are stored in association only when the voice keyword has an explicit operation attribute, contains no variable element, and its operation object is an in-vehicle component, which helps save vehicle storage space and reduce false wake-ups.
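A minimal sketch of the preset-rule check described above, under the assumption that the cloud tags variable elements as typed slots and reports a parse count (the names, slot types, and threshold here are all hypothetical):

```python
VARIABLE_ELEMENT_TYPES = {"person_name", "place_name", "book_name", "temperature", "time"}
CONTROLLABLE_TARGETS = {"window", "air_conditioner", "trunk", "seat", "player"}


def satisfies_preset_rule(intent: dict, slots: dict, parse_count: int = 0,
                          count_threshold: int = 5) -> bool:
    # Rule 1: the keyword must carry an explicit operation on a controllable
    # in-vehicle component; chat or Q&A content fails here.
    if intent.get("operation") is None or intent.get("target") not in CONTROLLABLE_TARGETS:
        return False
    # Rule 2: keywords containing variable elements (names, temperatures,
    # times, ...) are registered only when the cloud reports they were
    # parsed often enough to be a habit of this user.
    if any(slot_type in VARIABLE_ELEMENT_TYPES for slot_type in slots):
        return parse_count > count_threshold
    return True
```

With this shape, "open the front window" passes immediately, while "play songs by singer B" passes only once its reported parse count exceeds the threshold.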
In one possible example, the method further includes: in response to receiving a second voice instruction issued by the user, parsing the second voice instruction to obtain its corresponding voice keyword, where the second voice instruction is a voice instruction issued by the user without speaking the preset wake-up word; and searching locally for the voice keyword corresponding to the second voice instruction, and if it is found, obtaining the associated semantic intent and controlling in-vehicle components according to it.
In a specific implementation, when the user has not spoken the preset wake-up word, the vehicle can search for the voice keyword locally regardless of whether it is in an offline network state; when the keyword is found, the vehicle obtains its associated semantic intent and controls in-vehicle components accordingly. This skips the data exchange between the vehicle and the cloud and improves the efficiency of voice control.
As can be seen, in this example, in response to receiving a second voice instruction issued without the preset wake-up word, the vehicle can directly parse the second voice instruction, obtain its voice keyword, and, when that keyword is found locally, obtain the associated semantic intent and control in-vehicle components according to it. That is, even when the user does not speak the preset wake-up word, the vehicle can provide voice control services accurately and quickly through local parsing and lookup.
In one possible example, the method further includes: in response to receiving a second voice instruction issued by the user, parsing the second voice instruction to obtain its corresponding voice keyword, where the second voice instruction is a voice instruction issued by the user while the vehicle is in an offline network state; and searching locally for the voice keyword corresponding to the second voice instruction, and if it is found, obtaining the associated semantic intent and controlling in-vehicle components according to it.
In a specific implementation, when the vehicle is in an offline network state, regardless of whether the user speaks the preset wake-up word, the vehicle can search for the voice keyword locally, obtain its associated semantic intent when found, and control in-vehicle components accordingly. Based on the locally stored voice keywords that match the user's language habits, the vehicle can provide accurate voice services without relying on the cloud.
In this example, the vehicle may respond to a second voice instruction issued by the user in an offline state by directly parsing it to obtain its voice keyword and, when that keyword is found locally, obtaining the associated semantic intent and controlling in-vehicle components according to it. That is, in an offline network state, the vehicle can provide voice control services accurately and quickly through local parsing and lookup.
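The wake-free and offline handling paths in the two examples above share one local lookup-and-execute step, which could be sketched as follows (the `control` callback stands in for the actual component-control logic and is an assumption):

```python
def handle_second_instruction(keyword: str, registry: dict, control) -> bool:
    """Look up a locally parsed keyword and execute its associated intent.

    Returns True if a locally registered intent was found and executed,
    False if the keyword is not registered (no cloud fallback attempted here).
    """
    intent = registry.get(keyword)      # local query only; no cloud round-trip
    if intent is None:
        return False
    control(intent)                     # drive the in-vehicle component
    return True


executed = []
local_registry = {"open the front window":
                  {"target": "window", "operation": "open", "value": "front_left"}}
handle_second_instruction("open the front window", local_registry, executed.append)
```

Because the branch never touches the network, it behaves identically whether the wake-up word was skipped or the vehicle is offline.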
In one possible example, the method further includes: screening out target voice keywords from local storage, where a target voice keyword is one whose last lookup time is more than a preset duration ago; and deleting the target voice keywords and their associated semantic intents from local storage.
The vehicle may perform the screening of target voice keywords once every first preset period, so that locally stored voice keywords whose last use is too far from the current moment are deleted regularly.
As can be seen, in this example, for a target voice keyword that is stored locally but has not been used by the user for a long time, the vehicle may delete the keyword and its associated semantic intent locally, saving the vehicle's storage resources.
In one possible example, the vehicle locally maintains word banks corresponding to different users. Responding to the voice keyword and semantic intent of the first voice instruction received from the cloud, associating them, and storing them locally then includes: in response to receiving the voice keyword and semantic intent of the first voice instruction from the cloud, obtaining voiceprint information corresponding to the first voice instruction; obtaining the user's identity information according to that voiceprint information; and, after associating the voice keyword of the first voice instruction with the semantic intent, storing them in the word bank corresponding to the user's identity information.
The voiceprint information corresponding to the first voice instruction may be obtained by local analysis on the vehicle, or received directly from the cloud. The vehicle can query a mapping table of voiceprint information to user identity information, stored locally or in the cloud, to determine the identity of the user who issued the first voice instruction.
Storing the voice keywords and semantic intents of different users separately means that, when the wake-word-free voice service is later provided to a user, the keyword only needs to be looked up in the word bank corresponding to that user's identity. This reduces the amount of data stored in each user's word bank and improves lookup efficiency.
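The per-user word-bank scheme can be sketched as below. The voiceprint-to-identity mapping table and all identifiers are hypothetical stand-ins for whatever the vehicle or cloud actually maintains.

```python
# Sketch of per-user word banks: a (hypothetical) voiceprint-to-identity
# mapping selects the word bank, so later lookups only scan that one
# user's entries. The voiceprint values and names are placeholders.

VOICEPRINT_TO_USER = {"vp-001": "alice", "vp-002": "bob"}
WORD_BANKS = {}  # user identity -> {keyword: semantic intent}

def register_keyword(voiceprint: str, keyword: str, intent: dict):
    """Save an associated (keyword, intent) pair into the word bank of
    the user identified by the voiceprint."""
    user = VOICEPRINT_TO_USER[voiceprint]
    WORD_BANKS.setdefault(user, {})[keyword] = intent

def lookup(voiceprint: str, keyword: str):
    """Search only the issuing user's word bank, not all users'."""
    user = VOICEPRINT_TO_USER[voiceprint]
    return WORD_BANKS.get(user, {}).get(keyword)

register_keyword("vp-001", "warm my seat",
                 {"component": "seat_heater", "action": "on"})
```

Because each lookup touches only one user's bank, the searched data volume stays small even as the number of users grows, which is the efficiency gain the description claims.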
Further, after voice keywords and semantic intents have been saved to the word banks corresponding to each user's identity, the vehicle may also upload the contents of each word bank to the cloud once every second preset interval. The cloud stores each word bank's data in association with the corresponding identity information, so that after a user changes vehicles, the word bank can conveniently be downloaded from the cloud to the new vehicle according to the user's voiceprint information.
In addition, after target voice keywords are screened out locally, the vehicle may send the target voice keywords, their associated semantic intents, and the identity information of the user whose word bank contains them to the cloud for associated storage. After storing them, the cloud can return a notification to the vehicle; on receiving the notification, the vehicle deletes the local target voice keywords and semantic intents. The user can later download these locally deleted voice keywords and semantic intents from the cloud.
Furthermore, the word-bank data uploaded by the vehicle every second preset interval and the to-be-deleted target keywords and semantic intents may be stored separately. That is, for one user's identity the cloud can keep two word banks: one synchronized in real time with the local word bank of the vehicle the user is currently using, and another holding the deleted keywords and semantic intents for that user, which the user may download on demand.
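The cloud-side layout with one active bank and one deleted-entry archive per user can be sketched as follows. The data structures and function names are illustrative assumptions, not the patent's protocol.

```python
# Sketch of the cloud-side layout: per user identity, one "active" bank
# mirrors the vehicle's local bank, and one "deleted" archive keeps
# pruned entries for on-demand re-download. Names are assumptions.

cloud = {}  # user identity -> {"active": {...}, "deleted": {...}}

def _banks(user: str) -> dict:
    return cloud.setdefault(user, {"active": {}, "deleted": {}})

def sync_active(user: str, local_bank: dict):
    """Periodic upload: mirror the vehicle's local bank in the cloud."""
    _banks(user)["active"] = dict(local_bank)

def archive_deleted(user: str, keyword: str, intent: dict) -> bool:
    """Move a pruned entry to the archive. The True return stands in for
    the notification that lets the vehicle delete the entry locally."""
    banks = _banks(user)
    banks["deleted"][keyword] = intent
    banks["active"].pop(keyword, None)
    return True

def restore(user: str, keyword: str):
    """Let the user re-download a previously deleted entry on demand."""
    return cloud.get(user, {}).get("deleted", {}).get(keyword)
```

Keeping the archive separate from the mirrored bank means routine synchronization never resurrects pruned entries, while nothing the user once registered is lost.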
As can be seen, in this example, the vehicle locally maintains word banks corresponding to different users, obtains the user's identity information from the voiceprint information of the first voice instruction, and, after associating the voice keyword of the first voice instruction with the semantic intent, saves them to the word bank of that user. This reduces the data volume of each user's voice instruction word bank, which helps improve the efficiency of voice control while maintaining its accuracy.
The present application may divide the electronic device into functional units according to the above method examples; for example, each function may be assigned its own unit, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in hardware or as a software functional unit. It should be noted that the division of units in the embodiments of the present application is schematic and merely a division by logical function; other divisions are possible in actual implementation.
Fig. 3A is a block diagram illustrating the functional units of a voice instruction registration apparatus according to an embodiment of the present disclosure. The voice instruction registration apparatus 30 can be applied to the vehicle 100 shown in Fig. 1A, and comprises:
a saving unit 301, configured to, in response to receiving a voice keyword and a semantic intent of a first voice instruction from the cloud, associate the voice keyword with the semantic intent and save them locally, where the first voice instruction is a voice instruction issued by the user after speaking a preset wake-up word.
In a possible example, the apparatus 30 is further configured to determine whether the voice keyword of the first voice instruction satisfies a preset rule before associating the voice keyword with the semantic intent, and if so, associate the voice keyword of the first voice instruction with the semantic intent.
In a possible example, in determining whether the voice keyword satisfies the preset rule, the apparatus 30 is specifically configured to: determine whether the voice keyword has a clear operation attribute, where the clear operation object is a controllable component in the vehicle.
In one possible example, in determining whether the voice keyword satisfies the preset rule, the apparatus 30 is specifically configured to: determine whether the voice keyword has a clear operation attribute and does not contain a variable element, where the clear operation object is a controllable component in the vehicle.
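The two-part preset rule can be sketched as a simple check. The component list and the digit heuristic for detecting a variable element are assumptions for illustration; the patent does not specify how either test is implemented.

```python
import re

# Sketch of the two-part preset rule: the keyword must target a
# controllable in-vehicle component with a clear operation, and must not
# contain a variable element (e.g. a number the user changes each time,
# such as a temperature). Component list and heuristic are assumptions.

CONTROLLABLE = {"window", "sunroof", "air conditioner", "seat heater"}

def has_clear_operation(keyword: str) -> bool:
    """True when the keyword names a known controllable component."""
    return any(component in keyword for component in CONTROLLABLE)

def has_variable_element(keyword: str) -> bool:
    """Heuristic: a digit marks a user-variable slot."""
    return bool(re.search(r"\d", keyword))

def satisfies_preset_rule(keyword: str) -> bool:
    return has_clear_operation(keyword) and not has_variable_element(keyword)

print(satisfies_preset_rule("open the sunroof"))           # registrable
print(satisfies_preset_rule("set air conditioner to 23"))  # variable element
```

Rejecting keywords with variable elements matters because a registered keyword is matched literally on later wake-word-free commands: "set air conditioner to 23" would never match "set air conditioner to 25".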
In one possible example, the apparatus 30 is further configured to: in response to receiving a second voice instruction issued by the user, parse the second voice instruction to obtain its corresponding voice keyword, the second voice instruction being a voice instruction issued by the user without speaking the preset wake-up word; and look up the voice keyword corresponding to the second voice instruction locally, and if it is found, obtain the associated semantic intent and control the in-vehicle component according to it.
In one possible example, the apparatus 30 is further configured to: in response to receiving a second voice instruction issued by the user, parse the second voice instruction to obtain its corresponding voice keyword, the second voice instruction being a voice instruction issued by the user while the vehicle is in an offline network state; and look up the voice keyword corresponding to the second voice instruction locally, and if it is found, obtain the associated semantic intent and control the in-vehicle component according to it.
In one possible example, the apparatus 30 is further configured to: screen target voice keywords from local storage, where a target voice keyword is one whose most recent lookup is more than a preset duration ago; and delete the target voice keywords and their associated semantic intents locally.
In one possible example, local storage contains word banks corresponding to different users, and the saving unit 301 is specifically configured to: in response to receiving the voice keyword and semantic intent of a first voice instruction from the cloud, obtain voiceprint information corresponding to the first voice instruction; obtain the user's identity information according to that voiceprint information; and, after associating the voice keyword of the first voice instruction with the semantic intent, save them to the word bank corresponding to the user's identity information.
In the case of an integrated unit, Fig. 3B shows a block diagram of the functional units of the voice instruction registration apparatus 30 provided in the embodiment of the present application. In Fig. 3B, the apparatus 30 includes a processing module 310 and a communication module 311. The processing module 310 controls and manages the actions of the apparatus 30, e.g., the steps performed by the saving unit 301, and/or other processes of the techniques described herein. The communication module 311 supports interaction between the apparatus 30 and other devices. As shown in Fig. 3B, the apparatus 30 may further include a storage module 312, which stores the program code and data of the apparatus 30.
The processing module 310 may be a processor or a controller, for example a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of the embodiments of the present application. The processor may also be a combination implementing computing functions, e.g., a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 311 may be a transceiver, an RF circuit, a communication interface, or the like. The storage module 312 may be a memory.
For all relevant content of each scenario involved in the above method embodiment, reference may be made to the functional description of the corresponding functional module, which is not repeated here. The voice instruction registration apparatus 30 can perform the steps performed by the vehicle in the voice instruction registration method shown in Fig. 2.
Embodiments of the present application also provide a computer storage medium storing a computer program for electronic data exchange, the computer program causing a computer to execute some or all of the steps of any one of the methods described in the above method embodiments, where the computer includes an electronic device.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of acts; however, those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules involved are not necessarily required by the present application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative; the described division of units is only a division by logical function, and other divisions are possible in practice: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory and including several instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device) to execute all or part of the steps of the above methods of the embodiments of the present application. The aforementioned memory includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and various other media capable of storing program code.
Those skilled in the art will appreciate that all or part of the steps of the methods of the above embodiments may be implemented by a program stored in a computer-readable memory, the memory including: a flash memory disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
The foregoing embodiments have been described in detail, and specific examples are used herein to explain the principles and implementations of the present application; the above description of the embodiments is only intended to help understand the method of the present application and its core ideas. Meanwhile, a person skilled in the art may, following the ideas of the present application, vary the specific implementation and scope of application. In summary, the content of this specification should not be construed as a limitation on the present application.

Claims (10)

1. A voice instruction registration method, applied to a vehicle, comprising:
in response to receiving a voice keyword and a semantic intent of a first voice instruction from a cloud, associating the voice keyword of the first voice instruction with the semantic intent and saving them locally, wherein the first voice instruction is a voice instruction issued by a user after speaking a preset wake-up word.
2. The method of claim 1, wherein prior to associating the voice keyword of the first voice instruction with a semantic intent, the method further comprises:
determining whether the voice keyword satisfies a preset rule, and if so, associating the voice keyword of the first voice instruction with the semantic intent.
3. The method of claim 2, wherein the determining whether the speech keyword satisfies a predetermined rule comprises:
determining whether the voice keyword has a clear operation attribute, wherein the clear operation object is a controllable component in the vehicle.
4. The method of claim 2, wherein the determining whether the speech keyword satisfies a predetermined rule comprises:
determining whether the voice keyword has a clear operation attribute and does not contain a variable element, wherein the clear operation object is a controllable component in the vehicle.
5. The method of claim 1, further comprising:
in response to receiving a second voice instruction issued by a user, parsing the second voice instruction and obtaining a voice keyword corresponding to the second voice instruction, the second voice instruction being a voice instruction issued by the user without speaking the preset wake-up word; and
looking up the voice keyword corresponding to the second voice instruction locally, and if it is found, obtaining the associated semantic intent and controlling a component in the vehicle according to the associated semantic intent.
6. The method of claim 1, further comprising:
in response to receiving a second voice instruction issued by a user, parsing the second voice instruction and obtaining a voice keyword corresponding to the second voice instruction, the second voice instruction being a voice instruction issued by the user while the vehicle is in an offline network state; and
looking up the voice keyword corresponding to the second voice instruction locally, and if it is found, obtaining the associated semantic intent and controlling a component in the vehicle according to the associated semantic intent.
7. The method of claim 6, further comprising:
screening target voice keywords from local storage, wherein a target voice keyword is a voice keyword whose most recent lookup is more than a preset duration ago; and
deleting the target voice keyword and the associated semantic intent locally.
8. The method according to any one of claims 1-7, wherein local storage comprises word banks respectively corresponding to different users;
and wherein the associating the voice keyword and the semantic intent of the first voice instruction received from the cloud and saving them locally comprises:
in response to receiving a voice keyword and a semantic intent of a first voice instruction from the cloud, obtaining voiceprint information corresponding to the first voice instruction;
obtaining identity information of the user according to the voiceprint information corresponding to the first voice instruction; and
after associating the voice keyword of the first voice instruction with the semantic intent, saving them to the word bank corresponding to the identity information of the user.
9. A voice instruction registration apparatus, applied to a vehicle, comprising:
a saving unit, configured to, in response to receiving a voice keyword and a semantic intent of a first voice instruction from a cloud, associate the voice keyword of the first voice instruction with the semantic intent and save them locally, wherein the first voice instruction is a voice instruction issued by a user after speaking a preset wake-up word.
10. An electronic device, characterized in that the electronic device comprises:
a processor, a memory, and one or more programs stored in the memory and configured to be executed by the processor, the one or more programs including instructions for performing the steps of the method of any one of claims 1-8.
CN202211096411.6A 2022-09-08 2022-09-08 Voice instruction registration method and related device Pending CN115512701A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211096411.6A CN115512701A (en) 2022-09-08 2022-09-08 Voice instruction registration method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211096411.6A CN115512701A (en) 2022-09-08 2022-09-08 Voice instruction registration method and related device

Publications (1)

Publication Number Publication Date
CN115512701A true CN115512701A (en) 2022-12-23

Family

ID=84503038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211096411.6A Pending CN115512701A (en) 2022-09-08 2022-09-08 Voice instruction registration method and related device

Country Status (1)

Country Link
CN (1) CN115512701A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination