CN109003611B - Method, apparatus, device and medium for vehicle voice control


Info

Publication number
CN109003611B
CN109003611B
Authority
CN
China
Prior art keywords
text
vehicle
instructions
wake
user
Prior art date
Legal status
Active
Application number
CN201811150983.1A
Other languages
Chinese (zh)
Other versions
CN109003611A (en)
Inventor
张佳雄
Current Assignee
Apollo Zhilian Beijing Technology Co Ltd
Original Assignee
Apollo Zhilian Beijing Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Apollo Zhilian Beijing Technology Co Ltd filed Critical Apollo Zhilian Beijing Technology Co Ltd
Priority to CN201811150983.1A
Publication of CN109003611A
Application granted
Publication of CN109003611B
Legal status: Active

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26 - Speech to text systems
    • G10L 2015/223 - Execution procedure of a spoken command
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60R - VEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
    • B60R 16/00 - Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for
    • B60R 16/02 - Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements
    • B60R 16/037 - Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements for occupant comfort, e.g. for automatic adjustment of appliances according to personal settings, e.g. seats, mirrors, steering wheel
    • B60R 16/0373 - Voice control

Abstract

Embodiments of the present disclosure relate to a method, apparatus, device, and computer-readable storage medium for vehicle voice control. The method includes acquiring text generated by a vehicle recognizing speech input by a user; dividing the text into a plurality of text portions based on the identity information of the user; generating a set of instructions by determining one or more vehicle-executable instructions associated with each text portion; and causing the vehicle to execute at least a portion of the set of instructions. The disclosed solution can improve the efficiency and accuracy of speech recognition in the in-vehicle scenario, thereby improving the user's voice interaction experience.

Description

Method, apparatus, device and medium for vehicle voice control
Technical Field
The present disclosure relates generally to the field of information processing, and more particularly, to methods, apparatus, devices, and computer-readable storage media for vehicle voice control.
Background
Currently, in vehicle-mounted interconnection scenarios, as voice recognition and echo cancellation technologies mature, users increasingly perform operations by voice. Voice interaction has also evolved from single-round to multi-round interaction, making the interaction process smoother. However, the number of instructions a user can issue in a single voice interaction is still limited to one, so voice recognition is used inefficiently. Moreover, a set of operations that the user frequently performs together cannot be completed simply and conveniently, and it is difficult for the user to invoke every application in the in-vehicle system by voice. These deficiencies degrade the user's voice interaction experience.
Disclosure of Invention
According to an example embodiment of the present disclosure, a scheme for voice control of a vehicle is provided.
In a first aspect of the disclosure, a method for voice control of a vehicle is provided. The method includes obtaining text generated by a vehicle recognizing speech input by a user. The method also includes partitioning the text into a plurality of text portions based on the identity information of the user. Further, the method includes generating a set of instructions by determining one or more vehicle-executable instructions associated with each text portion. Still further, the method includes causing the vehicle to execute at least a portion of the set of instructions.
In a second aspect of the present disclosure, an apparatus for voice control of a vehicle is provided. The apparatus includes an acquisition module configured to acquire text generated by a vehicle recognizing speech input by a user. The apparatus also includes a dividing module configured to divide the text into a plurality of text portions based on the identity information of the user. Further, the apparatus includes a generation module configured to generate a set of instructions by determining one or more vehicle-executable instructions associated with each text portion. Still further, the apparatus includes an execution module configured to cause the vehicle to execute at least a portion of the set of instructions.
In a third aspect of the present disclosure, an electronic device is provided. The electronic device includes one or more processors; and storage means for storing the one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method according to the first aspect of the disclosure.
In a fourth aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements a method according to the first aspect of the present disclosure.
It should be understood that the statements herein reciting aspects are not intended to limit the critical or essential features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference characters designate like or similar elements, and wherein:
FIG. 1 illustrates a schematic diagram of an example environment in which embodiments of the present disclosure can be implemented;
FIG. 2 illustrates a schematic flow diagram of a process or method for vehicle voice control, according to some embodiments of the present disclosure;
FIG. 3 shows a schematic block diagram of an apparatus for vehicle voice control, in accordance with some embodiments of the present disclosure; and
FIG. 4 illustrates a schematic block diagram of a computing device capable of implementing various embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
In describing embodiments of the present disclosure, the term "include" and its derivatives should be interpreted as being inclusive, i.e., "including but not limited to." The term "based on" should be understood as "based at least in part on." The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment." The terms "first," "second," and the like may refer to different or the same object. Other explicit and implicit definitions may also be included below.
As mentioned above, in the current vehicle-mounted interconnection scenario, the user cannot effectively utilize voice recognition, cannot directly invoke multiple operations, and cannot invoke each application in the vehicle-mounted system, so that the voice interaction experience of the user is reduced.
Embodiments of the present disclosure propose a scheme for vehicle voice control. In the scheme, a text generated by recognizing a voice input by a user by a vehicle is acquired; dividing the text into a plurality of text portions based on the identity information of the user; generating a set of instructions by determining one or more vehicle-executable instructions associated with each text portion; and causing the vehicle to execute at least a portion of the set of instructions. In this way, a plurality of instructions which are intended to be executed by the user can be identified based on the identity information of the user, so that the efficiency and the accuracy of instruction identification are improved, and the voice interaction experience of the user is greatly improved.
Embodiments of the present disclosure will be described below in detail with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of an example environment 100 in which various embodiments of the present disclosure can be implemented. As shown, the example environment 100 includes a vehicle 110, a user 120, and a computing device 130. Vehicle 110 may be any entity capable of movement, such as a motor vehicle, a non-motor vehicle, and the like. Although the vehicle 110 is used herein as an example, it should be understood that the vehicle may sometimes be replaced by any stationary entity, for example a household appliance such as a television, an air conditioner, a refrigerator, or a microwave oven.
Vehicle 110 includes an in-vehicle computing device 112, a voice capture device 114, and a storage device 116. The in-vehicle computing device 112 may be any suitable computing device, whether centralized or distributed, including but not limited to personal computers, servers, clients, hand-held or laptop devices, multiprocessors, microprocessors, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed clouds, combinations thereof, and the like.
The voice capture device 114 may be any capture device capable of collecting voice from the user 120. Examples of voice capture device 114 include, but are not limited to, an in-vehicle microphone, an in-vehicle camera with a microphone, and the like. Further, storage device 116 may be any storage device for storing data related to vehicle 110.
In some embodiments, the voice capture device 114 may capture voice from the user 120 and provide the captured voice to the in-vehicle computing device 112. The in-vehicle computing device 112 may convert the acquired speech into text and identify one or more vehicle-executable instructions referred to in the text. The vehicle-executable instructions may operate various applications in the in-vehicle system. For example, the vehicle-executable instructions may indicate "open navigation," "open music," etc., such that a navigation application, music application, etc., in the in-vehicle system may be opened.
In some embodiments, the storage device 116 may store a wake statement. A wake statement is typically not itself a vehicle-executable instruction, but it is associated with one or more such instructions. The in-vehicle computing device 112 may retrieve the wake statement from the storage device 116 and compare it to the text. When the wake statement matches the text, the in-vehicle computing device 112 may retrieve a set of instructions (also referred to as a workflow) corresponding to the wake statement. For example, the in-vehicle computing device 112 may obtain the set of instructions from the computing device 130; alternatively, it may retrieve the set of instructions from the storage device 116. The in-vehicle computing device 112 may then execute the set of instructions.
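For illustration, the following is a minimal sketch of this wake-statement lookup. The wake statement, the instruction names, and the WORKFLOWS mapping are hypothetical stand-ins; the disclosure does not prescribe a concrete data structure.

```python
# Minimal sketch of wake-statement matching on the vehicle side
# (hypothetical data; not a prescribed implementation).
WORKFLOWS = {
    # wake statement -> instruction set ("workflow") associated with it
    "i want to go home": ["open navigation", "open music"],
}

def match_wake_statement(text: str) -> list[str] | None:
    """Return the workflow for a matching wake statement, else None."""
    return WORKFLOWS.get(text.strip().lower())

# A matching utterance yields the whole workflow in one interaction;
# a non-matching one falls through to the remote computing device.
assert match_wake_statement("I want to go home") == ["open navigation", "open music"]
assert match_wake_statement("I want to go home, call wife") is None
```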
The computing device 130 may be remote or local to the vehicle 110. Computing device 130 may be any suitable computing device, whether centralized or distributed, including but not limited to personal computers, servers, clients, hand-held or laptop devices, multiprocessors, microprocessors, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed clouds, combinations thereof, and the like. The computing device 130 may communicate with the vehicle 110, and in particular the in-vehicle computing device 112 therein, such as through a wired and/or wireless connection.
Conversely, when the wake statement does not match the text, the in-vehicle computing device 112 may send the text to the computing device 130. In some embodiments, computing device 130 may perform multiple layers of processing on the text, for example two or three layers. Specifically, in the first layer of processing, the computing device 130 may divide the text into a plurality of text portions based on the identity information of the user 120. The identity information of the user 120 may indicate a wake statement associated with the user 120, connectives specific to the user 120, and the like. Further, in the first layer of processing, the computing device 130 may also use generic connectives to divide the text into a plurality of text portions. Generic connectives are words commonly used to segment text, such as "and."
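A rough sketch of this first-layer division is given below. USER_WAKE_STATEMENTS and USER_CONNECTIVES are hypothetical per-user data standing in for the identity information; a real implementation would look these up per user rather than hard-code them.

```python
import re

# Sketch of the first processing layer (hypothetical per-user data).
USER_WAKE_STATEMENTS = ["i want to go home"]  # from the user's identity information
USER_CONNECTIVES = ["after"]                  # a user-specific habit
GENERIC_CONNECTIVES = ["and"]                 # generic segmentation words

def first_layer_split(text: str) -> list[str]:
    # Carve each wake statement out as its own text portion, then split the
    # remaining pieces on user-specific and generic connectives.
    pattern = "|".join(map(re.escape, USER_WAKE_STATEMENTS))
    pieces = re.split(f"({pattern})", text.lower())
    portions = []
    for piece in pieces:
        if piece in USER_WAKE_STATEMENTS:
            portions.append(piece)  # keep wake statements intact
            continue
        for conn in USER_CONNECTIVES + GENERIC_CONNECTIVES:
            piece = piece.replace(f" {conn} ", "|")
        portions += [p.strip() for p in piece.split("|") if p.strip()]
    return portions

print(first_layer_split("I want to go home and call wife"))
# -> ['i want to go home', 'call wife']
```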
In the second layer of processing, computing device 130 may apply syntactic analysis to at least one of the plurality of text portions resulting from the first layer of processing to obtain one or more further text portions. In some embodiments, in addition to at least one of the text portions resulting from the first layer of processing, the computing device 130 may also apply syntactic analysis to the text itself.
Further, the computing device 130 may determine instructions corresponding to the one or more text portions resulting from the dividing. In some embodiments, the computing device 130 may only determine instructions corresponding to the portion of text resulting from the second layer of processing. Alternatively, the computing device 130 may determine instructions corresponding to portions of text resulting from both the first layer of processing and the second layer of processing.
In some embodiments, the computing device 130 may convert the text portion into a machine semantic expression according to the deep semantic analysis and determine an instruction corresponding to the text portion based on the machine semantic expression.
Deep semantic analysis may find a corresponding semantic role for each predicate of a statement in order to convert the statement into a machine semantic expression, such as a predicate logic expression (e.g., a lambda calculus expression), a dependency-based compositional semantic representation, and so on. An example statement (given in the original in both Chinese and English) is shown below, along with its corresponding first-order predicate logic expression:

Chinese (gloss): listing all rivers in Colorado
English: Name all the rivers in Colorado
Semantic expression: answer(river(loc_2(stateid('colorado'))))
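As a loose illustration of how a machine semantic expression bridges a text portion and an instruction, the sketch below uses an invented rule table in place of a real semantic parser; the expressions and instruction names are assumptions made for this example only.

```python
# Toy mapping: text portion -> machine semantic expression -> instruction.
# SEMANTIC_RULES and INSTRUCTIONS are invented; a real deep semantic parser
# would assign a semantic role to each predicate instead of a table lookup.
SEMANTIC_RULES = {
    "call wife": "call(contact('wife'))",
    "play weather": "report(weather(current_location()))",
}
INSTRUCTIONS = {
    "call(contact('wife'))": "call wife",
    "report(weather(current_location()))": "play weather",
}

def text_portion_to_instruction(portion: str) -> str | None:
    expression = SEMANTIC_RULES.get(portion)  # stand-in for semantic parsing
    return INSTRUCTIONS.get(expression) if expression else None

print(text_portion_to_instruction("play weather"))  # -> 'play weather'
```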
Methods of deep semantic analysis include, but are not limited to, knowledge-base (or database) based semantic analysis, supervised semantic analysis, and semi-supervised or unsupervised semantic analysis. In knowledge-base-based semantic analysis, a series of facts is recorded in the knowledge base, for example in the form of triples. For a given statement, semantic analysis converts the statement, through a conversion technique, into a series of tuples defined in the knowledge base and constructs an entity relationship graph.
Supervised semantic analysis requires a manually labeled semantic analysis corpus, in which a semantic expression is manually annotated for each statement.
Semi-supervised or unsupervised semantic analysis does not require a manually labeled semantic analysis corpus; it uses only the entity names, relation names, and the like in the knowledge base, and does not rely on the facts recorded there. Unsupervised semantic analysis typically employs the Expectation-Maximization (EM) algorithm: in each iteration, the sentences are semantically analyzed, and the sentences with high-confidence results, together with their semantic analysis results, are selected as a self-training data set.
In addition to the first and second layers of processing described above, in some embodiments, computing device 130 may also perform a third layer of processing, which can help parse the text portions that the first two layers could not resolve. In the third layer of processing, the computing device 130 may apply a region-specific regional analysis to the divided text portions that are not associated with instructions executable by the vehicle 110. In some embodiments, the computing device 130 may apply the regional analysis to such portions resulting from the second layer of processing; alternatively, it may apply the regional analysis to such portions resulting from both the first and second layers of processing.
For example, computing device 130 may obtain the geographic location at which user 120 was located when the voice was input, such as via a Global Positioning System (GPS) onboard vehicle 110. Based on the geographic location, computing device 130 may apply a regional analysis to the divided text portions that are not associated with instructions executable by vehicle 110. Regional analysis may include, but is not limited to, dialect analysis.
The regional analysis can use the user's region to read a region-specific word-sense knowledge base, a semantic analysis knowledge base, and a manually labeled region-specific semantic analysis corpus in order to divide the text portion. The word-sense knowledge base may include predicates and nouns specific to the region, which can be used to identify unrecognized words and to disambiguate word senses. The semantic analysis knowledge base may include semantic expressions specific to the region. The manually labeled region-specific semantic analysis corpus may contain time-sensitive semantic expressions generated from news and trending topics in the region.
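The sketch below illustrates the idea with an invented dialect lexicon standing in for the region-specific knowledge bases described above; a real system would consult those knowledge bases rather than a hard-coded table.

```python
# Sketch of region-specific rewriting (hypothetical dialect lexicon).
DIALECT_LEXICONS = {
    "chongqing": {"counselor": "wife"},  # dialect word -> standard word
}

def apply_regional_analysis(text_portion: str, region: str) -> str:
    lexicon = DIALECT_LEXICONS.get(region.lower(), {})
    return " ".join(lexicon.get(word, word) for word in text_portion.split())

print(apply_regional_analysis("call counselor", "Chongqing"))  # -> 'call wife'
```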
Computing device 130 may then determine instructions corresponding to the one or more text portions resulting from the third layer of processing. In some embodiments, the computing device 130 may convert the text portion resulting from the third layer of processing into a machine semantic expression according to the deep semantic analysis described above, and determine an instruction corresponding to the text portion based on the machine semantic expression.
In some embodiments, the computing device 130 may also remove duplicate instructions from the instructions corresponding to the text portions resulting from the first, second, and third layers of processing. Duplicate removal may be performed as the instructions for each layer are determined, or once the instructions for all layers have been determined.
In some embodiments, computing device 130 may determine an available instruction set such that only the determined instructions belonging to the available instruction set are executed by vehicle 110. For example, vehicle 110 may be configured with the available instruction set that it supports. In this case, computing device 130 may obtain an identification of vehicle 110 and determine the available instruction set corresponding to that identification.
In this way, the voice input by the user can be processed in multiple layers based on the identity information of the user 120, generic connectives, syntactic analysis, and regional analysis to determine the plurality of instructions that the user 120 intends to execute, which greatly improves the efficiency and accuracy of instruction recognition and therefore the user's voice interaction experience.
FIG. 2 illustrates a schematic flow diagram of a process or method 200 for vehicle voice control, according to some embodiments of the present disclosure. The method 200 may be performed, for example, at the computing device 130 shown in FIG. 1 or at another suitable system, or by, or in cooperation with, the in-vehicle computing device 112 in vehicle 110. Moreover, method 200 may include additional steps not shown and/or may omit steps that are shown, as the scope of the disclosure is not limited in this respect.
At 210, computing device 130 obtains text generated by vehicle 110 recognizing speech input by user 120. For example, the speech input by the user 120, and the text generated from it, may be "I want to go home, call wife." In certain embodiments, the computing device 130 may obtain the text when the vehicle 110 is unable to determine one or more vehicle-executable instructions. For example, as described above, the storage device 116 of the vehicle 110 may store a wake statement, which is not itself a vehicle-executable instruction but is associated with such instructions. For example, the wake statement may be "I want to go home" and may be associated with the instructions "open navigation" and "open music."
Because the text ("I want to go home, call wife") does not match the stored wake statement ("I want to go home"), the text is not recognized by the vehicle 110, preventing the vehicle 110 from executing the "open navigation," "open music," and "call wife" instructions that the user 120 intends to execute. In this case, the vehicle 110 sends the text to the computing device 130 for recognition. In this way, speech recognition can be performed accurately despite the limited computing capability of vehicle 110, thereby saving the computing resources of vehicle 110.
At 220, the computing device 130 divides the text into a plurality of text portions based on the identity information of the user 120. In some embodiments, the computing device 130 may identify wake statements associated with the user 120 in the text and, if a wake statement is identified, partition each wake statement out of the text as a text portion (also referred to as a "first text portion"). The wake statement may be a system default or may be set by the user 120. Although wake statements are described here as being associated with a single user, a wake statement may be associated with multiple users or with all users; for example, a system-default wake statement may apply to all users.
Assume the text is "call wife after I want to go home" and the wake statement is "I want to go home." In this case, the computing device 130 may recognize the wake statement "I want to go home" in the text and partition it out of the text as the first text portion. In this way, the user 120 can easily trigger an operation involving a plurality of instructions through the wake statement, thereby improving the efficiency of voice interaction.
In some embodiments, the computing device 130 may identify connectives in the text that are specific to the user 120 and partition the text based on those connectives. The connectives specific to the user 120 may be pre-set by the user 120 or learned by the computing device 130 from the user's historical speech input. For example, assume that the connective specific to user 120 is "after." In this case, the computing device 130 may recognize the connective "after" in the text "call wife after I want to go home" and divide the text into "I want to go home" and "call wife" based on it. In this way, the text can be partitioned according to user settings or user habits, thereby improving the user experience.
Further, in some embodiments, computing device 130 may also use generic connectives to divide the text into multiple text portions. Generic connectives are words commonly used to segment text, such as "and." For example, the computing device 130 may recognize the generic connective "and" in the text "I want to go home and call wife" and divide the text into "I want to go home" and "call wife" based on it.
Syntactic analysis is then applied to at least one of the divided text portions to obtain one or more text portions (also referred to as "second text portions"). Assume the text is "I want to go home play weather call wife." The computing device 130 may recognize the wake statement "I want to go home" and divide the text into the two portions "I want to go home" and "play weather call wife." In this case, the computing device 130 may apply syntactic analysis to the portion "play weather call wife."
For example, the computing device 130 may divide the text portion "play weather call wife" based on predicate-object (verb-object) parsing. Since "play" and "call" are grammatical predicates and "weather" and "wife" are grammatical objects, the computing device 130 may divide the text portion "play weather call wife" into the two second text portions "play weather" and "call wife." In this way, the text can be further partitioned based on grammar, thereby increasing the accuracy of speech recognition.
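A minimal sketch of such a predicate-object split is shown below; the toy verb lexicon stands in for a real syntactic parser and is an assumption of this example.

```python
# Sketch of a predicate-object split over a text portion.
PREDICATES = {"play", "call", "open"}  # hypothetical verb lexicon

def predicate_object_split(text_portion: str) -> list[str]:
    portions, current = [], []
    for word in text_portion.split():
        if word in PREDICATES and current:
            portions.append(" ".join(current))  # a new predicate opens a new portion
            current = []
        current.append(word)
    if current:
        portions.append(" ".join(current))
    return portions

print(predicate_object_split("play weather call wife"))
# -> ['play weather', 'call wife']
```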
Further, the computing device 130 may determine instructions corresponding to the one or more text portions resulting from the division. In some embodiments, the computing device 130 may convert a text portion into a machine semantic expression using deep semantic analysis and determine the corresponding instruction based on that expression. For example, the computing device 130 may determine that the instructions corresponding to the text portions "I want to go home," "play weather," and "call wife" indicate "open navigation," "open music," "play weather," and "call wife."
However, the computing device 130 may at times be unable to determine instructions corresponding to some of the divided text portions, possibly because of the dialect of the particular area in which the user 120 is located. In this case, in some embodiments, computing device 130 may also obtain the geographic location at which user 120 was located when inputting the speech and, based on that location, apply dialect analysis to those of the one or more second text portions that are not associated with the one or more vehicle-executable instructions.
For example, when the user 120 enters speech while the vehicle 110 is located within Chongqing, the computing device 130 may apply dialect analysis for Chongqing to a second text portion that is not associated with vehicle-executable instructions. Assuming the text portion "call counselor" is not associated with vehicle-executable instructions, the computing device 130 may use dialect analysis to interpret "call counselor" as "call wife."
Computing device 130 may then determine the instructions to which the one or more text portions resulting from the dialect analysis correspond. In some embodiments, as described above, the computing device 130 may convert a text portion resulting from the dialect analysis into a machine semantic expression using deep semantic analysis and determine the corresponding instruction based on that expression. In this way, regional extension can be performed for the region in which the user 120 is located, improving the accuracy and efficiency of speech recognition.
At 230, the computing device 130 may generate a set of instructions by determining one or more vehicle-executable instructions associated with each text portion. As described above, in some embodiments, the computing device 130 may convert each divided text portion into a machine semantic expression using deep semantic analysis and determine the corresponding instruction based on that expression. The determined instructions may form the instruction set; for example, they may form an ordered instruction set following the order of their corresponding text portions in the text.
Further, in certain embodiments, the computing device 130 may generate the set of instructions by removing duplicate instructions from the one or more vehicle-executable instructions. Assume the text is "I want to go home and open music." The computing device 130 may determine, based on the method described above, that the instructions associated with the text indicate "open navigation," "open music," and "open music." Clearly, the instruction "open music" is repeated. In this case, the computing device 130 may generate an instruction set containing only one "open navigation" instruction and one "open music" instruction by removing the repeated "open music" instruction. In this way, repeated operations can be avoided, thereby improving the user experience.
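The following one-function sketch shows order-preserving removal of duplicate instructions, matching the example above.

```python
# Order-preserving removal of duplicate instructions, matching the
# "open navigation / open music / open music" example above.
def dedupe_instructions(instructions: list[str]) -> list[str]:
    seen: set[str] = set()
    # seen.add() returns None, so the first occurrence is kept and later
    # occurrences are filtered out while preserving order.
    return [i for i in instructions if not (i in seen or seen.add(i))]

print(dedupe_instructions(["open navigation", "open music", "open music"]))
# -> ['open navigation', 'open music']
```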
At 240, computing device 130 may cause vehicle 110 to execute at least a portion of the set of instructions. In some embodiments, the available instruction set supported by vehicle 110 may be configured. In this case, computing device 130 may obtain an identification of vehicle 110 and determine the available instruction set corresponding to that identification, thereby causing vehicle 110 to execute at least the portion of the instruction set that belongs to the available instruction set.
For example, user 120 or the manufacturer of vehicle 110 may configure the available instruction set of vehicle 110 so that it does not include "open music." In this case, even if computing device 130 determines the instruction "open music," computing device 130 does not cause vehicle 110 to execute it. In this way, the operations that the vehicle 110 is permitted to perform can be configured, thereby improving the safety and flexibility of the in-vehicle system.
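Below is a small sketch of this filtering step; the vehicle identification scheme and the set contents are invented for illustration.

```python
# Sketch of the available-instruction-set check (hypothetical data).
AVAILABLE_SETS = {
    "vehicle-110": {"open navigation", "call wife", "play weather"},  # "open music" excluded
}

def executable_subset(vehicle_id: str, instructions: list[str]) -> list[str]:
    allowed = AVAILABLE_SETS.get(vehicle_id, set())
    return [i for i in instructions if i in allowed]

print(executable_subset("vehicle-110", ["open navigation", "open music"]))
# -> ['open navigation']
```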
In this way, the voice input by the user can be processed in multiple layers based on the identity information of the user 120, generic connectives, syntactic analysis, and dialect analysis; duplicate instructions can be removed after the instructions are determined; and the vehicle 110 can be caused to perform only the operations it is permitted to perform. This not only improves the efficiency and accuracy of instruction recognition but also improves the safety and flexibility of the in-vehicle system, thereby greatly improving the user's voice interaction experience.
Fig. 3 shows a schematic block diagram of an apparatus 300 for vehicle voice control, according to some embodiments of the present disclosure. In conjunction with the description of fig. 1 and 2, the apparatus 300 shown in fig. 3 comprises: an acquisition module 310 configured to acquire text generated by recognizing a voice input by a user by a vehicle; a dividing module 320 configured to divide the text into a plurality of text portions based on the identity information of the user; a generation module 330 configured to generate a set of instructions by determining one or more vehicle-executable instructions associated with each text portion; and an execution module 340 configured to cause the vehicle to execute at least a portion of the set of instructions.
In an embodiment of the present disclosure, the obtaining module 310 includes: a text acquisition module configured to acquire the text in response to the vehicle being unable to determine the one or more vehicle-executable instructions.
In an embodiment of the present disclosure, the dividing module 320 includes: a wake sentence identification module configured to identify a wake sentence associated with the user in the text, the wake sentence not being one of the one or more vehicle-executable instructions but being associated with them; and a wake statement dividing module configured to divide each wake statement from the text as the first text portion in response to identifying the wake statement.
In an embodiment of the present disclosure, the dividing module 320 further includes: a connecting word recognition module configured to recognize a connecting word specific to the user in the text; and a connecting word dividing module configured to divide the text based on the connecting word.
In an embodiment of the present disclosure, the dividing module 320 further includes: a parsing module configured to apply parsing to at least one of the plurality of text portions resulting from the division to obtain one or more second text portions.
In an embodiment of the present disclosure, the dividing module 320 further includes: a location acquisition module configured to acquire a geographic location at which the user is located when inputting the voice; and a dialect analysis module configured to apply a dialect analysis to a second text portion of the one or more second text portions that is not associated with the one or more vehicle-executable instructions based on the geographic location.
In an embodiment of the present disclosure, the generating module 330 includes: an instruction set generation module configured to generate the instruction set by removing duplicate instructions of the one or more vehicle-executable instructions.
In an embodiment of the present disclosure, the execution module 340 includes: an identification acquisition module configured to acquire an identification of the vehicle; a determination module configured to determine a set of available instructions corresponding to the identification; and an instruction execution module configured to cause the vehicle to execute at least a portion of the set of instructions belonging to the set of available instructions.
In an embodiment of the present disclosure, the apparatus 300 further comprises: a wake statement generation module configured to generate a wake statement indicative of the at least a portion of the instruction.
Fig. 4 shows a schematic block diagram of an example device 400 that may be used to implement embodiments of the present disclosure. As shown, device 400 includes a Central Processing Unit (CPU)401 that may perform various appropriate actions and processes in accordance with computer program instructions stored in a Read Only Memory (ROM)402 or loaded from a storage unit 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data required for the operation of the device 400 can also be stored. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
A number of components in device 400 are connected to I/O interface 405, including: an input unit 406 such as a keyboard, a mouse, or the like; an output unit 407 such as various types of displays, speakers, and the like; a storage unit 408 such as a magnetic disk, optical disk, or the like; and a communication unit 409 such as a network card, modem, wireless communication transceiver, etc. The communication unit 409 allows the device 400 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
Processing unit 401 performs various methods and processes described above, such as process 200. For example, in some embodiments, process 200 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into RAM 403 and executed by CPU 401, one or more steps of process 200 described above may be performed. Alternatively, in other embodiments, CPU 401 may be configured to perform process 200 in any other suitable manner (e.g., by way of firmware).
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Further, while operations are depicted in a particular order, this should be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (18)

1. A method of voice control of a vehicle, comprising:
acquiring text generated by a vehicle recognizing speech input by a user;
dividing the text into a plurality of text portions based on the identity information of the user;
generating a set of instructions by determining one or more vehicle-executable instructions associated with each text portion; and
causing the vehicle to execute at least a portion of the set of instructions,
wherein dividing the text comprises:
identifying, in the text, a wake statement associated with the user, the wake statement not being one of the one or more vehicle-executable instructions but being associated with them; and
in response to identifying the wake statements, dividing each wake statement out of the text as a first text portion;
wherein the vehicle-executable instructions associated with the first text portion are determined based on an association of the wake statement with the vehicle-executable instructions.
2. The method of claim 1, wherein obtaining the text comprises:
the text is retrieved in response to the vehicle being unable to determine the one or more vehicle-executable instructions.
3. The method of claim 1, wherein dividing the text comprises:
identifying, in the text, a connection word that is specific to the user; and
the text is partitioned based on the connection words.
4. The method of claim 1, wherein dividing the text comprises:
applying a syntactic analysis to at least one of the plurality of text portions resulting from the dividing to obtain one or more second text portions.
5. The method of claim 4, wherein dividing the text comprises:
acquiring the geographical position where the user is located when inputting the voice; and
applying, based on the geographic location, a dialect analysis to a second text portion of the one or more second text portions that is not associated with the one or more vehicle-executable instructions.
6. The method of claim 1, wherein generating the set of instructions comprises:
the set of instructions is generated by removing duplicate instructions of the one or more vehicle-executable instructions.
7. The method of claim 1, wherein causing the vehicle to execute at least a portion of the set of instructions comprises:
acquiring an identification of the vehicle;
determining a set of available instructions corresponding to the identification; and
causing the vehicle to execute at least a portion of the set of instructions that are part of the set of available instructions.
8. The method of claim 1, further comprising:
generating a wake-up statement indicative of the at least a portion of the instructions.
9. A vehicle voice-controlled apparatus comprising:
an acquisition module configured to acquire text generated by a vehicle recognizing a voice input by a user;
a dividing module configured to divide the text into a plurality of text portions based on the identity information of the user;
a generation module configured to generate a set of instructions by determining one or more vehicle-executable instructions associated with each text portion; and
an execution module configured to cause the vehicle to execute at least a portion of the set of instructions,
wherein the dividing module comprises:
a wake sentence identification module configured to identify a wake sentence associated with the user in the text, the wake sentence not being one of the one or more vehicle-executable instructions but being associated with them; and
a wake statement dividing module configured to divide each wake statement from the text as a first text portion in response to identifying the wake statement;
wherein the vehicle-executable instructions associated with the first text portion are determined based on an association of the wake statement with the vehicle-executable instructions.
10. The apparatus of claim 9, wherein the acquisition module comprises:
a text acquisition module configured to acquire the text in response to the vehicle being unable to determine the one or more vehicle-executable instructions.
11. The apparatus of claim 9, wherein the dividing module comprises:
a connecting word recognition module configured to recognize a connecting word specific to the user in the text; and
a connecting word dividing module configured to divide the text based on the connecting words.
12. The apparatus of claim 9, wherein the dividing module comprises:
a parsing module configured to apply parsing to at least one of the plurality of text portions resulting from the division to obtain one or more second text portions.
13. The apparatus of claim 12, wherein the dividing module comprises:
a location acquisition module configured to acquire a geographic location at which the user is located when inputting the voice; and
a dialect analysis module configured to apply a dialect analysis to a second text portion of the one or more second text portions that is not associated with the one or more vehicle-executable instructions based on the geographic location.
14. The apparatus of claim 9, wherein the generation module comprises:
an instruction set generation module configured to generate the instruction set by removing duplicate instructions of the one or more vehicle-executable instructions.
15. The apparatus of claim 9, wherein the execution module comprises:
an identification acquisition module configured to acquire an identification of the vehicle;
a determination module configured to determine a set of available instructions corresponding to the identification; and
an instruction execution module configured to cause the vehicle to execute at least a portion of the set of instructions belonging to the set of available instructions.
16. The apparatus of claim 9, further comprising:
a wake statement generation module configured to generate a wake statement indicating the at least a portion of the instructions.
17. An electronic device, the electronic device comprising:
one or more processors; and
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method according to any one of claims 1-8.
18. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-8.
CN201811150983.1A 2018-09-29 2018-09-29 Method, apparatus, device and medium for vehicle voice control Active CN109003611B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811150983.1A CN109003611B (en) 2018-09-29 2018-09-29 Method, apparatus, device and medium for vehicle voice control

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811150983.1A CN109003611B (en) 2018-09-29 2018-09-29 Method, apparatus, device and medium for vehicle voice control

Publications (2)

Publication Number Publication Date
CN109003611A CN109003611A (en) 2018-12-14
CN109003611B (en) 2022-05-27

Family

ID=64589614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811150983.1A Active CN109003611B (en) 2018-09-29 2018-09-29 Method, apparatus, device and medium for vehicle voice control

Country Status (1)

Country Link
CN (1) CN109003611B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109767758B (en) * 2019-01-11 2021-06-08 Sun Yat-sen University Vehicle-mounted voice analysis method, system, storage medium and device
CN110400562B (en) * 2019-06-24 2022-03-22 Goertek Technology Co., Ltd. Interactive processing method, device, equipment and audio equipment
JP7274376B2 (en) * 2019-07-18 2023-05-16 Honda Motor Co., Ltd. Agent device, control method of agent device, and program
CN110633476B (en) * 2019-09-27 2024-04-05 Beijing Baidu Netcom Science and Technology Co., Ltd. Method and device for acquiring knowledge annotation information
CN111324202A (en) * 2020-02-19 2020-06-23 China FAW Co., Ltd. Interaction method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104145304A (en) * 2012-03-08 2014-11-12 LG Electronics Inc. An apparatus and method for multiple device voice control
CN106471570A (en) * 2014-05-30 2017-03-01 Apple Inc. Multi-command single utterance input method
CN107199971A (en) * 2017-05-03 2017-09-26 Shenzhen Chehezi Technology Co., Ltd. Vehicle-mounted voice interaction method, terminal and computer-readable storage medium
CN107204185A (en) * 2017-05-03 2017-09-26 Shenzhen Chehezi Technology Co., Ltd. Vehicle-mounted voice interaction method, system and computer-readable storage medium
CN107680591A (en) * 2017-09-21 2018-02-09 Baidu Online Network Technology (Beijing) Co., Ltd. Voice interaction method, device and equipment based on vehicle-mounted terminal
CN108091329A (en) * 2017-12-20 2018-05-29 Jiangxi Aichi Yiwei Industrial Co., Ltd. Method, apparatus and computing device for controlling an automobile based on speech recognition

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9313739B2 (en) * 2012-10-23 2016-04-12 Qualcomm Incorporated Systems and methods for low power wake up signal and operations for WLAN
CN105654943A (en) * 2015-10-26 2016-06-08 Leshi Zhixin Electronic Technology (Tianjin) Co., Ltd. Voice wakeup method, apparatus and system thereof
CN106815507A (en) * 2015-11-30 2017-06-09 ZTE Corporation Voice wake-up implementation method, device and terminal
CN107527614B (en) * 2016-06-21 2021-11-26 Realtek Semiconductor Corp. Voice control system and method thereof
WO2018157388A1 (en) * 2017-03-03 2018-09-07 Shenzhen Qianhai Cloudminds Cloud Intelligent Technology Co., Ltd. Wake-up method and device for robot, and robot
CN107578776B (en) * 2017-09-25 2021-08-06 MIGU Culture Technology Co., Ltd. Voice interaction awakening method and device and computer readable storage medium


Also Published As

Publication number Publication date
CN109003611A (en) 2018-12-14

Similar Documents

Publication Publication Date Title
CN109003611B (en) Method, apparatus, device and medium for vehicle voice control
US10490186B2 (en) Parameter collection and automatic dialog generation in dialog systems
CN109841212B (en) Speech recognition system and speech recognition method for analyzing commands with multiple intents
US20230206911A1 (en) Processing natural language using machine learning to determine slot values based on slot descriptors
US20190163691A1 (en) Intent Based Dynamic Generation of Personalized Content from Dynamic Sources
CN107656996B (en) Man-machine interaction method and device based on artificial intelligence
US20160275148A1 (en) Database query method and device
US11830482B2 (en) Method and apparatus for speech interaction, and computer storage medium
CN110415679B (en) Voice error correction method, device, equipment and storage medium
US10482876B2 (en) Hierarchical speech recognition decoder
US8719025B2 (en) Contextual voice query dilation to improve spoken web searching
US11069351B1 (en) Vehicle voice user interface
WO2020233363A1 (en) Speech recognition method and device, electronic apparatus, and storage medium
EP3799640A1 (en) Semantic parsing of natural language query
CN113486170B (en) Natural language processing method, device, equipment and medium based on man-machine interaction
CN115455161A (en) Conversation processing method, conversation processing device, electronic equipment and storage medium
CN111312230B (en) Voice interaction monitoring method and device for voice conversation platform
US11062700B1 (en) Query answering with controlled access knowledge graph
CN111428011B (en) Word recommendation method, device, equipment and storage medium
US20220020361A1 (en) Systems and methods for fast filtering of audio keyword search
US11195102B2 (en) Navigation and cognitive dialog assistance
CN114625889A (en) Semantic disambiguation method and device, electronic equipment and storage medium
CN111883126A (en) Data processing mode selection method and device and electronic equipment
US11538480B1 (en) Integration of speech processing functionality with organization systems
CN114880351B (en) Recognition method and device of slow query statement, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211018

Address after: 100176 101, floor 1, building 1, yard 7, Ruihe West 2nd Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Applicant after: Apollo Zhilian (Beijing) Technology Co.,Ltd.

Address before: 100080 No.10, Shangdi 10th Street, Haidian District, Beijing

Applicant before: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co.,Ltd.

GR01 Patent grant