WO2022205211A1 - Method and apparatus for controlling vehicle running and vehicle - Google Patents
- Publication number
- WO2022205211A1 (application PCT/CN2021/084731)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- vehicle
- user
- slot value
- driving
- training
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W60/00—Drive control systems specially adapted for autonomous road vehicles
- B60W60/001—Planning or execution of driving tasks
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W30/00—Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
- B60W30/18—Propelling the vehicle
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2555/00—Input parameters relating to exterior conditions, not covered by groups B60W2552/00, B60W2554/00
- B60W2555/20—Ambient conditions, e.g. wind or rain
Definitions
- The present application relates to the field of autonomous driving, and more particularly to a method and a device for controlling the driving of a vehicle, and to a vehicle.
- Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
- In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that responds in a way similar to human intelligence.
- Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
- Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.
- Autonomous driving is a mainstream application in the field of artificial intelligence.
- Autonomous driving technology relies on the cooperation of computer vision, radar, monitoring devices, and global positioning systems to allow motor vehicles to drive autonomously without active human operation.
- Autonomous vehicles use various computing systems to help transport users from one location to another. Some autonomous vehicles may require some initial or continuous input from a user, such as a pilot, driver, or passenger.
- An autonomous vehicle permits the operator to switch from a manual operating mode to an autonomous driving mode or to a mode in between. Because autonomous driving technology does not require a human to drive the vehicle, it can in theory avoid human driving errors, reduce traffic accidents, and improve the efficiency of highway transportation. Autonomous driving technology is therefore receiving more and more attention.
- An autonomous vehicle drives on the basis of a preset destination and the surrounding environment sensed by its various sensors, and delivers the user to the destination along a planned route.
- During the trip, however, the user may form a temporary intention, prompted by what they see around the vehicle, that differs from simply driving to the destination. For example, if the vehicle is close to the car in front, the user may want to keep a greater distance.
- Under existing autonomous driving technology, a user who forms such a temporary intention can only take over control of the vehicle manually and then carry out the intention themselves. Because the vehicle has then been switched to manual driving mode, the user can no longer enjoy the more worry-free and safer driving experience that autonomous driving technology offers.
- Level 5 (as defined in the Society of Automotive Engineers (SAE) levels of driving automation)
- The present application provides a method and a device for controlling the driving of a vehicle, and a vehicle, which can improve the user's experience during autonomous driving.
- In a first aspect, a method for controlling the driving of a vehicle is provided. The method can be executed by an electronic device that supports controlling the driving of the vehicle.
- An electronic device here means a device that can be abstracted as a computer system.
- The electronic device that supports controlling the driving of the vehicle may also be referred to as the device for controlling the driving of the vehicle.
- The device for controlling the driving of the vehicle may be the complete electronic device, or a component within the electronic device, for example a chip related to the function of controlling the driving of the vehicle, such as a system chip.
- A system chip is also called a system on chip (SoC) or SoC chip.
- The device for controlling the driving of the vehicle may be a terminal device or an in-vehicle device, such as an in-vehicle computer, a head unit, or a mobile phone in the vehicle, or it may be a processor, a system on chip, or another type of in-vehicle chip.
- The method includes: in the automatic driving mode of the vehicle, acquiring a user instruction; acquiring environmental information around the vehicle; performing multimodal understanding on the user instruction and the environmental information around the vehicle to determine the user's driving intention; and generating an automatic driving control instruction for the vehicle according to the user's driving intention.
- In this embodiment, in the automatic driving mode of the vehicle, the user's driving intention can be determined by acquiring the user instruction and the environmental information around the vehicle and performing multimodal understanding on them; an automatic driving control instruction for the vehicle is then generated according to the user's driving intention.
- In this way, the user's temporary driving intention can be executed without the user manually taking over control, which improves the user's experience during autonomous driving.
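The four steps of the method can be sketched end to end as follows. This is only an illustrative outline: the keyword table, the `understand` and `control_step` functions, the `ControlCommand` type, and the representation of the environment as a list of object labels are all assumptions made for the example, and the keyword match merely stands in for the multimodal understanding described in the application.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical keyword table mapping words in an instruction to the intents named in the text.
INTENT_KEYWORDS = {
    "stop": "stopping",
    "overtake": "overtaking",
    "slow": "decelerating",
    "follow": "following",
    "turn": "turning",
}


@dataclass
class ControlCommand:
    action: str            # intent to be executed by the vehicle
    target: Optional[str]  # perceived object the action refers to, if any


def understand(instruction: str, environment: List[str]) -> Optional[dict]:
    """Toy stand-in for multimodal understanding.

    Picks an intent by keyword and grounds it in the first perceived
    object whose label appears in the instruction.
    """
    text = instruction.lower()
    intent = next((name for key, name in INTENT_KEYWORDS.items() if key in text), None)
    if intent is None:
        return None
    target = next((obj for obj in environment if obj.lower() in text), None)
    return {"intent": intent, "target": target}


def control_step(mode: str, instruction: str, environment: List[str]) -> Optional[ControlCommand]:
    """Run the steps once; returns None outside the automatic driving mode."""
    if mode != "automatic":
        return None
    intention = understand(instruction, environment)
    if intention is None:
        return None
    return ControlCommand(action=intention["intent"], target=intention["target"])


command = control_step("automatic", "Overtake the red truck", ["white sedan", "red truck"])
print(command)
```

The point of the sketch is the data flow: the instruction alone says what to do, while the environmental information is needed to resolve which object the instruction refers to.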
- The driving intention includes at least one intent. Each intent includes n slots, and each slot includes a slot name, a slot value, and a classification of the slot value, where n is an integer greater than or equal to 0.
- the intent includes at least one of: stopping, overtaking, decelerating, following, and turning.
- the slot name includes at least one of: a parking position, a speed value, an overtaking or following object, and a steering orientation.
- The classification of the slot value is one of: an enumeration-type slot value, a text-type slot value, or an environment-type slot value.
- An enumeration-type slot value is a predefined enumeration value.
- A text-type slot value is a substring of the user instruction, or text generated from the user instruction.
- An environment-type slot value is an identification made in the environmental information according to the content mentioned in the user instruction.
- Environment-type slot values include image-type slot values. Because an image can reflect the environment around the vehicle, an image-type slot value is an identification made in the image information according to the content mentioned in the user instruction.
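One way to picture the intent and slot structure described above is as a small set of data classes. The class names, field names, and the example values (including the bounding box used as an environment-type slot value) are assumptions chosen for illustration; the application does not prescribe a concrete encoding.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, List


class SlotValueClass(Enum):
    ENUMERATION = "enumeration"  # predefined enumeration value
    TEXT = "text"                # substring of, or text generated from, the user instruction
    ENVIRONMENT = "environment"  # identification made in the environmental information


@dataclass
class Slot:
    name: str                    # e.g. "parking position", "speed value"
    value: Any
    value_class: SlotValueClass


@dataclass
class Intent:
    name: str                                        # e.g. "stopping", "overtaking"
    slots: List[Slot] = field(default_factory=list)  # n slots, n >= 0


@dataclass
class DrivingIntention:
    intents: List[Intent]

    def __post_init__(self) -> None:
        if not self.intents:
            raise ValueError("a driving intention includes at least one intent")


# Hypothetical result for an instruction such as "stop behind that white car, then slow down".
example = DrivingIntention(
    intents=[
        Intent(
            name="stopping",
            slots=[
                Slot(
                    name="parking position",
                    # Assumed encoding: pixel bounding box of the referenced car in the camera image.
                    value={"bbox": (412, 230, 520, 310)},
                    value_class=SlotValueClass.ENVIRONMENT,
                )
            ],
        ),
        Intent(name="decelerating"),  # an intent with zero slots is allowed
    ]
)
print(example.intents[0].slots[0].value_class.value)
```

Keeping the classification next to each value lets later stages decide how to interpret it: an enumeration value can be looked up directly, whereas an environment-type value has to be resolved against the perceived scene.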
- Generating an automatic driving control instruction for the vehicle according to the user's driving intention includes: judging whether the driving intention is feasible according to the driving intention, the surrounding environment, and traffic regulations; and, if the driving intention is feasible, generating the automatic driving control instruction for the vehicle.
- If the driving intention is not feasible, prompt information may be generated and sent to the user.
- The prompt information may include the reason why the driving intention is not feasible.
- In this embodiment, after the driving intention is determined, its feasibility is judged according to the driving intention, the surrounding environment, and traffic regulations, and the automatic driving control instruction for the vehicle is generated only if the driving intention is feasible. This avoids violating traffic laws or causing other problems when the user's driving intention is executed in the automatic driving mode, which safeguards both the user experience and the safety of autonomous driving.
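The feasibility gate can be sketched as a check that returns either a control instruction or a prompt carrying the reason for refusal. The `Surroundings` fields and the two rules below (no lane change across a solid line, no lane change into an occupied lane) are hypothetical examples of environment and traffic-regulation conditions, not rules taken from the application.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Surroundings:
    adjacent_lane_clear: bool  # assumed perception result
    solid_lane_line: bool      # assumed perception result


def check_feasibility(intent: str, env: Surroundings) -> Tuple[bool, Optional[str]]:
    """Return (feasible, reason); reason is None when the intent is feasible."""
    if intent == "overtaking":
        if env.solid_lane_line:
            return False, "changing lanes across a solid line is not permitted"
        if not env.adjacent_lane_clear:
            return False, "the adjacent lane is occupied"
    return True, None


def handle_intent(intent: str, env: Surroundings) -> dict:
    """Generate a control instruction if feasible, otherwise a prompt for the user."""
    feasible, reason = check_feasibility(intent, env)
    if feasible:
        return {"type": "control", "action": intent}
    return {"type": "prompt", "message": f"Cannot perform {intent}: {reason}"}


print(handle_intent("overtaking", Surroundings(adjacent_lane_clear=True, solid_lane_line=False)))
print(handle_intent("overtaking", Surroundings(adjacent_lane_clear=True, solid_lane_line=True)))
```

Returning the reason together with the verdict is what allows the prompt information to explain why the intention was not executed.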
- the user instruction includes any one or more of a user voice instruction, a user text instruction, and a user air gesture instruction.
- A user voice instruction or user air gesture instruction may first be converted into a user text instruction, and multimodal understanding is then performed on the text instruction and the environmental information around the vehicle.
- Alternatively, multimodal understanding may be performed directly on the user voice instruction or user air gesture instruction together with the environmental information around the vehicle, which is not limited in this application.
- The method further includes: sending a photographing activation signal to a photographing device to activate the photographing device to photograph environmental information around the vehicle. Acquiring the environmental information around the vehicle then includes: acquiring the environmental information around the vehicle photographed by the photographing device in response to the photographing activation signal.
- the environmental information photographed by the photographing device may also be recorded as image information.
- the acquired environmental information may be not only image information captured by a photographing device, but also environmental information acquired by lidar, vehicle-mounted sensors, and/or Internet of Vehicles, etc., which is not limited in this application.
- acquiring environmental information around the vehicle includes: acquiring environmental information around the vehicle periodically photographed by the photographing device.
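The two acquisition modes described above, photographing on an activation signal and photographing periodically, can be contrasted in a short sketch. `StubCamera`, `acquire_on_instruction`, `PeriodicAcquirer`, and the millisecond timestamps are all hypothetical; a real photographing device would be driven through its own interface.

```python
from typing import Optional


class StubCamera:
    """Hypothetical photographing device that only captures once activated."""

    def __init__(self) -> None:
        self.active = False
        self.frame_count = 0

    def activate(self) -> None:
        self.active = True

    def capture(self) -> str:
        if not self.active:
            raise RuntimeError("camera has not been activated")
        self.frame_count += 1
        return f"frame-{self.frame_count}"


def acquire_on_instruction(camera: StubCamera) -> str:
    """Triggered mode: send the activation signal, then read the photographed frame."""
    camera.activate()
    return camera.capture()


class PeriodicAcquirer:
    """Periodic mode: photograph at most once per period and keep the latest frame."""

    def __init__(self, camera: StubCamera, period_ms: int) -> None:
        self.camera = camera
        self.period_ms = period_ms
        self.last_ms: Optional[int] = None
        self.latest: Optional[str] = None
        camera.activate()

    def tick(self, now_ms: int) -> Optional[str]:
        # Capture a new frame only when a full period has elapsed since the last capture.
        if self.last_ms is None or now_ms - self.last_ms >= self.period_ms:
            self.latest = self.camera.capture()
            self.last_ms = now_ms
        return self.latest


print(acquire_on_instruction(StubCamera()))
periodic = PeriodicAcquirer(StubCamera(), period_ms=100)
print([periodic.tick(t) for t in (0, 50, 100, 250)])
```

In the triggered mode the frame is taken when the user instruction arrives; in the periodic mode a recent frame is already available when the instruction arrives, at the cost of photographing continuously.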
- The user's driving intention is presented to the user through an augmented reality head-up display (AR-HUD) or a central control screen.
- In this embodiment, the user's driving intention may be presented to the user through the AR-HUD or the central control screen, so that the user can promptly judge whether the multimodal understanding result is correct.
- In a second aspect, a device for controlling the driving of a vehicle is provided, including an acquisition unit and a processing unit.
- The acquisition unit is used to acquire a user instruction; the acquisition unit is further used to acquire environmental information around the vehicle; the processing unit is used to perform multimodal understanding on the user instruction and the environmental information around the vehicle to determine the user's driving intention; and the processing unit is further used to generate an automatic driving control instruction for the vehicle according to the user's driving intention.
- The driving intention includes at least one intent. Each intent includes n slots, and each slot includes a slot name, a slot value, and a classification of the slot value, where n is an integer greater than or equal to 0.
- the intent includes at least one of: stopping, overtaking, decelerating, following, and turning.
- the slot name includes at least one of: a parking position, a speed value, an overtaking or following object, and a steering orientation.
- The classification of the slot value is one of: an enumeration-type slot value, a text-type slot value, or an environment-type slot value. An enumeration-type slot value is a predefined enumeration value; a text-type slot value is a substring of the user instruction or text generated from the user instruction; and an environment-type slot value is an identification made in the environmental information according to the content mentioned in the user instruction.
- The processing unit is further configured to: judge whether the driving intention is feasible according to the driving intention, the surrounding environment, and traffic regulations; and, if the driving intention is feasible, generate the automatic driving control instruction for the vehicle.
- the user instructions include: any one or more of user voice instructions, user text instructions, and user air gesture instructions.
- the device further includes: a sending unit, where the sending unit is configured to send a photographing activation signal to the photographing device, so as to activate the photographing device to photograph environmental information around the vehicle;
- The acquisition unit is further configured to: acquire the environmental information around the vehicle photographed by the photographing device in response to the photographing activation signal.
- The acquisition unit is further configured to: acquire environmental information around the vehicle periodically photographed by the photographing device.
- The user's driving intention is presented to the user through an augmented reality head-up display (AR-HUD) or a central control screen.
- In a third aspect, a training method for a multimodal processing module is provided, including: acquiring training data, where the training data includes training input data and training target data, the training input data includes user instructions and environmental information around the vehicle, and the training target data includes the driving intention corresponding to the training input data; and training the multimodal processing module according to the training input data and the training target data.
- The driving intention includes at least one intent. Each intent includes n slots, and each slot includes a slot name, a slot value, and a classification of the slot value, where n is an integer greater than or equal to 0.
- the intent includes at least one of: stopping, overtaking, decelerating, following, and turning.
- the slot name includes at least one of: a parking position, a speed value, an overtaking or following object, and a steering orientation.
- The classification of the slot value is one of: an enumeration-type slot value, a text-type slot value, or an environment-type slot value. An enumeration-type slot value is a predefined enumeration value; a text-type slot value is a substring of the user instruction or text generated from the user instruction; and an environment-type slot value is an identification made in the environmental information according to the content mentioned in the user instruction.
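The layout of the training data in the training method, inputs made of a user instruction plus environmental information and a target driving intention for each input, can be sketched as follows. The `LookupModule` here simply memorizes pairs and is only a placeholder so that the loop runs; it is not the multimodal processing module of the application, whose architecture this passage does not specify. All names and sample values are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TrainingSample:
    instruction: str              # training input: user instruction
    environment: Tuple[str, ...]  # training input: labels of perceived objects (assumed encoding)
    target_intention: str         # training target: driving intention for this input


class LookupModule:
    """Placeholder model that memorizes (input, target) pairs."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[str, Tuple[str, ...]], str] = {}

    def predict(self, instruction: str, environment: Tuple[str, ...]) -> Optional[str]:
        return self._table.get((instruction, environment))

    def update(self, instruction: str, environment: Tuple[str, ...], target: str) -> None:
        self._table[(instruction, environment)] = target


def train(module: LookupModule, samples: List[TrainingSample], epochs: int = 1) -> int:
    """Compare predictions with targets and update on mismatch; returns the update count."""
    updates = 0
    for _ in range(epochs):
        for sample in samples:
            predicted = module.predict(sample.instruction, sample.environment)
            if predicted != sample.target_intention:
                module.update(sample.instruction, sample.environment, sample.target_intention)
                updates += 1
    return updates


samples = [
    TrainingSample("overtake the red truck", ("red truck", "white sedan"), "overtaking"),
    TrainingSample("stop behind the white sedan", ("red truck", "white sedan"), "stopping"),
]
module = LookupModule()
print(train(module, samples, epochs=2))
print(module.predict("overtake the red truck", ("red truck", "white sedan")))
```

The same environment appears in both samples with different targets, which illustrates why the instruction and the environmental information are supplied to the module together as one input.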
- In a fourth aspect, a training device for a multimodal processing module is provided, including an acquisition unit and a processing unit. The acquisition unit is used to acquire training data, where the training data includes training input data and training target data, the training input data includes user instructions and environmental information around the vehicle, and the training target data includes the driving intention corresponding to the training input data. The processing unit is used to train the multimodal processing module according to the training input data and the training target data.
- The driving intention includes at least one intent. Each intent includes n slots, and each slot includes a slot name, a slot value, and a classification of the slot value, where n is an integer greater than or equal to 0.
- the intent includes at least one of: stopping, overtaking, decelerating, following, and turning.
- the slot name includes at least one of: a parking location, a speed value, an overtaking or following object, and a steering orientation.
- The classification of the slot value is one of: an enumeration-type slot value, a text-type slot value, or an environment-type slot value. An enumeration-type slot value is a predefined enumeration value; a text-type slot value is a substring of the user instruction or text generated from the user instruction; and an environment-type slot value is an identification made in the environmental information according to the content mentioned in the user instruction.
- In a fifth aspect, another method for controlling the driving of a vehicle is provided, including: in the automatic driving mode of the vehicle, acquiring a user instruction; acquiring environmental information around the vehicle; determining the user's driving intention according to the user instruction and the environmental information; generating an automatic driving control instruction for the vehicle at least according to the user's driving intention; and controlling the vehicle to drive based on the automatic driving control instruction.
- In this embodiment, in the automatic driving mode of the vehicle, the user's driving intention can be determined from the acquired user instruction and the environmental information around the vehicle, and an automatic driving control instruction for the vehicle is generated according to that intention.
- In this way, the user's temporary driving intention can be executed without the user manually taking over control, which improves the user's experience during autonomous driving.
- Determining the user's driving intention according to the user instruction and the environmental information includes: performing multimodal understanding on the user instruction and the environmental information; and determining the user's driving intention according to the result of the multimodal understanding.
- the user instruction includes at least one of a user voice instruction, a user text instruction, and a user air gesture instruction.
- The driving intention includes at least one intent. Each intent includes n slots, and each slot includes a slot name, a slot value, and a classification of the slot value, where n is an integer greater than or equal to 0.
- the intent includes at least one of: stopping, overtaking, decelerating, following, and turning.
- the slot name includes at least one of: a parking position, a speed value, an overtaking or following object, and a steering orientation.
- The classification of the slot value is one of: an enumeration-type slot value, a text-type slot value, or an environment-type slot value.
- An enumeration-type slot value is a predefined enumeration value.
- A text-type slot value is a substring of the user instruction, or text generated from the user instruction.
- An environment-type slot value is an identification made in the environmental information according to the content mentioned in the user instruction.
- Generating an automatic driving control instruction for the vehicle according to the user's driving intention includes: judging whether the driving intention is feasible according to the driving intention, the surrounding environment, and traffic regulations; and, if the driving intention is feasible, generating the automatic driving control instruction for the vehicle.
- If the driving intention is not feasible, prompt information may be generated and sent to the user.
- The prompt information may include the reason why the driving intention is not feasible.
- In this embodiment, after the driving intention is determined, its feasibility is judged according to the driving intention, the surrounding environment, and traffic regulations, and the automatic driving control instruction for the vehicle is generated only if the driving intention is feasible. This avoids violating traffic laws or causing other problems when the user's driving intention is executed in the automatic driving mode, thereby safeguarding both the user experience and the safety of autonomous driving.
- If the user instruction to be acquired is a user text instruction, the user's natural voice instruction or air gesture instruction may be acquired first and then converted into a text instruction.
- The method further includes: sending a photographing activation signal to a photographing device to activate the photographing device to photograph environmental information around the vehicle. Acquiring the environmental information around the vehicle then includes: acquiring the environmental information around the vehicle photographed by the photographing device in response to the photographing activation signal.
- acquiring the environmental information around the vehicle includes: acquiring the environmental information around the vehicle periodically photographed by the photographing device.
- The user's driving intention is presented to the user through an augmented reality head-up display (AR-HUD) or a central control screen.
- In this embodiment, the user's driving intention may be presented to the user through the AR-HUD or the central control screen, so that the user can promptly judge whether the multimodal understanding result is correct.
- In a sixth aspect, another device for controlling the driving of a vehicle is provided, including modules capable of implementing the method for controlling the driving of a vehicle in the fifth aspect or any possible implementation of the fifth aspect.
- In a seventh aspect, a processing method for a multimodal processing module is provided, where the multimodal processing module is obtained by training according to the training method in the third aspect or any possible implementation of the third aspect. The processing method includes: the multimodal processing module acquiring input data, where the input data includes user instructions and environmental information around the vehicle; and the multimodal processing module outputting the driving intention according to the input data.
- In an eighth aspect, a multimodal processing module is provided, where the multimodal processing module is obtained by training according to the training method in the third aspect or any possible implementation of the third aspect. The multimodal processing module includes: an acquisition unit for acquiring input data, where the input data includes user instructions and environmental information around the vehicle; and a processing unit for outputting the driving intention according to the input data.
- In a ninth aspect, an autonomous vehicle is provided, including the device in the second aspect or any possible implementation of the second aspect; and/or the device in the fourth aspect or any possible implementation of the fourth aspect; and/or the device in the sixth aspect or any possible implementation of the sixth aspect; and/or the module in the eighth aspect or any possible implementation of the eighth aspect.
- In a tenth aspect, a device for controlling the driving of a vehicle is provided, including a processor and a memory, where the memory is used for storing program instructions, and the processor is used for calling the program instructions to execute the method for controlling the driving of a vehicle in the first aspect or any possible implementation of the first aspect; and/or to execute the other method for controlling the driving of a vehicle in the fifth aspect or any possible implementation of the fifth aspect.
- In an eleventh aspect, a training device for a multimodal processing module is provided, including a processor and a memory, where the memory is used for storing program instructions, and the processor is used for calling the program instructions to execute the training method for the multimodal processing module in the third aspect or any possible implementation of the third aspect.
- In a twelfth aspect, a system is provided, where the system includes the device in the second aspect or any possible implementation of the second aspect; and/or the device in the sixth aspect or any possible implementation of the sixth aspect.
- the system may be a vehicle, or may be an on-board system on a vehicle, which is not limited in this application.
- In a thirteenth aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, it causes the computer to execute the method for controlling the driving of a vehicle in the first aspect or any possible implementation of the first aspect; and/or to execute the other method for controlling the driving of a vehicle in the fifth aspect or any possible implementation of the fifth aspect.
- In a fourteenth aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, it causes the computer to execute the training method for the multimodal processing module in the third aspect or any possible implementation of the third aspect.
- In a fifteenth aspect, a computer-readable storage medium is provided, where the computer-readable medium stores program code for execution by a device, and the program code includes instructions for executing the method for controlling the driving of a vehicle in the first aspect or any possible implementation of the first aspect.
- In a sixteenth aspect, a computer-readable storage medium is provided, where the computer-readable medium stores program code for execution by a device, and the program code includes instructions for executing the training method for the multimodal processing module in the third aspect or any possible implementation of the third aspect.
- a seventeenth aspect provides a chip, the chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface and executes the method for controlling the driving of a vehicle in the first aspect or any possible implementation manner of the first aspect; and/or executes the other method for controlling the driving of a vehicle in the fifth aspect or any possible implementation manner of the fifth aspect.
- the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory. When the instructions are executed, the processor is configured to execute the method for controlling the driving of a vehicle in the first aspect or any possible implementation manner of the first aspect; and/or to execute the other method for controlling the driving of a vehicle in the fifth aspect or any possible implementation manner of the fifth aspect.
- an eighteenth aspect provides a chip, the chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface and executes the training method of the multimodal processing module in the third aspect or any possible implementation manner of the third aspect.
- the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory. When the instructions are executed, the processor is configured to execute the training method of the multimodal processing module in the third aspect or any possible implementation manner of the third aspect.
- FIG. 1 is a functional block diagram of a vehicle provided by an embodiment of the present application.
- FIG. 2 is an exemplary diagram of an automatic driving system to which an embodiment of the present application is applicable.
- FIG. 3 is an example diagram of an application of a cloud-side command to an autonomous driving vehicle according to an embodiment of the present application.
- FIG. 4 is an example diagram of a method for controlling the driving of a vehicle provided by an embodiment of the present application.
- FIG. 5 is an example diagram of a system architecture provided by an embodiment of the present application.
- FIG. 6 is an example diagram of a specific implementation provided by an embodiment of the present application.
- FIG. 7 is an exemplary diagram of another specific implementation manner provided by an embodiment of the present application.
- FIG. 8 is an exemplary diagram of a multimodal processing method provided by an embodiment of the present application.
- FIG. 9 is an exemplary diagram of another multimodal processing method provided by an embodiment of the present application.
- FIG. 10 is an example diagram of a training method for a multimodal processing module provided by an embodiment of the present application.
- FIG. 11 is an example diagram of an application scenario provided by an embodiment of the present application.
- FIG. 12 is an example diagram of a device for controlling the driving of a vehicle provided by an embodiment of the present application.
- FIG. 13 is an example diagram of a training device for a multimodal processing module provided by an embodiment of the present application.
- FIG. 14 is a schematic structural diagram of an apparatus provided by an embodiment of the present application.
- FIG. 15 is an example diagram of a computer program product provided by an embodiment of the present application.
- FIG. 1 is a functional block diagram of a vehicle provided by an embodiment of the present application.
- the vehicle 100 is configured in a fully or partially autonomous driving mode.
- the vehicle 100 can control itself while in an autonomous driving mode, and can determine the current state of the vehicle and its surroundings through human manipulation, determine the possible behavior of at least one other vehicle in the surrounding environment, determine a confidence level corresponding to the likelihood that the other vehicle performs the possible behavior, and control the vehicle 100 based on the determined information.
- the vehicle 100 may be set to operate without human interaction.
- Vehicle 100 may include various subsystems, such as travel system 102 , sensor system 104 , control system 106 , one or more peripherals 108 and power supply 110 , computer system 112 , and user interface 116 .
- vehicle 100 may include more or fewer subsystems, and each subsystem may include multiple elements. Additionally, each of the subsystems and elements of the vehicle 100 may be interconnected by wire or wirelessly.
- the travel system 102 may include components that provide powered motion for the vehicle 100 .
- travel system 102 may include engine 118 , energy source 119 , transmission 120 , and wheels/tires 121 .
- the engine 118 may be an internal combustion engine, an electric motor, an air compression engine, or other types of engine combinations, such as a gasoline engine and electric motor hybrid engine, an internal combustion engine and an air compression engine hybrid engine.
- Engine 118 converts energy source 119 into mechanical energy.
- Examples of energy sources 119 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas-based fuels, ethanol, solar panels, batteries, and other sources of electricity.
- the energy source 119 may also provide energy to other systems of the vehicle 100 .
- Transmission 120 may transmit mechanical power from engine 118 to wheels 121 .
- Transmission 120 may include a gearbox, a differential, and a driveshaft.
- transmission 120 may also include other devices, such as clutches.
- the drive shaft may include one or more axles that may be coupled to one or more wheels 121 .
- the sensor system 104 may include several sensors that sense information about the environment surrounding the vehicle 100 .
- the sensor system 104 may include a positioning system 122 (the positioning system may be a global positioning system (GPS) system, a Beidou system or other positioning systems), an inertial measurement unit (IMU) 124, Radar 126 , laser rangefinder 128 and camera 130 .
- the sensor system 104 may also include sensors that monitor the internal systems of the vehicle 100 (e.g., an in-vehicle air quality monitor, a fuel gauge, an oil temperature gauge, etc.). Sensor data from one or more of these sensors can be used to detect objects and their corresponding characteristics (position, shape, orientation, velocity, etc.). This detection and identification is a critical function for the safe operation of the autonomous vehicle 100 .
- the positioning system 122 may be used to estimate the geographic location of the vehicle 100 .
- the IMU 124 is used to sense position and orientation changes of the vehicle 100 based on inertial acceleration.
- IMU 124 may be a combination of an accelerometer and a gyroscope.
- Radar 126 may utilize radio signals to sense objects within the surrounding environment of vehicle 100 . In some embodiments, in addition to sensing objects, radar 126 may be used to sense the speed and/or heading of objects.
- the laser rangefinder 128 may utilize laser light to sense objects in the environment in which the vehicle 100 is located.
- the laser rangefinder 128 may include one or more laser sources, laser scanners, and one or more detectors, among other system components.
- Camera 130 may be used to capture multiple images of the surrounding environment of vehicle 100 .
- Camera 130 may be a still camera or a video camera.
- Control system 106 controls the operation of the vehicle 100 and its components.
- Control system 106 may include various elements including steering system 132 , throttle 134 , braking unit 136 , sensor fusion algorithms 138 , computer vision system 140 , route control system 142 , and obstacle avoidance system 144 .
- the steering system 132 is operable to adjust the heading of the vehicle 100 .
- it may be a steering wheel system.
- the throttle 134 is used to control the operating speed of the engine 118 and thus the speed of the vehicle 100 .
- the braking unit 136 is used to control the deceleration of the vehicle 100 .
- the braking unit 136 may use friction to slow the wheels 121 .
- the braking unit 136 may convert the kinetic energy of the wheels 121 into electrical current.
- the braking unit 136 may also take other forms to slow the wheels 121 to control the speed of the vehicle 100 .
- Computer vision system 140 may be operable to process and analyze images captured by camera 130 in order to identify objects and/or features in the environment surrounding vehicle 100 .
- the objects and/or features may include traffic signals, road boundaries and obstacles.
- Computer vision system 140 may use object recognition algorithms, Structure from Motion (SFM) algorithms, video tracking, and other computer vision techniques.
- the computer vision system 140 may be used to map the environment, track objects, estimate the speed of objects, and the like.
- the route control system 142 is used to determine the travel route of the vehicle 100 .
- the route control system 142 may combine data from the sensor fusion algorithm 138 , the positioning system 122 , and one or more predetermined maps to determine a driving route for the vehicle 100 .
- the obstacle avoidance system 144 is used to identify, evaluate, and avoid or otherwise traverse potential obstacles in the environment of the vehicle 100 .
- control system 106 may additionally or alternatively include components other than those shown and described. Alternatively, some of the components shown above may be omitted.
- Peripherals 108 may include a wireless communication system 146 , an onboard computer 148 , a microphone 150 and/or a speaker 152 .
- peripherals 108 provide a means for a user of vehicle 100 to interact with user interface 116 .
- the onboard computer 148 may provide information to the user of the vehicle 100 .
- User interface 116 may also operate on-board computer 148 to receive user input.
- the onboard computer 148 can be operated via a touch screen.
- peripheral devices 108 may provide a means for vehicle 100 to communicate with other devices located within the vehicle.
- microphone 150 may receive audio (eg, voice commands or other audio input) from a user of vehicle 100 .
- speakers 152 may output audio to a user of vehicle 100 .
- Wireless communication system 146 may wirelessly communicate with one or more devices, either directly or via a communication network.
- wireless communication system 146 may use 3G cellular communications, such as code division multiple access (CDMA), global system for mobile communications (GSM), or general packet radio service (GPRS); 4G cellular communications, such as long term evolution (LTE); or 5G cellular communications.
- the wireless communication system 146 may communicate with a wireless local area network (WLAN) using WiFi.
- the wireless communication system 146 may communicate directly with the device using an infrared link, Bluetooth, or the like.
- Other wireless protocols may also be used, such as various vehicle communication systems. For example, wireless communication system 146 may include one or more dedicated short range communications (DSRC) devices, which may carry public and/or private data communications between vehicles and/or roadside stations.
- the power supply 110 may provide power to various components of the vehicle 100 .
- the power source 110 may be a rechargeable lithium-ion or lead-acid battery.
- One or more battery packs of such a battery may be configured as a power source to provide power to various components of the vehicle 100 .
- power source 110 and energy source 119 may be implemented together, such as in some all-electric vehicles.
- Computer system 112 may include at least one processor 113 that executes instructions 115 stored in a non-transitory computer-readable medium such as memory 114 .
- Computer system 112 may also be multiple computing devices that control individual components or subsystems of vehicle 100 in a distributed fashion.
- the processor 113 may be any conventional processor, such as a commercially available CPU. Alternatively, the processor may be a dedicated device such as an ASIC or other hardware-based processor.
- Although FIG. 1 functionally illustrates the processor, memory, and other elements of the computer system 112 in the same block, one of ordinary skill in the art will understand that the processor, computer, or memory may actually include multiple processors, computers, or memories that may or may not be stored within the same physical enclosure.
- For example, the memory may be a hard drive or other storage medium located within an enclosure other than that of the computer system 112 .
- reference to a processor or computer will be understood to include reference to a collection of processors or computers or memories that may or may not operate in parallel.
- some components such as the steering and deceleration components may each have their own processor that only performs computations related to component-specific functions .
- a processor may be located remotely from the vehicle and in wireless communication with the vehicle. In other aspects, some of the processes described herein are performed on a processor disposed within the vehicle while others are performed by a remote processor, including taking steps necessary to perform a single maneuver.
- the memory 114 may contain instructions 115 (eg, program logic) executable by the processor 113 to perform various functions of the vehicle 100 , including those described above.
- Memory 114 may also contain additional instructions, including instructions to send data to, receive data from, interact with, and/or control one or more of travel system 102 , sensor system 104 , control system 106 , and peripherals 108 .
- memory 114 may store data such as road maps, route information, vehicle location, direction, speed, and other such vehicle data, among other information. Such information may be used by the vehicle 100 and the computer system 112 during operation of the vehicle 100 in autonomous, semi-autonomous and/or manual modes.
- the user interface 116 is used to provide information to or receive information from a user of the vehicle 100 .
- the user interface 116 may include one or more input/output devices within the set of peripheral devices 108 , such as a wireless communication system 146 , an onboard computer 148 , a microphone 150 and a speaker 152 .
- Computer system 112 may control functions of vehicle 100 based on input received from various subsystems (eg, travel system 102 , sensor system 104 , and control system 106 ) and from user interface 116 .
- For example, computer system 112 may utilize input from control system 106 in order to control steering system 132 to avoid obstacles detected by sensor system 104 and obstacle avoidance system 144 .
- computer system 112 is operable to provide control of various aspects of vehicle 100 and its subsystems.
- one or more of these components described above may be installed or associated with the vehicle 100 separately.
- memory 114 may exist partially or completely separate from vehicle 100.
- the above-described components may be communicatively coupled together in a wired and/or wireless manner.
- FIG. 1 should not be construed as a limitation on the embodiments of the present application.
- a self-driving car traveling on a road can recognize objects within its surroundings to determine adjustments to the current speed.
- the objects may be other vehicles, traffic control equipment, or other types of objects.
- each identified object may be considered independently, and the object's respective characteristics, such as its current speed, acceleration, and distance from the vehicle, may be used to determine the speed to which the autonomous vehicle is to adjust.
- the autonomous vehicle 100 , or a computing device associated with the autonomous vehicle 100 , may predict the behavior of the identified objects based on the characteristics of the identified objects and the state of the surrounding environment (e.g., traffic, rain, ice on the road, etc.).
- the identified objects depend on each other's behavior, so the behavior of a single identified object may also be predicted by considering all identified objects together.
- the vehicle 100 can adjust its speed based on the predicted behavior of the identified object.
- the self-driving car can determine what steady state the vehicle will need to adjust to (eg, accelerate, decelerate, or stop) based on the predicted behavior of the object.
- other factors may also be considered to determine the speed of the vehicle 100, such as the lateral position of the vehicle 100 in the road being traveled, the curvature of the road, the proximity of static and dynamic objects, and the like.
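- The speed adjustment described above can be illustrated with a minimal sketch. It is not the method of the present application: the names `TrackedObject`, `target_speed`, and `steady_state`, the time-gap rule, and all numeric defaults are assumptions chosen for illustration, and only the distance to each identified object is used (speed, acceleration, road curvature, and lateral position are left out).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TrackedObject:
    """Hypothetical record for one identified object ahead of the vehicle."""
    distance_m: float  # longitudinal distance from the vehicle, in metres


def target_speed(objects: Iterable[TrackedObject],
                 time_gap_s: float = 2.0,
                 max_speed_mps: float = 30.0) -> float:
    """Consider each object independently and keep the most conservative speed.

    Assumed rule: the vehicle should need at least `time_gap_s` seconds to
    cover the current distance to any object.
    """
    target = max_speed_mps
    for obj in objects:
        gap_limited = max(0.0, obj.distance_m / time_gap_s)
        target = min(target, gap_limited)
    return target


def steady_state(ego_speed_mps: float, target_mps: float,
                 tolerance_mps: float = 0.5) -> str:
    """Map the speed difference to accelerate / decelerate / stop / hold."""
    if target_mps <= 0.0:
        return "stop"
    if ego_speed_mps > target_mps + tolerance_mps:
        return "decelerate"
    if ego_speed_mps < target_mps - tolerance_mps:
        return "accelerate"
    return "hold"


if __name__ == "__main__":
    objs = [TrackedObject(distance_m=20.0), TrackedObject(distance_m=50.0)]
    goal = target_speed(objs)
    print(goal, steady_state(15.0, goal))
```

- Taking the minimum over objects mirrors the idea of considering each identified object independently; a joint prediction over all objects would need a richer model than this sketch.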
- the computing device may also provide instructions to modify the steering angle of the vehicle 100 so that the self-driving car follows a given trajectory and/or maintains safe lateral and longitudinal distances from objects in its vicinity (e.g., cars in adjacent lanes on the road).
- the autonomous vehicle 100 , or a computing device associated with the autonomous vehicle 100 , may also predict the availability of autonomous driving on the road ahead based on the state of the vehicle and the detected environmental information, and control the switching between autonomous and manual driving modes.
- the above-mentioned vehicle 100 can be a car, a truck, a motorcycle, a bus, a boat, an airplane, a helicopter, a lawn mower, a recreational vehicle, a playground vehicle, construction equipment, a tram, a golf cart, a train, a cart, etc.
- the embodiments of the present application are not particularly limited in this respect.
- FIG. 2 is an example diagram of an automatic driving system provided by an embodiment of the present application.
- the automatic driving system shown in FIG. 2 includes a computer system 101 , wherein the computer system 101 includes a processor 103 , and the processor 103 is coupled with a system bus 105 .
- the processor 103 may be one or more processors, each of which may include one or more processor cores.
- a video adapter 107 which can drive a display 109, is coupled to the system bus 105.
- the system bus 105 is coupled to an input/output (I/O) bus 113 through a bus bridge 111 .
- I/O interface 115 is coupled to the I/O bus.
- I/O interface 115 communicates with various I/O devices, such as an input device 117 (e.g., a keyboard, mouse, or touch screen), a media tray 121 (e.g., a compact disc read-only memory (CD-ROM) or a multimedia interface), a transceiver 123 (which can transmit and/or receive radio communication signals), a camera 155 (which can capture still and dynamic digital video images), and a universal serial bus (USB) interface.
- the processor 103 may be any conventional processor, including a reduced instruction set computer (RISC) processor, a complex instruction set computer (CISC) processor, or a combination of the above.
- the processor may be a dedicated device such as an application specific integrated circuit (ASIC).
- the processor 103 may be a neural network processor or a combination of a neural network processor and the above-mentioned conventional processors.
- computer system 101 may be located remotely from the autonomous vehicle and may communicate wirelessly with the autonomous vehicle.
- some of the processes described herein are performed on a processor disposed within the autonomous vehicle, others are performed by a remote processor, including taking actions required to perform a single maneuver.
- Network interface 129 is a hardware network interface, such as a network card.
- the network 127 may be an external network, such as the Internet, or an internal network, such as an Ethernet network or a virtual private network (VPN).
- the network 127 may also be a wireless network, such as a WiFi network, a cellular network, and the like.
- the hard disk drive interface is coupled to the system bus 105 .
- the hard drive interface is connected to the hard drive.
- System memory 135 is coupled to system bus 105 . Data running in system memory 135 may include operating system 137 and application programs 143 of computer 101 .
- the operating system includes a shell 139 and a kernel 141 .
- the shell 139 is an interface between the user and the kernel of the operating system.
- the shell is the outermost layer of the operating system.
- the shell manages the interaction between the user and the operating system: waiting for user input, interpreting user input to the operating system, and processing various operating system output.
- Kernel 141 consists of those parts of the operating system that manage memory, files, peripherals, and system resources. Interacting directly with hardware, the operating system kernel usually runs processes and provides inter-process communication, providing CPU time slice management, interrupts, memory management, IO management, and more.
- Application 143 includes programs that control the autonomous driving of the car, for example, programs that manage the interaction of the autonomous car with obstacles on the road, programs that control the route or speed of the autonomous car, and programs that control the interaction of the autonomous car with other autonomous vehicles on the road.
- Application 143 also exists on the system of deploying server 149.
- computer system 101 may download application 143 from deploying server 149 when application 143 needs to be executed.
- the application 143 may be a program that controls the autonomous vehicle to activate or deactivate the assisted autonomous driving function.
- Sensor 153 is associated with computer system 101 .
- the sensor 153 is used to detect the environment around the computer 101 .
- the sensor 153 can detect animals, cars, obstacles, pedestrian crossings, and the like. Further, the sensor can also detect the environment around such animals, cars, obstacles, and pedestrian crossings, for example, other animals appearing around an animal, weather conditions, and ambient light levels.
- the sensors may be cameras, infrared sensors, chemical detectors, microphones, and the like.
- Computer system 112 in FIG. 1 may also receive information from or transfer information to other computer systems.
- sensor data collected from the sensor system 104 of the vehicle 100 may be transferred to another computer for processing of the data.
- data from the computer system 312 may be transmitted via a network to a server 320 on the cloud side (which may also be referred to as the cloud) for further processing.
- Networks and intermediate nodes may include various configurations and protocols, including the Internet, the World Wide Web, intranets, virtual private networks, wide area networks, local area networks, private networks using proprietary communication protocols of one or more companies, Ethernet, WiFi, the hypertext transfer protocol (HTTP), and various combinations of the foregoing.
- Such communications may be by any device capable of transferring data to and from other computers, such as modems and wireless interfaces.
- data such as vehicle status and environmental information are transmitted to the cloud-side server 320 for further processing.
- the cloud-side server can use a variety of neural network models to identify and process these data, and feed the identification results back to the computer system 312 , so that the computer system 312 may determine whether the assisted autopilot function is turned on or off.
- server 320 may include a server having multiple computers, such as a load balancing server farm, that exchange information with different nodes of the network for the purpose of receiving, processing, and transmitting data from computer system 312 .
- the server may be configured similarly to computer system 312 , with processor 330 , memory 340 , instructions 350 , and data 360 .
- An automated driving system may contain several assisted automated driving functions, for example pre-collision system (PCS) safety braking, adaptive cruise control (ACC), lane keeping aid (LKA), cross traffic alert (CTA), rear cross traffic alert (RCTA), blind spot warning (BSW), off vehicle warning, and traffic jam assist (TJA).
- Autonomous vehicles drive according to a preset destination and the surrounding environment of the vehicle obtained by various sensors, and ultimately deliver the user to the destination along the planned route.
- However, based on the visual information around the vehicle, the user may form temporary intentions that differ from simply driving to the destination, for example, wanting to keep a greater distance when the vehicle is close to the car in front.
- the present application provides a method for controlling the driving of a vehicle, so that while an autonomous vehicle is driving in the automatic driving mode, if the user has a temporary intention, multimodal understanding can be performed on the user's instruction and the environmental information around the vehicle to determine the user's driving intention, and the motion of the vehicle can be controlled according to that driving intention.
- Therefore, the user's temporary intention can be executed in the automatic driving mode, and the user's experience during automatic driving can be further improved.
- FIG. 4 is an example diagram of a method for controlling the driving of a vehicle provided by an embodiment of the present application. It should be understood that the method shown in FIG. 4 can be applied to the vehicle shown in FIG. 1 or the automatic driving system shown in FIG. 2 . It should be understood that the method shown in FIG. 4 is performed in an automatic driving mode.
- the method 400 includes steps S410 to S440, which will be described in detail below.
- the user instruction includes any one or more of a user's natural voice instruction (i.e., a user's voice instruction), a user text instruction, and a user air gesture instruction, which is not limited in this application.
- Temporary intentions can be input to related in-vehicle devices by means of user instructions.
- For example, the temporary intention is input into a microphone by means of a natural voice instruction; for another example, it is input into a relevant user action acquisition device by means of an air gesture instruction; for yet another example, it is input directly into a relevant text input device by means of a text instruction, which is not limited in this application.
- the user text instruction in the above step S410 may be obtained directly from the user through the relevant text entry device, or a voice instruction or an air gesture instruction of the user may first be obtained from other devices and then converted into a text instruction by a related device.
- the present application does not limit the manner of acquiring the text instruction.
- For example, if the user forms a temporary intention, the user can speak that intention in natural speech to the relevant in-vehicle device (e.g., a microphone).
- the conversion of natural speech instructions into text instructions may be implemented by automatic speech recognition (ASR).
- the user's text instruction is acquired, and specifically, the text instruction may be acquired from the ASR.
- the air gesture instruction can be converted into a text instruction by the relevant gesture recognition device.
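- The conversion of instructions from different modalities into a text instruction can be sketched as a simple dispatch. This is only an illustration under stated assumptions: `UserInstruction`, `to_text`, and the converter table are hypothetical names, and the two converters in the usage example are stand-ins, not a real ASR engine or gesture recognition device.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class UserInstruction:
    """Hypothetical container for one user instruction."""
    modality: str  # "text", "voice", or "gesture"
    payload: Any   # raw text, audio samples, or gesture data


def to_text(instruction: UserInstruction,
            converters: Dict[str, Callable[[Any], str]]) -> str:
    """Return the instruction as text, converting it first if necessary.

    Text instructions pass through unchanged; other modalities are handed
    to the converter registered for them (e.g. ASR for voice).
    """
    if instruction.modality == "text":
        return str(instruction.payload)
    if instruction.modality not in converters:
        raise ValueError(f"no converter for modality {instruction.modality!r}")
    return converters[instruction.modality](instruction.payload)


if __name__ == "__main__":
    # Stand-in converters for illustration only.
    converters = {
        "voice": lambda audio: "stop next to the gas station",
        "gesture": lambda gesture: "turn right at the next intersection",
    }
    print(to_text(UserInstruction("voice", b"\x00\x01"), converters))
    print(to_text(UserInstruction("text", "slow down"), converters))
```

- Keeping the converters in a table means a new modality can be supported by registering one more function, without changing the code that consumes the text instruction.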
- the environmental information around the vehicle can be acquired through a photographing device; specifically, an image or video is acquired through the photographing device, and the environmental information is reflected by the information in the image or video. The environmental information can also be obtained through lidar, vehicle-mounted sensors, and/or vehicle networking, or the like. This application does not limit the manner of obtaining the environmental information.
- the solution will be described by taking the photographing device acquiring the environmental information as an example.
- the photographing device may obtain video information or image information, or may first obtain video information around the vehicle, and then obtain image information from the video, which is not limited in this application.
- the acquisition of image information captured by a photographing device is taken as an example for description, but it should be understood that this does not constitute a limitation to the present application.
- a shooting activation signal may be sent to the photographing apparatus to activate the photographing apparatus to photograph image information (i.e., environmental information) around the vehicle.
- After the photographing device photographs the surrounding image information, the image information photographed by the photographing device is acquired.
- the photographing device may periodically photograph image information around the vehicle.
- acquiring image information around the vehicle may include: acquiring image information around the vehicle periodically captured by a photographing device.
- the suitable image information may be the image information newly captured by the photographing device, or may be image information corresponding to a specific time interval estimated according to the recognition time of natural voice commands or air gesture commands. It may also be the image information corresponding to the acquisition of the text instruction. Specifically, the selection of the image information should be carried out according to the actual situation, which is not limited in this application.
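- One way to pick suitable image information from periodically captured frames is to match on timestamps. The sketch below is an assumption-laden illustration, not the selection rule of the present application: the function names are hypothetical, timestamps are assumed to be sorted in ascending order and expressed in seconds, and the estimated time of the instruction is assumed to be supplied by the caller.

```python
from bisect import bisect_left
from typing import Sequence


def newest_frame(timestamps: Sequence[float]) -> int:
    """Index of the most recently captured frame."""
    if not timestamps:
        raise ValueError("no frames captured yet")
    return len(timestamps) - 1


def frame_nearest_to(timestamps: Sequence[float], instruction_time: float) -> int:
    """Index of the frame captured closest to the estimated instruction time.

    On a tie, the earlier frame is returned.
    """
    if not timestamps:
        raise ValueError("no frames captured yet")
    i = bisect_left(timestamps, instruction_time)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    before, after = timestamps[i - 1], timestamps[i]
    return i if (after - instruction_time) < (instruction_time - before) else i - 1


if __name__ == "__main__":
    stamps = [0.0, 0.5, 1.0, 1.5]  # frames captured every 0.5 s
    print(newest_frame(stamps))
    print(frame_nearest_to(stamps, 0.9))
```

- Which of the two strategies is appropriate depends on the actual situation, for example on how long voice or gesture recognition takes relative to the capture period.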
- S430: perform multimodal understanding on the user's instruction and the environmental information around the vehicle, and determine the user's driving intention.
- Alternatively, the above step S430 may be: determining the user's driving intention according to the user's instruction and the environmental information around the vehicle.
- That is, the solution of the present application does not limit the manner of determining the user's driving intention according to the user's instruction and the environmental information around the vehicle; the intention may be determined through multimodal understanding or by other means, which is not limited in this application.
- the multimodal understanding of the user's instruction and the environmental information around the vehicle to determine the user's driving intention is used as an example for description.
- step S430 can be completed in a multimodal processing module (i.e., the multimodal processing module 540 in FIG. 5 ).
- the module will be described below with reference to FIG. 5 , and the process of multimodal processing will be described with reference to FIG. 8 and FIG. 9 , which will not be repeated here.
- the driving intention includes at least one intention, each intention in the at least one intention includes n slots, and each slot in the n slots includes a slot name, a slot value, and a classification of the slot value, n is greater than or equal to 0, and n is an integer.
- the intent may include at least one of: stop, overtake, slow down, follow, turn, and the like. It should be understood that other intentions may also be included in actual operations, which are not limited in this application.
- the slot name may include at least one of: a parking position, a speed value, an overtaking or following object, a turning direction, and the like. It should be understood that in actual operation, other slot names may also be included, which are not limited in this application.
- the classification of the slot value may be: an enumeration type slot value, a text type slot value or an environment type slot value.
- the enumeration-type slot value indicates that the slot value is a predefined enumeration value. For example, the user command is "turn right at the next intersection". In this case, there is a slot corresponding to the turning direction. The turning direction can be enumerated; for example, there are only four options: left, right, straight, and U-turn. The slot value of the slot "turning direction" is therefore "right", which can be understood as an enumeration-type slot value.
- the text-type slot value indicates that the slot value is a substring in the user instruction or the text generated according to the user instruction. It should be understood that the slot value at this time is a non-enumerable value.
- For example, the user command is "stop next to the gas station". In this case, there is a slot corresponding to the parking position. Since the parking position cannot be enumerated, the substring "next to the gas station" in the command can be used as the slot value, which can be understood as a text-type slot value.
- For another example, the user's instruction is "park at the luxurious hotel in front". In this case, there is also a slot corresponding to the parking position, and the text "high-end hotel" generated according to the instruction can be used as the slot value, which can also be understood as a text-type slot value.
- It should be understood that the description here and below takes a user text instruction as an example. The text-type slot value then indicates that the slot value may be a substring in the user text instruction or text generated according to the user text instruction, and the following embodiments take this as an example.
- the environment class slot value indicates that the slot value is identified in the environment information according to the content mentioned in the user instruction.
- when the environmental information is acquired by the photographing device, the environmental information may be image information, and the environment-type slot value may then also be called an image-type slot value, since the image reflects the environment around the vehicle. The image-type slot value therefore indicates that the slot value is identified in the image information according to the content mentioned in the user instruction.
- For example, the user command is "drive to the blue car position and pull over to the side". In this case, there is a slot corresponding to the parking position. Since the parking position is the "blue car position", a rectangular frame can be used to identify the "blue car" in the image information, and the rectangular frame serves as the image-type slot value.
- "The driving intention includes at least one intention" means that the driving intention may include one intention or multiple intentions at the same time. For example, when the user instruction is "turn right at the next intersection", it includes a steering intent; when the user instruction is "turn right at the next intersection and stop", it includes a steering intent and a parking intent.
- "Each intent in the at least one intent includes n slots, each of the n slots includes a slot name, a slot value, and a classification of the slot value, n is greater than or equal to 0, and n is an integer" means that an intent may include one or more slots describing the intent, or may include no slot. If the intent includes slots describing it, each slot includes a slot name, a slot value, and a classification of the slot value.
- For example, when the user instruction expresses only the intent to stop, there is no slot describing the intent, and subsequent operations can be performed directly based on the intent.
- When the intent has slots, the slot name, slot value, and classification of the slot value corresponding to each slot can be listed according to the user command.
- For example, the slot name, slot value, and classification of the slot value corresponding to the first slot of the parking intent may be the parking position, "the gas station ahead", and the text-type slot value, respectively; those corresponding to the second slot may be the parking position, the rectangular frame (identifying the gas station ahead in the image information), and the image-type slot value.
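- The intent/slot structure described above can be sketched as a small data model. This is only an illustrative sketch: the class names, the bounding-box convention `(x_min, y_min, x_max, y_max)` and the pixel coordinates are assumptions and do not come from the application.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple, Union


class SlotValueClass(Enum):
    ENUMERATION = "enumeration"  # predefined enumeration value, e.g. "right"
    TEXT = "text"                # substring of, or text generated from, the instruction
    ENVIRONMENT = "environment"  # region identified in the environment (image) information


# Hypothetical convention: (x_min, y_min, x_max, y_max) in pixels.
BoundingBox = Tuple[int, int, int, int]


@dataclass
class Slot:
    name: str
    value: Union[str, BoundingBox]
    value_class: SlotValueClass


@dataclass
class Intent:
    name: str
    slots: List[Slot] = field(default_factory=list)  # n >= 0 slots


@dataclass
class DrivingIntention:
    intents: List[Intent]

    def __post_init__(self) -> None:
        # A driving intention includes at least one intent.
        if not self.intents:
            raise ValueError("a driving intention includes at least one intent")


# "turn right at the next intersection and stop": steering intent plus parking intent.
two_intents = DrivingIntention(intents=[
    Intent("turn", [Slot("turning direction", "right", SlotValueClass.ENUMERATION)]),
    Intent("stop"),  # no slot describes this intent (n == 0)
])

# Parking intent with a text-type and an image-type slot (coordinates are made up).
park = DrivingIntention(intents=[
    Intent("stop", [
        Slot("parking position", "the gas station ahead", SlotValueClass.TEXT),
        Slot("parking position", (412, 180, 596, 310), SlotValueClass.ENVIRONMENT),
    ]),
])
```

- Keeping the classification next to each slot value lets a downstream consumer decide whether the value is a token to look up, free text, or a region in the image.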
- the user's driving intention can be presented to the user through an augmented reality-head up display (AR-HUD) or a central control screen, so that the user can timely judge the correctness of the multimodal understanding result.
- the AR-HUD can present the object mentioned by the user on the windshield (such as the rectangular box shown in (a) of FIG. 11), or the central control screen or the like can display the object mentioned by the user.
- an automatic driving control instruction for the vehicle may be generated according to the driving intention obtained above, so that the vehicle can be controlled according to the automatic driving control instruction in the automatic driving mode.
- after the driving intention is determined, whether the driving intention is feasible is judged according to the driving intention, the surrounding environment and traffic regulations; if the driving intention is feasible, the automatic driving control instruction for the vehicle is then generated. This avoids violations of traffic laws or other problems when the user's driving intention is executed in the automatic driving mode, thereby ensuring the user experience and the safety of automatic driving.
- prompt information may be generated according to the judgment result and sent to the user.
- if the driving intention is not feasible, the prompt information may also include the reason why the driving intention is not feasible.
- when the driving intention is executed, the vehicle can also prompt the user through a voice broadcast, such as "parking for you"; it can also use the AR-HUD or the central control screen to display to the user the target path and the target position to which the vehicle is to drive (e.g., the dynamic arrows and boxes shown in (b) of FIG. 11).
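- The feasibility judgment and prompt described above can be sketched as follows. The two rules (no-parking position, one-way road) are taken from the examples given in this description, but the `Environment` flags, the function names and the returned dictionary are hypothetical simplifications, not the method actually used by the decision planning logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Environment:
    # Hypothetical flags standing in for the surrounding environment and traffic regulations.
    target_is_no_parking_zone: bool = False
    target_road_is_one_way_against_travel: bool = False


@dataclass
class Judgment:
    feasible: bool
    reason: Optional[str] = None


def judge_feasibility(intent: str, env: Environment) -> Judgment:
    """Return whether the intent can be executed, with a reason if it cannot."""
    if intent == "stop" and env.target_is_no_parking_zone:
        return Judgment(False, "parking is not allowed at the requested position")
    if intent == "turn" and env.target_road_is_one_way_against_travel:
        return Judgment(False, "the requested road is one-way in the opposite direction")
    return Judgment(True)


def handle_intent(intent: str, env: Environment) -> dict:
    """Generate a control instruction only when the intent is feasible; always return a prompt."""
    judgment = judge_feasibility(intent, env)
    if judgment.feasible:
        return {"control_instruction": intent, "prompt": f"executing: {intent}"}
    return {
        "control_instruction": None,
        "prompt": f"cannot execute {intent}: {judgment.reason}",
    }
```

- In this sketch the control instruction is produced only on the feasible branch, so an infeasible request can never reach vehicle control; the prompt carries the reason back to the user.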
- the above-mentioned method 400 may be executed on a cloud server or an edge cloud server, or may be executed in a computer system of a vehicle, which is not limited in this application.
- in the automatic driving mode of the vehicle, the user's driving intention can be determined by acquiring the user instruction and the environmental information around the vehicle and performing multimodal understanding on them; an automatic driving control instruction for the vehicle is then generated according to the user's driving intention.
- in this way, the user's temporary driving intention can be executed without the user manually taking over control, which improves the user's experience during automatic driving.
- FIG. 5 is an example diagram of a system architecture provided by an embodiment of the present application. It should be understood that the system architecture is only an example, and does not constitute a limitation to the present application. As shown in FIG. 5 , the system architecture 500 includes: a microphone 510, an automatic speech recognition (ASR) module 520, a camera 530 (ie, a photographing device), a multimodal processing module 540, a decision planning calculation module 550 and Vehicle motion control module 560 . These modules are described below.
- Microphone 510: a microphone or microphone group deployed in the vehicle cockpit, used to collect audio information of the user in the cockpit, that is, the user's voice command involved in this application, which may also be referred to as the user's natural voice command.
- ASR module 520: used to recognize the user's natural language instructions collected by the microphone 510, and convert the user's natural language instructions into text instructions.
- Camera 530: a camera or camera group deployed on the vehicle, used to collect image information around the vehicle.
- Multimodal processing module 540: mainly includes a multimodal intent recognition engine. It is used to receive the text instruction recognized by the ASR module 520 and the image information collected by the camera 530, and generate the corresponding driving intention according to the text instruction and the image information. In some cases, the multimodal processing module 540 can also be used to control the camera 530 to collect image information, as shown in Embodiment 1 below.
- Decision planning calculation module 550: used for judging the driving intention generated by the multimodal processing module 540 in combination with traffic regulations, surrounding environment and other conditions to determine whether the driving intention is feasible. The driving intent is adjusted where necessary, and vehicle control commands are generated.
- Vehicle motion control module 560: used to control the vehicle motion according to the vehicle control command from the decision planning calculation module 550.
- FIG. 6 is an example diagram of a specific implementation provided by an embodiment of the present application. As shown in FIG. 6 , the specific implementation includes steps 1 to 11, and these steps are described in detail below.
- Step 1 The user issues a voice command.
- Step 2 Send natural voice commands.
- the microphone 510 sends the received natural voice instruction to the ASR module 520 .
- Step 3 Recognize the voice command.
- the ASR module 520 performs voice recognition on the received voice command and identifies the corresponding text command.
- Step 4 Transmit user text instructions.
- the ASR module 520 transmits the recognized text instruction to the multimodal processing module 540.
- Step 5 Send a capture activation signal.
- after receiving the text instruction, the multimodal processing module 540 sends a shooting activation signal to the camera 530 to activate the camera 530 to collect surrounding image information.
- Step 6 Capture image information around the vehicle.
- after the camera 530 receives the shooting activation signal, it shoots image information around the vehicle.
- Step 7 Send image information around the vehicle.
- the camera 530 sends the captured image information around the vehicle to the multimodal processing module 540 .
- Step 8 Multimodal understanding based on textual instructions and image information.
- the multimodal processing module 540 performs multimodal understanding based on the text instruction and image information, and obtains the user's driving intention.
- Step 9 Send driving intent.
- the multimodal processing module 540 sends the driving intention identified in step 8 to the decision planning calculation module 550 .
- Step 10 Determine if the intent is feasible.
- the user's driving intention may not comply with traffic laws (for example, the user asks to drive against the direction of a one-way street, or to stop at an intersection where parking is not allowed); or the driving intention may not be realizable in the current surrounding environment; or some other circumstance may prevent the user's driving intention from being realized.
- the decision planning calculation module 550 needs to judge whether the driving intention is feasible according to the driving intention in combination with necessary information such as the surrounding environment and traffic regulations, generate prompt information according to the judgment result, and notify the user. For example, if the judgment result is infeasible, the user's driving intention cannot be executed, and the user can be informed of the reason for the inability to execute. If the judgment result is feasible, step 11 is executed.
- Step 11 Adjust the driving parameters of the vehicle according to the driving intention, surrounding environment, traffic regulations and other information.
- the decision planning calculation module 550 determines the specific vehicle motion control instruction according to the driving intention, surrounding environment, traffic regulations and other necessary information, and sends it to the vehicle motion control module 560 .
- the vehicle motion control module 560 performs specific execution operations according to the vehicle motion control instructions.
- the vehicle motion control instruction may be modified according to the actual situation, so that the vehicle continues to drive in the automatic driving mode to the final destination to be reached by the user.
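- The message sequence of FIG. 6 can be sketched as one function that wires the modules together. Each module is represented by a plain callable; the function name, parameter names and string-valued intents are hypothetical, and the real modules exchange richer data than shown here.

```python
from typing import Callable, Optional


def run_fig6_flow(
    audio: bytes,
    asr: Callable[[bytes], str],
    capture_image: Callable[[], bytes],
    understand: Callable[[str, bytes], str],
    is_feasible: Callable[[str], bool],
    control_vehicle: Callable[[str], None],
    notify_user: Callable[[str], None],
) -> Optional[str]:
    """Voice -> text -> on-demand image -> intent -> feasibility -> control."""
    text = asr(audio)                 # steps 2-4: recognize and pass on the text instruction
    image = capture_image()           # steps 5-7: the camera is activated only after the text arrives
    intent = understand(text, image)  # step 8: multimodal understanding
    if not is_feasible(intent):       # steps 9-10: decision planning judges the intent
        notify_user(f"cannot execute: {intent}")
        return None
    control_vehicle(intent)           # step 11: adjust driving according to the intent
    return intent
```

- Because the camera callable is invoked only after the text instruction is available, this sketch mirrors the activation-triggered capture of FIG. 6 rather than the periodic capture of FIG. 7.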
- FIG. 7 is an example diagram of another specific implementation manner provided by an embodiment of the present application. As shown in FIG. 7 , the specific implementation includes steps 1 to 10, and these steps are described in detail below.
- Step 1 to Step 4 Reference may be made to Step 1 to Step 4 in the previous implementation manner (in FIG. 6 ), which will not be repeated here.
- Step 5 Periodically capture image information around the vehicle.
- the camera 530 periodically captures image information around the vehicle.
- Step 6 Send image information around the vehicle.
- the camera 530 periodically sends the captured image information around the vehicle to the multimodal processing module 540 .
- Step 7 Multimodal understanding based on textual instructions and image information.
- the multimodal processing module 540 performs multimodal understanding based on the text instruction and the image information at an appropriate time, and obtains the user's driving intention.
- the image information at the appropriate time may be the latest image information, or may be image information corresponding to a specific time interval estimated according to the recognition time of the natural language instruction.
- Step 8 to Step 10 Reference may be made to Step 9 to Step 11 in the previous implementation (in FIG. 6 ), which will not be repeated here.
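- With periodic capture, the module has to pick the image at an appropriate time. A minimal sketch of the two options mentioned above (the latest frame, or the frame closest to the estimated time of the command) is given below. The buffer class, its size, and the assumption that the command was spoken one ASR latency before recognition finished are all hypothetical.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Frame = Tuple[float, bytes]  # (capture timestamp in seconds, image data)


class FrameBuffer:
    """Keeps the most recent periodically captured frames."""

    def __init__(self, max_frames: int = 50) -> None:
        self._frames: Deque[Frame] = deque(maxlen=max_frames)

    def add(self, timestamp: float, image: bytes) -> None:
        self._frames.append((timestamp, image))

    def latest(self) -> Optional[Frame]:
        """Most recently captured frame, or None if nothing was captured yet."""
        return self._frames[-1] if self._frames else None

    def closest_to(self, utterance_time: float) -> Optional[Frame]:
        """Frame whose timestamp is nearest to the estimated time the command was spoken."""
        if not self._frames:
            return None
        return min(self._frames, key=lambda f: abs(f[0] - utterance_time))


def estimate_utterance_time(recognition_done_at: float, asr_latency: float) -> float:
    # Assumption: the command was spoken roughly one ASR latency before recognition finished.
    return recognition_done_at - asr_latency
```

- Selecting by estimated utterance time matters when the vehicle is moving: by the time recognition finishes, the object the user referred to may no longer be in the newest frame.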
- FIG. 8 is an example diagram of a multimodal processing process provided by an embodiment of the present application.
- in multimodal processing, the user instruction and the environmental information are input into the multimodal processing module, which performs multimodal understanding and outputs the driving intention.
- the multimodal processing module is obtained through pre-training. Specifically, in the training process, user instructions (such as user voice instructions, user text instructions or user air gesture instructions), environmental information (such as image information), and the corresponding driving intentions can be used as training data to train the multimodal processing module, as shown in FIG. 10, so that in the application stage the module can output the corresponding driving intention after the user instruction and environmental information are input.
- FIG. 9 is an exemplary diagram of another multimodal processing process provided by an embodiment of the present application.
- in this example, text instructions are used as the user instructions and image information is used as the environmental information.
- FIG. 9 is only a structural example of the multimodal processing module shown in FIG. 8, and does not constitute a limitation to the present application. It should be understood that, in practice, the multimodal processing module can also take other structures and can also be composed of other processing models, networks or modules, as long as it can output the driving intention from the input text instruction and image information.
- the multimodal processing process in this example will be described below with reference to FIG. 9 .
- the multimodal processing module may include a text processing model, a convolutional neural network (CNN), an attention module att.1 and an attention module att.2.
- the text processing model may be a BERT model commonly used in text processing, or may be other models that can be used for text processing, which is not limited in this application.
- the CNN network can be a deep residual network (Deep residual network, ResNet), etc., which is not limited.
- the process of the multimodal processing module for understanding the driving intent can be as follows:
- the corresponding text features are extracted from the text instruction by the BERT model, and the corresponding image features are extracted from the image information by the CNN (e.g., ResNet).
- the attention module att.1 is used to synthesize the text features with the image features, so as to obtain at least one intent and n slots corresponding to each intent in the at least one intent, where n is greater than or equal to 0, and n is an integer.
- each of the n slots includes a slot name, a slot value and a classification of the slot value, where the classification of the slot value is an enumeration-type slot value, a text-type slot value or an image-type slot value (see the description of the driving intention for FIG. 4).
- if the slot value of a slot of an intent obtained by the attention module att.1 is classified as an image-type slot value, the image features are fused with the text features through the attention module att.2 to obtain the slot value of that slot, that is, the rectangular frame of the object mentioned in the user text instruction, for example, the rectangular frame corresponding to the blue car in FIG. 11.
- the information obtained by att.1 and att.2 together constitutes the driving intention.
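- To illustrate how an attention step can relate a text feature to image regions, the following pure-Python toy computes scaled dot-product attention weights of one text vector over several region vectors and picks the best-matching rectangular frame. It is only a stand-in for the role played by att.2: the feature vectors, region list and box coordinates are invented, and a real module would work on BERT and ResNet features with learned projections.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x_min, y_min, x_max, y_max)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query: Sequence[float], keys: Sequence[Sequence[float]]) -> List[float]:
    """Scaled dot-product attention of one query vector over several key vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def locate_object(
    text_feature: Sequence[float],
    region_features: Sequence[Sequence[float]],
    region_boxes: Sequence[Box],
) -> Tuple[Box, List[float]]:
    """Return the box of the region that the text feature attends to most strongly."""
    weights = attention_weights(text_feature, region_features)
    best = max(range(len(weights)), key=weights.__getitem__)
    return region_boxes[best], weights


# Made-up 3-dimensional features: the query stands for "blue car".
query = [1.0, 0.0, 1.0]
regions = [[0.9, 0.1, 0.8], [0.1, 0.9, 0.7], [0.0, 0.0, 0.1]]  # blue car, red truck, tree
boxes = [(412, 180, 596, 310), (40, 200, 300, 420), (700, 50, 760, 240)]
box, weights = locate_object(query, regions, boxes)
```

- The weights sum to one, and the region whose feature is most aligned with the text feature receives the largest weight; its box plays the role of the image-type slot value.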
- FIG. 10 is an example diagram of a training method for a multimodal processing module provided by an embodiment of the present application. As shown in FIG. 10, the training method 1000 includes steps S1010 and S1020, and the steps are described below.
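- S1010: acquire training data. The training data includes training input data and training target data; the training input data includes user instructions and environmental information around the vehicle, and the training target data includes the driving intention corresponding to the training input data.
- S1020: train the multimodal processing module according to the training input data and the training target data.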
- the driving intent includes at least one intent, each intent in the at least one intent includes n slots, and each of the n slots includes a slot name, a slot value, and a classification of the slot value, where n is greater than or equal to 0, where n is an integer.
- the intent includes at least one of: stop, overtake, slow down, follow, turn, and the like.
- the slot name includes at least one of: a parking position, a speed value, an overtaking or following object, a turning direction, and the like.
- the classification of the slot value is: an enumeration-type slot value, a text-type slot value or an environment-type slot value. The enumeration-type slot value indicates that the slot value is a predefined enumeration value; the text-type slot value indicates that the slot value is a substring in the user command or text generated according to the user command; the environment-type slot value indicates that the slot value is identified in the environment information according to the content mentioned in the user instruction.
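- The pairing of training input data with training target data can be sketched as follows. This section does not specify the loss or optimizer, so the `Trainable` interface, its `update` method and the per-epoch mean loss are hypothetical placeholders for whatever training procedure is actually used.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass
class TrainingExample:
    # Training input data
    user_instruction: str    # e.g. a user text instruction
    environment_info: bytes  # e.g. encoded image information around the vehicle
    # Training target data
    driving_intention: dict  # labelled intents and slots for this input


class Trainable(Protocol):
    def update(self, instruction: str, environment: bytes, target: dict) -> float:
        """Perform one training update and return the loss (hypothetical interface)."""


def train(model: Trainable, data: Iterable[TrainingExample], epochs: int = 1) -> List[float]:
    """Feed every (input, target) pair to the model; return the mean loss per epoch."""
    examples = list(data)
    history: List[float] = []
    for _ in range(epochs):
        losses = [
            model.update(ex.user_instruction, ex.environment_info, ex.driving_intention)
            for ex in examples
        ]
        history.append(sum(losses) / len(losses) if losses else 0.0)
    return history


# One illustrative sample; the label layout is an assumption.
sample = TrainingExample(
    user_instruction="turn right at the next intersection",
    environment_info=b"<image bytes>",
    driving_intention={
        "intents": [{
            "name": "turn",
            "slots": [{"name": "turning direction", "value": "right", "class": "enumeration"}],
        }]
    },
)
```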
- FIG. 11 is an example diagram of an application scenario provided by an embodiment of the present application. It should be understood that the application scenario shown in FIG. 11 is only an example, and does not constitute a limitation to the present application. The application scenario is described below with reference to FIG. 11 .
- while the vehicle is driving in the autonomous driving mode toward a preset destination, the user of the autonomous vehicle temporarily forms a new driving intention and issues a natural voice command to the vehicle (for example, to the microphone on the vehicle), such as "drive to the blue car position and pull over".
- the relevant on-board devices on the vehicle such as the ASR module, recognize the natural language commands and convert them into text commands.
- the device or related module on the vehicle for controlling the driving of the vehicle determines the temporary intention of the user (that is, the user wants to park at the roadside where the blue car ahead is) through the above method 400, then generates appropriate vehicle control commands according to the temporary driving intention and issues them to the vehicle.
- the vehicle can also provide user feedback through voice announcements and/or an augmented reality head-up display (AR-HUD). As shown in (b) of FIG. 11, the vehicle can prompt the user through voice broadcast, such as "stopping for you"; it can also display the target path and target position of the vehicle to the user through the AR-HUD.
- this application scenario can also be understood as a user display interface, which can present the driving intention to the user, such as the rectangular frame shown in (a) of FIG. 11, and can show the target position for travel as the arrows and boxes shown in (b) of FIG. 11.
- FIG. 12 is an example diagram of a device for controlling the driving of a vehicle provided by an embodiment of the present application.
- the apparatus 1200 includes an acquisition unit 1210 and a processing unit 1220 .
- the obtaining unit 1210 is configured to obtain user instructions.
- the acquiring unit 1210 is further configured to acquire environmental information around the vehicle.
- the processing unit 1220 is configured to perform multimodal understanding on user instructions and environmental information around the vehicle to determine the user's driving intention.
- the processing unit 1220 is further configured to generate an automatic driving control instruction for the vehicle according to the user's driving intention.
- the driving intent may include at least one intent, each intent in the at least one intent includes n slots, and each of the n slots includes a slot name, a slot value, and a classification of the slot value, where n is greater than or equal to 0 and n is an integer.
- the intent may include at least one of: stop, overtake, slow down, follow, turn, and the like.
- the slot name may include at least one of: a parking position, a speed value, an overtaking or following object, a turning direction, and the like.
- the classification of the slot value may be: an enumeration-type slot value, a text-type slot value or an environment-type slot value, where the enumeration-type slot value indicates that the slot value is a predefined enumeration value; the text-type slot value indicates that the slot value is a substring in the user command or text generated according to the user command; and the environment-type slot value indicates that the slot value is identified in the environment information according to the content mentioned in the user command.
- the processing unit 1220 may also be used to: determine whether the driving intention is feasible according to the driving intention, the surrounding environment and traffic regulations; if the driving intention is feasible, generate an automatic driving control instruction for the vehicle.
- the user instruction includes any one or more of a user voice instruction, a user text instruction, and a user air gesture instruction.
- the apparatus 1200 may further include: a sending unit 1230, the sending unit 1230 may be configured to send a photographing activation signal to the photographing apparatus, so as to activate the photographing apparatus to photograph the environmental information around the vehicle;
- the acquiring unit 1210 may also be configured to: acquire the environmental information around the vehicle photographed by the photographing device according to the photographing activation signal.
- the acquiring unit 1210 may be further configured to: acquire environmental information around the vehicle periodically photographed by the photographing device.
- the user's driving intention can be presented to the user through an augmented reality-head-up display AR-HUD or a central control screen.
- FIG. 13 is a training device for a multimodal processing module provided by an embodiment of the present application.
- the apparatus 1300 includes an acquisition unit 1310 and a processing unit 1320 .
- the obtaining unit 1310 is configured to obtain training data, the training data includes training input data and training target data, the training input data includes user instructions and environmental information around the vehicle, and the training target data includes the driving intention corresponding to the training input data.
- the processing unit 1320 is configured to train the multimodal processing module according to the training input data and the training target data.
- the driving intention may include at least one intention, each intention in the at least one intention includes n slots, and each slot in the n slots includes a slot name, a slot value, and a classification of the slot value, where n is greater than or equal to 0 and n is an integer.
- the intent may include at least one of: stop, overtake, slow down, follow, turn, and the like.
- the slot name may include at least one of: a parking position, a speed value, an overtaking or following object, a turning direction, and the like.
- the slot value can be classified as: an enumeration-type slot value, a text-type slot value or an environment-type slot value, where the enumeration-type slot value indicates that the slot value is a predefined enumeration value; the text-type slot value indicates that the slot value is a substring in the user command or text generated according to the user command; and the environment-type slot value indicates that the slot value is identified in the environment information according to the content mentioned in the user command.
- FIG. 14 is a schematic structural diagram of an apparatus provided by an embodiment of the present application.
- the apparatus 1400 includes a processor 1402 , a communication interface 1403 and a memory 1404 .
- one example of the apparatus 1400 may be a chip.
- Another example of apparatus 1400 may be a computing device.
- the processor 1402, the memory 1404 and the communication interface 1403 can communicate through a bus.
- Executable code is stored in the memory 1404, and the processor 1402 reads the executable code in the memory 1404 to execute the corresponding method.
- the memory 1404 may also include other software modules required for running processes such as an operating system.
- the operating system can be LINUX™, UNIX™, WINDOWS™ and the like.
- for example, the executable code in the memory 1404 is used to implement the method shown in FIG. 4 or FIG. 10, and the processor 1402 reads the executable code in the memory 1404 to execute that method.
- the processor 1402 may be a CPU.
- Memory 1404 may include volatile memory, such as random access memory (RAM).
- the memory 1404 may also include non-volatile memory (NVM), such as read-only memory (ROM), flash memory, hard disk drive (HDD) or solid state drive (SSD).
- example computer program product 1500 is provided using signal bearing medium 1501 .
- the signal bearing medium 1501 may include one or more program instructions 1502 that, when executed by one or more processors, may provide the functions or portions of the functions described above with respect to the methods shown in FIG. 4 or FIG. 10 .
- one or more of the features of S410 to S440 may be undertaken by one or more instructions associated with the signal bearing medium 1501 .
- the signal bearing medium 1501 may include a computer readable medium 1503 such as, but not limited to, a hard drive, a compact disc (CD), a digital video disc (DVD), a digital tape, a memory, a read-only memory (ROM) or a random access memory (RAM), etc.
- the signal bearing medium 1501 may include a computer recordable medium 1504 such as, but not limited to, memory, read/write (R/W) CDs, R/W DVDs, and the like.
- signal bearing medium 1501 may include communication medium 1505, such as, but not limited to, digital and/or analog communication media (eg, fiber optic cables, waveguides, wired communication links, wireless communication links, etc.).
- the signal bearing medium 1501 may be conveyed by a wireless form of communication medium 1505 (eg, a wireless communication medium conforming to the IEEE 802.11 standard or other transmission protocol).
- the one or more program instructions 1502 may be, for example, computer-executable instructions or logic-implemented instructions.
- the aforementioned computing device may be configured to provide various operations, functions, or actions in response to program instructions 1502 communicated to the computing device via one or more of the computer readable medium 1503, the computer recordable medium 1504, and/or the communication medium 1505. It should be understood that the arrangements described herein are for illustrative purposes only.
- a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
- an application running on a computing device and the computing device may be components.
- One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between 2 or more computers.
- these components can execute from various computer readable media having various data structures stored thereon.
- a component may, for example, be based on a signal having one or more data packets (eg, data from two components interacting with another component between a local system, a distributed system, and/or a network, such as the Internet interacting with other systems via signals) Communicate through local and/or remote processes.
- the disclosed system, apparatus and method may be implemented in other manners.
- the apparatus embodiments described above are only illustrative.
- the division of the units is only a logical function division. In actual implementation, there may be other division methods.
- multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
- the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
- the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
- the technical solution of the present application essentially, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product.
- the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application.
- the aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Landscapes
- Engineering & Computer Science (AREA)
- Automation & Control Theory (AREA)
- Transportation (AREA)
- Mechanical Engineering (AREA)
- Human Computer Interaction (AREA)
- Control Of Driving Devices And Active Controlling Of Vehicle (AREA)
- Traffic Control Systems (AREA)
Abstract
A method and apparatus for controlling vehicle running and a vehicle. The method comprises: in an automatic running mode of a vehicle, obtaining a user instruction; obtaining environment information around the vehicle; performing multi-modal understanding on the user instruction and the environment information around the vehicle, and determining a driving intention of the user; and generating an automatic running control instruction for the vehicle according to the driving intention of the user.
Description
The present application relates to the field of autonomous driving, and more particularly, to a method and an apparatus for controlling vehicle driving, and a vehicle.
Artificial intelligence (AI) is the theory, methods, technologies, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that responds in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.
Autonomous driving is a mainstream application in the field of artificial intelligence. Autonomous driving technology relies on the cooperation of computer vision, radar, monitoring devices, and global positioning systems to allow a motor vehicle to drive itself without active human operation. Autonomous vehicles use various computing systems to help transport users from one location to another. Some autonomous vehicles may require some initial or continuous input from a user (such as a navigator, driver, or passenger). An autonomous vehicle permits the operator to switch from a manual operating mode to an autonomous driving mode or to a mode in between. Because autonomous driving technology does not require a human to drive the motor vehicle, it can in theory effectively avoid human driving errors, reduce traffic accidents, and improve the transport efficiency of roads. Autonomous driving technology is therefore receiving increasing attention.
At present, an autonomous vehicle drives according to a preset destination and the vehicle's surroundings as obtained by its sensors, and ultimately takes the user to that destination along a planned route. During an actual trip, however, the user may form temporary intentions, based on visual information around the vehicle, that differ from simply driving to the destination. For example, the user sees an acquaintance at the roadside and wants to stop briefly to greet them, or feels that the vehicle is too close to the vehicle ahead and wants to increase the distance. With existing autonomous driving technology, a user with such a temporary intention can only intervene manually, temporarily take over control of the vehicle, and then carry out the intention. Because the vehicle has then switched to manual driving mode, the user no longer enjoys the more effortless and safer driving experience that autonomous driving provides. In addition, when the driving automation level is Level 5 (L5), as defined by the Society of Automotive Engineers (SAE), the vehicle's manual intervention function may be removed; the driver is then unable to carry out such temporary intentions, which degrades the user experience.
Therefore, how to improve the user experience during autonomous driving is a problem that urgently needs to be solved.
SUMMARY OF THE INVENTION
The present application provides a method and an apparatus for controlling vehicle driving, and a vehicle, which can improve the user experience during autonomous driving.
According to a first aspect, a method for controlling vehicle driving is provided. The method may be performed by an electronic apparatus that supports controlling vehicle driving. An electronic apparatus is an apparatus that can be abstracted as a computer system. In this application, the electronic apparatus that supports controlling vehicle driving may also be referred to as an apparatus for controlling vehicle driving. The apparatus for controlling vehicle driving may be the entire electronic apparatus or a component of it, for example, a chip related to the vehicle driving control function, such as a system chip. A system chip is also referred to as a system on chip (SOC) or an SOC chip. Specifically, the apparatus for controlling vehicle driving may be a terminal apparatus or in-vehicle device such as an in-vehicle computer, a head unit, or a mobile phone, or it may be a processor, a system chip, or another type of in-vehicle chip that can be disposed in a computer system of the vehicle or of the in-vehicle device.
The method includes: obtaining a user instruction in an autonomous driving mode of the vehicle; obtaining environment information around the vehicle; performing multimodal understanding on the user instruction and the environment information around the vehicle to determine a driving intention of the user; and generating an autonomous driving control instruction for the vehicle according to the driving intention of the user.
In the embodiments of this application, in the autonomous driving mode of the vehicle, the user instruction and the environment information around the vehicle are obtained, multimodal understanding is performed on them to determine the driving intention of the user, and an autonomous driving control instruction for the vehicle is then generated according to that driving intention. The vehicle can thus carry out the user's temporary driving intention while remaining in the autonomous driving mode, without the user having to take over control manually, which improves the user experience during autonomous driving.
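As an illustration only, the four steps of the method can be sketched in Python as follows. The names `ControlInstruction`, `handle_user_instruction`, and `toy_understand`, and the dictionary layout of the intent, are hypothetical and do not appear in the application; the keyword rule merely stands in for a trained multimodal model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ControlInstruction:
    """Hypothetical container for an autonomous driving control instruction."""
    action: str
    target: Optional[str] = None


def handle_user_instruction(
    autonomous_mode: bool,
    get_user_instruction: Callable[[], str],
    get_environment: Callable[[], dict],
    understand: Callable[[str, dict], dict],
) -> Optional[ControlInstruction]:
    """Run the four steps once; return None outside the autonomous driving mode."""
    if not autonomous_mode:
        return None
    instruction = get_user_instruction()            # step 1: user instruction
    environment = get_environment()                 # step 2: environment information
    intent = understand(instruction, environment)   # step 3: multimodal understanding
    # Step 4: map the driving intention to a control instruction.
    return ControlInstruction(action=intent["intent"], target=intent.get("target"))


def toy_understand(instruction: str, environment: dict) -> dict:
    """Keyword rule used only as a stand-in for a trained multimodal model."""
    if "stop" in instruction.lower():
        return {"intent": "stop", "target": environment.get("nearest_roadside")}
    return {"intent": "follow", "target": None}
```

Passing the data sources and the understanding step as callables keeps the sketch independent of any particular sensor or model; a real system would supply its own implementations.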
With reference to the first aspect, in some implementations of the first aspect, the driving intention includes at least one intent, each of the at least one intent includes n slots, and each of the n slots includes a slot name, a slot value, and a classification of the slot value, where n is an integer greater than or equal to 0.
With reference to the first aspect, in some implementations of the first aspect, the intent includes at least one of: stopping, overtaking, decelerating, following a vehicle, and turning.
With reference to the first aspect, in some implementations of the first aspect, the slot name includes at least one of: a parking position, a speed value, an object to overtake or follow, and a steering direction.
With reference to the first aspect, in some implementations of the first aspect, the classification of the slot value is an enumeration-type slot value, a text-type slot value, or an environment-type slot value, where an enumeration-type slot value indicates that the slot value is a predefined enumerated value; a text-type slot value indicates that the slot value is a substring of the user instruction or text generated from the user instruction; and an environment-type slot value indicates that the slot value is a mark made in the environment information according to the content mentioned in the user instruction.
Optionally, the environment-type slot value includes an image-type slot value, where the image reflects the environment around the vehicle. An image-type slot value thus indicates that the slot value is a mark made in the image information according to the content mentioned in the user instruction.
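The intent and slot structure described above can be written down as a small data model. The following Python sketch is one possible encoding under stated assumptions: the class and field names are hypothetical, and representing an environment-type (image) mark as an `(x, y, width, height)` bounding box is an assumption, since the text only says the value is a mark made in the environment information.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple, Union


class SlotValueType(Enum):
    ENUMERATION = "enumeration"   # predefined enumerated value
    TEXT = "text"                 # substring of, or text generated from, the instruction
    ENVIRONMENT = "environment"   # mark made in the environment information


# Assumed encoding of an image mark: (x, y, width, height) in pixels.
BoundingBox = Tuple[int, int, int, int]


@dataclass
class Slot:
    name: str
    value: Union[str, float, BoundingBox]
    value_type: SlotValueType


@dataclass
class Intent:
    name: str                                         # e.g. "stop", "overtake", "turn"
    slots: List[Slot] = field(default_factory=list)   # n >= 0 slots


# "Stop next to that person": the parking position is marked in the camera image.
stop_intent = Intent(
    "stop",
    [Slot("parking_position", (412, 230, 64, 128), SlotValueType.ENVIRONMENT)],
)
# "Turn left": the steering direction is a predefined enumerated value.
turn_intent = Intent(
    "turn",
    [Slot("steering_direction", "left", SlotValueType.ENUMERATION)],
)
# An intent with n = 0 slots is also valid.
decelerate_intent = Intent("decelerate")

# A driving intention contains at least one intent.
driving_intention = [stop_intent, turn_intent, decelerate_intent]
```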
With reference to the first aspect, in some implementations of the first aspect, generating the autonomous driving control instruction for the vehicle according to the driving intention of the user includes: determining, according to the driving intention, the surrounding environment, and traffic regulations, whether the driving intention is feasible; and if the driving intention is feasible, generating the autonomous driving control instruction for the vehicle.
Optionally, if the driving intention is not feasible, prompt information may be generated and sent to the user.
Optionally, the prompt information may include the reason why the driving intention is not feasible.
In the embodiments of this application, after the driving intention is determined, whether it is feasible is determined according to the driving intention, the surrounding environment, and traffic regulations, and the autonomous driving control instruction for the vehicle is generated only if the driving intention is feasible. This avoids violating traffic regulations or causing other problems when the user's driving intention is carried out in the autonomous driving mode, and thus safeguards both the user experience and the safety of autonomous driving.
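A minimal Python sketch of this feasibility gate is shown below. The `Scene` fields and the three rules are invented for illustration; the application does not specify which environmental conditions or traffic regulations are checked, nor how the prompt is worded.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Scene:
    """Hypothetical summary of the surroundings and applicable traffic rules."""
    no_stopping_zone: bool
    speed_limit_kmh: float
    adjacent_lane_clear: bool


def check_feasibility(
    intent: str, scene: Scene, target_speed_kmh: Optional[float] = None
) -> Tuple[bool, str]:
    """Return (feasible, reason); reason is empty when the intent is feasible."""
    if intent == "stop" and scene.no_stopping_zone:
        return False, "stopping is not permitted here"
    if intent == "overtake" and not scene.adjacent_lane_clear:
        return False, "the adjacent lane is occupied"
    if target_speed_kmh is not None and target_speed_kmh > scene.speed_limit_kmh:
        return False, "requested speed exceeds the speed limit"
    return True, ""


def respond(intent: str, scene: Scene, target_speed_kmh: Optional[float] = None) -> str:
    """Generate a control instruction if feasible, otherwise a prompt with the reason."""
    feasible, reason = check_feasibility(intent, scene, target_speed_kmh)
    if feasible:
        return f"control: {intent}"
    return f"prompt: cannot {intent}, {reason}"
```

Returning the reason alongside the verdict makes it straightforward to include it in the optional prompt information sent to the user.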
With reference to the first aspect, in some implementations of the first aspect, the user instruction includes any one or more of: a user voice instruction, a user text instruction, and a user air gesture instruction.
Optionally, if the user instruction actually obtained is a user voice instruction or a user air gesture instruction, then in practice the voice instruction or air gesture instruction may first be converted into a user text instruction, and multimodal understanding may then be performed on the text instruction and the surrounding environment information; alternatively, multimodal understanding may be performed directly on the user voice instruction or user air gesture instruction. This is not limited in this application.
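The first option, converting voice or air gesture instructions to text before multimodal understanding, can be sketched as a small dispatch step. In this Python sketch the `UserInstruction` type and the `converters` mapping are hypothetical; real speech or gesture recognition would be plugged in as the converter functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class UserInstruction:
    modality: str                # "voice", "text", or "gesture"
    payload: Union[str, bytes]   # text, or raw audio/gesture data


def to_text(
    instruction: UserInstruction,
    converters: Dict[str, Callable[[bytes], str]],
) -> str:
    """Return the instruction as text, converting voice or gesture input if needed."""
    if instruction.modality == "text":
        return str(instruction.payload)
    try:
        convert = converters[instruction.modality]
    except KeyError:
        raise ValueError(f"unsupported modality: {instruction.modality}") from None
    return convert(instruction.payload)
```

For example, with placeholder converters such as `{"voice": lambda b: b.decode("utf-8")}`, a voice payload of `b"slow down"` is returned as the text `"slow down"`; the decoding is only a placeholder for speech recognition.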
With reference to the first aspect, in some implementations of the first aspect, the method further includes: sending a photographing activation signal to a photographing apparatus to activate the photographing apparatus to photograph the environment information around the vehicle; and obtaining the environment information around the vehicle includes: obtaining the environment information around the vehicle photographed by the photographing apparatus according to the photographing activation signal.
It should be understood that the environment information photographed by the photographing apparatus may also be referred to as image information. It should also be understood that, in practice, the obtained environment information is not limited to image information photographed by the photographing apparatus; it may also be environment information obtained by a lidar, in-vehicle sensors, and/or the Internet of Vehicles. This is not limited in this application.
With reference to the first aspect, in some implementations of the first aspect, obtaining the environment information around the vehicle includes: obtaining environment information around the vehicle that is periodically photographed by the photographing apparatus.
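The two ways of obtaining image information described above, photographing on an activation signal and photographing periodically, can be contrasted in a short Python sketch. The `EnvironmentSource` class, its method names, and the fixed-size frame buffer are assumptions made for illustration; the timer that would drive periodic capture is left to the caller.

```python
from collections import deque
from typing import Callable, Deque, Optional


class EnvironmentSource:
    """Hypothetical wrapper around a photographing apparatus."""

    def __init__(self, capture: Callable[[], str], buffer_size: int = 3) -> None:
        self._capture = capture
        # Only the most recent frames are kept in periodic mode.
        self._frames: Deque[str] = deque(maxlen=buffer_size)

    def on_timer_tick(self) -> None:
        """Periodic mode: called by a timer, stores the newest frame."""
        self._frames.append(self._capture())

    def latest_periodic(self) -> Optional[str]:
        """Return the most recent periodic frame, or None if none was taken yet."""
        return self._frames[-1] if self._frames else None

    def capture_on_activation(self) -> str:
        """On-demand mode: an activation signal triggers a single capture."""
        return self._capture()
```

Periodic capture makes a recent frame available the moment a user instruction arrives, while activation-based capture avoids photographing when no instruction is pending.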
With reference to the first aspect, in some implementations of the first aspect, the driving intention of the user is presented to the user through an augmented reality head-up display (AR-HUD) or a central control screen.
In the embodiments of this application, the driving intention of the user may be presented to the user through an augmented reality head-up display (AR-HUD) or a central control screen, so that the user can promptly judge whether the result of the multimodal understanding is correct.
According to a second aspect, an apparatus for controlling vehicle driving is provided. The apparatus includes an obtaining unit and a processing unit. In an autonomous driving mode of a vehicle, the obtaining unit is configured to obtain a user instruction; the obtaining unit is further configured to obtain environment information around the vehicle; the processing unit is configured to perform multimodal understanding on the user instruction and the environment information around the vehicle to determine a driving intention of the user; and the processing unit is further configured to generate an autonomous driving control instruction for the vehicle according to the driving intention of the user.
With reference to the second aspect, in some implementations of the second aspect, the driving intention includes at least one intent, each of the at least one intent includes n slots, and each of the n slots includes a slot name, a slot value, and a classification of the slot value, where n is an integer greater than or equal to 0.
With reference to the second aspect, in some implementations of the second aspect, the intent includes at least one of: stopping, overtaking, decelerating, following a vehicle, and turning.
With reference to the second aspect, in some implementations of the second aspect, the slot name includes at least one of: a parking position, a speed value, an object to overtake or follow, and a steering direction.
With reference to the second aspect, in some implementations of the second aspect, the classification of the slot value is an enumeration-type slot value, a text-type slot value, or an environment-type slot value, where an enumeration-type slot value indicates that the slot value is a predefined enumerated value; a text-type slot value indicates that the slot value is a substring of the user instruction or text generated from the user instruction; and an environment-type slot value indicates that the slot value is a mark made in the environment information according to the content mentioned in the user instruction.
With reference to the second aspect, in some implementations of the second aspect, the processing unit is further configured to: determine, according to the driving intention, the surrounding environment, and traffic regulations, whether the driving intention is feasible; and if the driving intention is feasible, generate the autonomous driving control instruction for the vehicle.
With reference to the second aspect, in some implementations of the second aspect, the user instruction includes any one or more of: a user voice instruction, a user text instruction, and a user air gesture instruction.
With reference to the second aspect, in some implementations of the second aspect, the apparatus further includes a sending unit, where the sending unit is configured to send a photographing activation signal to a photographing apparatus to activate the photographing apparatus to photograph the environment information around the vehicle; and the obtaining unit is further configured to obtain the environment information around the vehicle photographed by the photographing apparatus according to the photographing activation signal.
With reference to the second aspect, in some implementations of the second aspect, the obtaining unit is further configured to obtain environment information around the vehicle that is periodically photographed by the photographing apparatus.
With reference to the second aspect, in some implementations of the second aspect, the driving intention of the user is presented to the user through an augmented reality head-up display (AR-HUD) or a central control screen.
According to a third aspect, a method for training a multimodal processing module is provided, including: obtaining training data, where the training data includes training input data and training target data, the training input data includes a user instruction and environment information around a vehicle, and the training target data includes the driving intention corresponding to the training input data; and training the multimodal processing module according to the training input data and the training target data.
With reference to the third aspect, in some implementations of the third aspect, the driving intention includes at least one intent, each of the at least one intent includes n slots, and each of the n slots includes a slot name, a slot value, and a classification of the slot value, where n is an integer greater than or equal to 0.
With reference to the third aspect, in some implementations of the third aspect, the intent includes at least one of: stopping, overtaking, decelerating, following a vehicle, and turning.
With reference to the third aspect, in some implementations of the third aspect, the slot name includes at least one of: a parking position, a speed value, an object to overtake or follow, and a steering direction.
With reference to the third aspect, in some implementations of the third aspect, the classification of the slot value is an enumeration-type slot value, a text-type slot value, or an environment-type slot value, where an enumeration-type slot value indicates that the slot value is a predefined enumerated value; a text-type slot value indicates that the slot value is a substring of the user instruction or text generated from the user instruction; and an environment-type slot value indicates that the slot value is a mark made in the environment information according to the content mentioned in the user instruction.
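The shape of the training data in the third aspect can be sketched in Python as pairs of training input (user instruction plus environment information) and training target (driving intention). `TrainingSample`, `train`, and the `train_step` callback are hypothetical names; the sketch deliberately leaves the model and its loss unspecified, since the summary does not fix them, and only shows how the pairs would be fed to a training step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class TrainingSample:
    user_instruction: str                # training input: user instruction
    environment: Dict[str, object]       # training input: environment information
    driving_intention: Dict[str, object] # training target: corresponding intention


TrainStep = Callable[[str, Dict[str, object], Dict[str, object]], float]


def train(samples: Iterable[TrainingSample], train_step: TrainStep,
          epochs: int = 1) -> List[float]:
    """Feed every (input, target) pair to train_step; return the mean loss per epoch."""
    data = list(samples)
    if not data:
        raise ValueError("training data must not be empty")
    history: List[float] = []
    for _ in range(epochs):
        losses = [
            train_step(s.user_instruction, s.environment, s.driving_intention)
            for s in data
        ]
        history.append(sum(losses) / len(losses))
    return history
```

In practice `train_step` would run one optimization step of the multimodal processing module and return its loss; here it is an assumption-free placeholder supplied by the caller.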
According to a fourth aspect, an apparatus for training a multimodal processing module is provided, including an obtaining unit and a processing unit. The obtaining unit is configured to obtain training data, where the training data includes training input data and training target data, the training input data includes a user instruction and environment information around a vehicle, and the training target data includes the driving intention corresponding to the training input data. The processing unit is configured to train the multimodal processing module according to the training input data and the training target data.
With reference to the fourth aspect, in some implementations of the fourth aspect, the driving intention includes at least one intent, each of the at least one intent includes n slots, and each of the n slots includes a slot name, a slot value, and a classification of the slot value, where n is an integer greater than or equal to 0.
With reference to the fourth aspect, in some implementations of the fourth aspect, the intent includes at least one of: stopping, overtaking, decelerating, following a vehicle, and turning.
With reference to the fourth aspect, in some implementations of the fourth aspect, the slot name includes at least one of: a parking position, a speed value, an object to overtake or follow, and a steering direction.
With reference to the fourth aspect, in some implementations of the fourth aspect, the classification of the slot value is an enumeration-type slot value, a text-type slot value, or an environment-type slot value, where an enumeration-type slot value indicates that the slot value is a predefined enumerated value; a text-type slot value indicates that the slot value is a substring of the user instruction or text generated from the user instruction; and an environment-type slot value indicates that the slot value is a mark made in the environment information according to the content mentioned in the user instruction.
According to a fifth aspect, another method for controlling vehicle driving is provided, including: obtaining a user instruction in an autonomous driving mode of the vehicle; obtaining environment information around the vehicle; determining a driving intention of the user according to the user instruction and the environment information; generating an autonomous driving control instruction for the vehicle at least according to the driving intention of the user; and controlling the vehicle to drive based on the autonomous driving control instruction.
In the embodiments of this application, in the autonomous driving mode of the vehicle, the user instruction and the environment information around the vehicle are obtained, the driving intention of the user is determined according to the user instruction and the environment information, and an autonomous driving control instruction for the vehicle is then generated according to that driving intention. The vehicle can thus carry out the user's temporary driving intention while remaining in the autonomous driving mode, without the user having to take over control manually, which improves the user experience during autonomous driving.
With reference to the fifth aspect, in some implementations of the fifth aspect, determining the driving intention of the user according to the user instruction and the environment information includes: performing multimodal understanding on the user instruction and the environment information; and determining the driving intention of the user according to the result of the multimodal understanding.
With reference to the fifth aspect, in some implementations of the fifth aspect, the user instruction includes at least one of: a user voice instruction, a user text instruction, and a user air gesture instruction.
With reference to the fifth aspect, in some implementations of the fifth aspect, the driving intention includes at least one intent, each of the at least one intent includes n slots, and each of the n slots includes a slot name, a slot value, and a classification of the slot value, where n is an integer greater than or equal to 0.
With reference to the fifth aspect, in some implementations of the fifth aspect, the intent includes at least one of: stopping, overtaking, decelerating, following a vehicle, and turning.
With reference to the fifth aspect, in some implementations of the fifth aspect, the slot name includes at least one of: a parking position, a speed value, an object to overtake or follow, and a steering direction.
With reference to the fifth aspect, in some implementations of the fifth aspect, the classification of the slot value is an enumeration-type slot value, a text-type slot value, or an environment-type slot value, where an enumeration-type slot value indicates that the slot value is a predefined enumerated value; a text-type slot value indicates that the slot value is a substring of the user instruction or text generated from the user instruction; and an environment-type slot value indicates that the slot value is a mark made in the environment information according to the content mentioned in the user instruction.
With reference to the fifth aspect, in some implementations of the fifth aspect, generating the autonomous driving control instruction for the vehicle according to the driving intention of the user includes: determining, according to the driving intention, the surrounding environment, and traffic regulations, whether the driving intention is feasible; and if the driving intention is feasible, generating the autonomous driving control instruction for the vehicle.
Optionally, if the driving intention is not feasible, prompt information may be generated and sent to the user.
Optionally, the prompt information may include the reason why the driving intention is not feasible.
In the embodiments of this application, after the driving intention is determined, whether it is feasible is determined according to the driving intention, the surrounding environment, and traffic regulations, and the autonomous driving control instruction for the vehicle is generated only if the driving intention is feasible. This avoids violating traffic regulations or causing other problems when the user's driving intention is carried out in the autonomous driving mode, and thus safeguards both the user experience and the safety of autonomous driving.
With reference to the fifth aspect, in some implementations of the fifth aspect, if acquiring the user instruction means acquiring a text instruction of the user, then before the text instruction is acquired, a natural voice instruction or an air gesture instruction of the user may first be acquired and then converted into the text instruction.
With reference to the fifth aspect, in some implementations of the fifth aspect, the method further includes: sending a photographing activation signal to a photographing device to activate the photographing device to photograph environmental information around the vehicle. Acquiring the environmental information around the vehicle includes: acquiring the environmental information around the vehicle photographed by the photographing device in response to the photographing activation signal.
With reference to the fifth aspect, in some implementations of the fifth aspect, acquiring the environmental information around the vehicle includes: acquiring the environmental information around the vehicle periodically photographed by the photographing device.
With reference to the fifth aspect, in some implementations of the fifth aspect, the user's driving intention is presented to the user through an augmented reality head-up display (AR-HUD) or a central control screen.
In this embodiment of the present application, the user's driving intention may be presented to the user through the AR-HUD or the central control screen, so that the user can promptly judge whether the multimodal understanding result is correct.
According to a sixth aspect, another apparatus for controlling vehicle driving is provided, including modules capable of implementing the method for controlling vehicle driving in the fifth aspect or any possible implementation of the fifth aspect.
According to a seventh aspect, a processing method for a multimodal processing module is provided, where the multimodal processing module is obtained through training according to the training method in the third aspect or any possible implementation of the third aspect. The processing method includes: acquiring, by the multimodal processing module, input data, where the input data includes a user instruction and environmental information around the vehicle; and outputting, by the multimodal processing module, a driving intention according to the input data.
According to an eighth aspect, a multimodal processing module is provided, where the multimodal processing module is obtained through training according to the training method in the third aspect or any possible implementation of the third aspect. The multimodal processing module includes: an acquisition unit, configured to acquire input data, where the input data includes a user instruction and environmental information around the vehicle; and a processing unit, configured to output a driving intention according to the input data.
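The acquisition-unit and processing-unit split in the seventh and eighth aspects can be illustrated with a thin wrapper around a trained model. Everything below is a hypothetical sketch: the class and method names are invented for illustration, and `stub_model` is a keyword-matching stand-in for the trained multimodal model, whose internals are not specified in this passage.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class InputData:
    user_instruction: str   # e.g. a text instruction from the user
    environment_info: Any   # e.g. an image of the vehicle's surroundings


class MultimodalProcessingModule:
    """Wraps a trained model behind acquire/process steps."""

    def __init__(self, model: Callable[[InputData], Dict[str, Any]]):
        self._model = model  # assumed to be the result of the training method

    def acquire(self, user_instruction: str, environment_info: Any) -> InputData:
        # Acquisition unit: bundle both modalities into one input.
        return InputData(user_instruction, environment_info)

    def process(self, data: InputData) -> Dict[str, Any]:
        # Processing unit: output the driving intention for the input.
        return self._model(data)


def stub_model(data: InputData) -> Dict[str, Any]:
    """Placeholder for a trained model; matches a single keyword only."""
    intent = "stop" if "stop" in data.user_instruction.lower() else "unknown"
    return {"intent": intent, "slots": {}}
```

A caller would build the module once, for example `MultimodalProcessingModule(stub_model)`, and then pass each acquired `InputData` to `process`.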
According to a ninth aspect, an automatic driving vehicle is provided, including the apparatus in the second aspect or any possible implementation of the second aspect; and/or the apparatus in the fourth aspect or any possible implementation of the fourth aspect; and/or the apparatus in the sixth aspect or any possible implementation of the sixth aspect; and/or the module in the eighth aspect or any possible implementation of the eighth aspect.
According to a tenth aspect, an apparatus for controlling vehicle driving is provided, including a processor and a memory, where the memory is configured to store program instructions, and the processor is configured to invoke the program instructions to perform the method for controlling vehicle driving in the first aspect or any possible implementation of the first aspect; and/or to perform the other method for controlling vehicle driving in the fifth aspect or any possible implementation of the fifth aspect.
According to an eleventh aspect, a training apparatus for a multimodal processing module is provided, including a processor and a memory, where the memory is configured to store program instructions, and the processor is configured to invoke the program instructions to perform the training method for the multimodal processing module in the third aspect or any possible implementation of the third aspect.
According to a twelfth aspect, a system is provided, where the system includes the apparatus in the second aspect or any possible implementation of the second aspect; and/or the apparatus in the sixth aspect or any possible implementation of the sixth aspect.
Optionally, the system may be a vehicle or an on-board system of a vehicle, which is not limited in this application.
According to a thirteenth aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, the computer is caused to perform the method for controlling vehicle driving in the first aspect or any possible implementation of the first aspect; and/or the other method for controlling vehicle driving in the fifth aspect or any possible implementation of the fifth aspect.
According to a fourteenth aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, the computer is caused to perform the training method for the multimodal processing module in the third aspect or any possible implementation of the third aspect.
According to a fifteenth aspect, a computer-readable storage medium is provided, where the computer-readable medium stores program code for execution by a device, and the program code includes instructions for performing the method for controlling vehicle driving in the first aspect or any possible implementation of the first aspect; and/or the other method for controlling vehicle driving in the fifth aspect or any possible implementation of the fifth aspect.
According to a sixteenth aspect, a computer-readable storage medium is provided, where the computer-readable medium stores program code for execution by a device, and the program code includes instructions for performing the training method for the multimodal processing module in the third aspect or any possible implementation of the third aspect.
According to a seventeenth aspect, a chip is provided, where the chip includes a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory to perform the method for controlling vehicle driving in the first aspect or any possible implementation of the first aspect; and/or the other method for controlling vehicle driving in the fifth aspect or any possible implementation of the fifth aspect.
Optionally, in an implementation, the chip may further include a memory that stores instructions, and the processor is configured to execute the instructions stored in the memory. When the instructions are executed, the processor performs the method for controlling vehicle driving in the first aspect or any possible implementation of the first aspect; and/or the other method for controlling vehicle driving in the fifth aspect or any possible implementation of the fifth aspect.
According to an eighteenth aspect, a chip is provided, where the chip includes a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory to perform the training method for the multimodal processing module in the third aspect or any possible implementation of the third aspect.
Optionally, in an implementation, the chip may further include a memory that stores instructions, and the processor is configured to execute the instructions stored in the memory. When the instructions are executed, the processor performs the training method for the multimodal processing module in the third aspect or any possible implementation of the third aspect.
FIG. 1 is a functional block diagram of a vehicle according to an embodiment of the present application;
FIG. 2 is an example diagram of an automatic driving system to which an embodiment of the present application is applicable;
FIG. 3 is an example diagram of an application in which the cloud side instructs an automatic driving vehicle according to an embodiment of the present application;
FIG. 4 is an example diagram of a method for controlling vehicle driving according to an embodiment of the present application;
FIG. 5 is an example diagram of a system architecture according to an embodiment of the present application;
FIG. 6 is an example diagram of a specific implementation according to an embodiment of the present application;
FIG. 7 is an example diagram of another specific implementation according to an embodiment of the present application;
FIG. 8 is an example diagram of a multimodal processing method according to an embodiment of the present application;
FIG. 9 is an example diagram of another multimodal processing method according to an embodiment of the present application;
FIG. 10 is an example diagram of a training method for a multimodal processing module according to an embodiment of the present application;
FIG. 11 is an example diagram of an application scenario according to an embodiment of the present application;
FIG. 12 is an example diagram of an apparatus for controlling vehicle driving according to an embodiment of the present application;
FIG. 13 shows a training apparatus for a multimodal processing module according to an embodiment of the present application;
FIG. 14 is an example structural diagram of an apparatus according to an embodiment of the present application;
FIG. 15 is an example diagram of a computer program product according to an embodiment of the present application.
The technical solutions in the present application are described below with reference to the accompanying drawings.
FIG. 1 is a functional block diagram of a vehicle according to an embodiment of the present application. In one embodiment, the vehicle 100 is configured in a fully or partially automatic driving mode.
For example, while in the automatic driving mode, the vehicle 100 may control itself, and may, through human operation, determine the current state of the vehicle and its surrounding environment, determine a possible behavior of at least one other vehicle in the surrounding environment, determine a confidence level corresponding to the likelihood that the other vehicle performs the possible behavior, and control the vehicle 100 based on the determined information. When the vehicle 100 is in the automatic driving mode, it may be set to operate without interacting with a person.
The vehicle 100 may include various subsystems, such as a travel system 102, a sensor system 104, a control system 106, one or more peripheral devices 108, a power supply 110, a computer system 112, and a user interface 116. Optionally, the vehicle 100 may include more or fewer subsystems, and each subsystem may include multiple elements. In addition, the subsystems and elements of the vehicle 100 may be interconnected by wire or wirelessly.
The travel system 102 may include components that provide powered motion for the vehicle 100. In one embodiment, the travel system 102 may include an engine 118, an energy source 119, a transmission 120, and wheels/tires 121. The engine 118 may be an internal combustion engine, an electric motor, an air compression engine, or a combination of engine types, for example, a hybrid engine consisting of a gasoline engine and an electric motor, or a hybrid engine consisting of an internal combustion engine and an air compression engine. The engine 118 converts the energy source 119 into mechanical energy.
Examples of the energy source 119 include gasoline, diesel, other petroleum-based fuels, propane, other compressed-gas-based fuels, ethanol, solar panels, batteries, and other sources of electricity. The energy source 119 may also provide energy for other systems of the vehicle 100.
The transmission 120 may transmit mechanical power from the engine 118 to the wheels 121. The transmission 120 may include a gearbox, a differential, and a drive shaft. In one embodiment, the transmission 120 may also include other components, such as a clutch. The drive shaft may include one or more axles that can be coupled to one or more of the wheels 121.
The sensor system 104 may include several sensors that sense information about the environment surrounding the vehicle 100. For example, the sensor system 104 may include a positioning system 122 (which may be the global positioning system (GPS), the BeiDou system, or another positioning system), an inertial measurement unit (IMU) 124, a radar 126, a laser rangefinder 128, and a camera 130. The sensor system 104 may also include sensors that monitor internal systems of the vehicle 100 (for example, an in-vehicle air quality monitor, a fuel gauge, and an oil temperature gauge). Sensor data from one or more of these sensors can be used to detect objects and their corresponding characteristics (position, shape, direction, speed, and so on). Such detection and identification are key functions for the safe operation of the autonomous vehicle 100.
The positioning system 122 may be used to estimate the geographic location of the vehicle 100. The IMU 124 is used to sense changes in the position and orientation of the vehicle 100 based on inertial acceleration. In one embodiment, the IMU 124 may be a combination of an accelerometer and a gyroscope.
The radar 126 may use radio signals to sense objects in the surrounding environment of the vehicle 100. In some embodiments, in addition to sensing objects, the radar 126 may also be used to sense the speed and/or heading of the objects.
The laser rangefinder 128 may use laser light to sense objects in the environment in which the vehicle 100 is located. In some embodiments, the laser rangefinder 128 may include one or more laser sources, a laser scanner, and one or more detectors, among other system components.
The camera 130 may be used to capture multiple images of the surrounding environment of the vehicle 100. The camera 130 may be a still camera or a video camera.
The control system 106 controls the operation of the vehicle 100 and its components. The control system 106 may include various elements, including a steering system 132, a throttle 134, a braking unit 136, a sensor fusion algorithm 138, a computer vision system 140, a route control system 142, and an obstacle avoidance system 144.
The steering system 132 is operable to adjust the heading of the vehicle 100. For example, in one embodiment it may be a steering wheel system.
The throttle 134 is used to control the operating speed of the engine 118 and thereby the speed of the vehicle 100.
The braking unit 136 is used to decelerate the vehicle 100. The braking unit 136 may use friction to slow the wheels 121. In other embodiments, the braking unit 136 may convert the kinetic energy of the wheels 121 into electric current. The braking unit 136 may also take other forms to slow the rotation of the wheels 121 and thereby control the speed of the vehicle 100.
The computer vision system 140 is operable to process and analyze images captured by the camera 130 in order to identify objects and/or features in the surrounding environment of the vehicle 100. The objects and/or features may include traffic signals, road boundaries, and obstacles. The computer vision system 140 may use object recognition algorithms, structure from motion (SFM) algorithms, video tracking, and other computer vision techniques. In some embodiments, the computer vision system 140 may be used to map the environment, track objects, estimate the speed of objects, and so on.
The route control system 142 is used to determine the driving route of the vehicle 100. In some embodiments, the route control system 142 may combine data from the sensor fusion algorithm 138, the positioning system 122, and one or more predetermined maps to determine the driving route for the vehicle 100.
The obstacle avoidance system 144 is used to identify, evaluate, and avoid or otherwise negotiate potential obstacles in the environment of the vehicle 100.
Of course, in one example, the control system 106 may additionally or alternatively include components other than those shown and described, or may omit some of the components shown above.
The vehicle 100 interacts with external sensors, other vehicles, other computer systems, or users through the peripheral devices 108. The peripheral devices 108 may include a wireless communication system 146, an on-board computer 148, a microphone 150, and/or a speaker 152.
In some embodiments, the peripheral devices 108 provide a means for a user of the vehicle 100 to interact with the user interface 116. For example, the on-board computer 148 may provide information to the user of the vehicle 100. The user interface 116 may also operate the on-board computer 148 to receive user input. The on-board computer 148 can be operated through a touch screen. In other cases, the peripheral devices 108 may provide a means for the vehicle 100 to communicate with other devices located in the vehicle. For example, the microphone 150 may receive audio (for example, voice commands or other audio input) from the user of the vehicle 100. Similarly, the speaker 152 may output audio to the user of the vehicle 100.
The wireless communication system 146 may communicate wirelessly with one or more devices, either directly or via a communication network. For example, the wireless communication system 146 may use 3G cellular communication, such as code division multiple access (CDMA), global system for mobile communications (GSM), or general packet radio service (GPRS); 4G cellular communication, such as long term evolution (LTE); or 5G cellular communication. The wireless communication system 146 may use WiFi to communicate with a wireless local area network (WLAN). In some embodiments, the wireless communication system 146 may communicate directly with a device over an infrared link, Bluetooth, or the like. Other wireless protocols may also be used, such as those of various vehicle communication systems. For example, the wireless communication system 146 may include one or more dedicated short range communications (DSRC) devices, which may carry public and/or private data communication between vehicles and/or roadside stations.
The power supply 110 may provide power to various components of the vehicle 100. In one embodiment, the power supply 110 may be a rechargeable lithium-ion or lead-acid battery. One or more battery packs of such batteries may be configured as a power source that supplies power to the various components of the vehicle 100. In some embodiments, the power supply 110 and the energy source 119 may be implemented together, as in some all-electric vehicles.
Some or all of the functions of the vehicle 100 are controlled by the computer system 112. The computer system 112 may include at least one processor 113, which executes instructions 115 stored in a non-transitory computer-readable medium such as a memory 114. The computer system 112 may also be multiple computing devices that control individual components or subsystems of the vehicle 100 in a distributed manner.
The processor 113 may be any conventional processor, such as a commercially available CPU. Alternatively, the processor may be a dedicated device such as an ASIC or another hardware-based processor. Although FIG. 1 functionally illustrates the processor, the memory, and other elements of the computer system 112 in the same block, a person of ordinary skill in the art will understand that the processor, computer, or memory may actually include multiple processors, computers, or memories that may or may not be housed within the same physical enclosure. For example, the memory may be a hard disk drive or another storage medium located in an enclosure different from that of the computer system 112. Accordingly, a reference to a processor or computer is to be understood as including a reference to a collection of processors, computers, or memories that may or may not operate in parallel. Rather than using a single processor to perform the steps described here, some components, such as the steering component and the deceleration component, may each have their own processor that performs only the computations related to that component's specific function.
In the various aspects described here, the processor may be located remotely from the vehicle and communicate with it wirelessly. In other aspects, some of the processes described here are executed on a processor disposed in the vehicle and others are executed by a remote processor, including taking the steps necessary to perform a single maneuver.
In some embodiments, the memory 114 may contain instructions 115 (for example, program logic) that can be executed by the processor 113 to perform various functions of the vehicle 100, including those described above. The memory 114 may also contain additional instructions, including instructions to send data to, receive data from, interact with, and/or control one or more of the travel system 102, the sensor system 104, the control system 106, and the peripheral devices 108.
In addition to the instructions 115, the memory 114 may store data such as road maps, route information, the position, direction, and speed of the vehicle, other such vehicle data, and other information. This information may be used by the vehicle 100 and the computer system 112 while the vehicle 100 operates in autonomous, semi-autonomous, and/or manual modes.
The user interface 116 is used to provide information to, or receive information from, a user of the vehicle 100. Optionally, the user interface 116 may include one or more input/output devices within the set of peripheral devices 108, such as the wireless communication system 146, the on-board computer 148, the microphone 150, and the speaker 152.
The computer system 112 may control functions of the vehicle 100 based on input received from various subsystems (for example, the travel system 102, the sensor system 104, and the control system 106) and from the user interface 116. For example, the computer system 112 may use input from the control system 106 to control the steering system 132 so as to avoid obstacles detected by the sensor system 104 and the obstacle avoidance system 144. In some embodiments, the computer system 112 is operable to provide control over many aspects of the vehicle 100 and its subsystems.
Optionally, one or more of the components described above may be installed separately from, or associated with, the vehicle 100. For example, the memory 114 may exist partially or completely separately from the vehicle 100. The components described above may be communicatively coupled together in a wired and/or wireless manner.
Optionally, the components described above are only an example. In practical applications, components in the above modules may be added or removed according to actual needs, and FIG. 1 should not be construed as limiting the embodiments of the present application.
An automatic driving vehicle traveling on a road, such as the vehicle 100 above, can identify objects in its surrounding environment to determine an adjustment to its current speed. The objects may be other vehicles, traffic control devices, or other types of objects. In some examples, each identified object may be considered independently, and the object's own characteristics, such as its current speed, its acceleration, and its distance from the vehicle, may be used to determine the speed to which the automatic driving vehicle should adjust.
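Considering each identified object independently can be illustrated with a simple time-gap rule. This is an assumed stand-in rather than the speed-planning method of the application: `TrackedObject`, `target_speed`, and the 2-second default gap are hypothetical, and only the distance to each object is used, although the text also mentions the object's speed and acceleration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TrackedObject:
    gap_m: float  # longitudinal distance from the vehicle to the object, in metres


def target_speed(cruise_mps: float,
                 objects: Iterable[TrackedObject],
                 time_gap_s: float = 2.0) -> float:
    """Return the lowest speed allowed by any single object, capped at cruise speed."""
    target = cruise_mps
    for obj in objects:
        # Highest speed at which the gap still equals the desired time gap.
        allowed = max(0.0, obj.gap_m / time_gap_s)
        target = min(target, allowed)
    return target
```

Each object contributes its own speed limit and the most restrictive one wins, which is one simple way to combine independently evaluated objects.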
可选地,自动驾驶汽车车辆100或者与自动驾驶车辆100相关联的计算设备(如图1的计算机系统112、计算机视觉系统140、存储器114)可以基于所识别的物体的特性和周围环境的状态(例如,交通、雨、道路上的冰、等等)来预测所述识别的物体的行为。可选地,每一个所识别的物体都依赖于彼此的行为,因此还可以将所识别的所有物体全部一起考虑来预测单个识别的物体的行为。车辆100能够基于预测的所述识别的物体的行为来调整它的速度。换句话说,自动驾驶汽车能够基于所预测的物体的行为来确定车辆将需要调整到(例如,加速、减速、或者停止)什么稳定状态。在这个过程中,也可以考虑其它因素来确定车辆100的速度,诸如,车辆100在行驶的道路中的横向位置、道路的曲率、静态和动态物体的接近度等等。Alternatively, the autonomous vehicle vehicle 100 or a computing device associated with the autonomous vehicle 100 (eg, computer system 112, computer vision system 140, memory 114 of FIG. 1) may be based on the characteristics of the identified objects and the state of the surrounding environment (eg, traffic, rain, ice on the road, etc.) to predict the behavior of the identified object. Optionally, each identified object is dependent on the behavior of the other, so it is also possible to predict the behavior of a single identified object by considering all identified objects together. The vehicle 100 can adjust its speed based on the predicted behavior of the identified object. In other words, the self-driving car can determine what steady state the vehicle will need to adjust to (eg, accelerate, decelerate, or stop) based on the predicted behavior of the object. In this process, other factors may also be considered to determine the speed of the vehicle 100, such as the lateral position of the vehicle 100 in the road being traveled, the curvature of the road, the proximity of static and dynamic objects, and the like.
In addition to providing instructions to adjust the speed of the autonomous vehicle, the computing device may also provide instructions to modify the steering angle of the vehicle 100, so that the autonomous vehicle follows a given trajectory and/or maintains safe lateral and longitudinal distances from objects near the autonomous vehicle (for example, cars in adjacent lanes on the road).
Optionally, the autonomous vehicle 100 or a computing device associated with the autonomous vehicle 100 (for example, the computer system 112, the computer vision system 140, or the memory 114 in FIG. 1) may also predict, based on the state of the vehicle and the detected environmental information, whether autonomous driving is available on the road section ahead, and control switching between the autonomous driving mode and the manual driving mode.
The vehicle 100 may be a car, a truck, a motorcycle, a bus, a boat, an airplane, a helicopter, a lawn mower, a recreational vehicle, an amusement park vehicle, construction equipment, a tram, a golf cart, a train, a handcart, or the like, which is not specifically limited in the embodiments of this application.
FIG. 2 is an example diagram of an autonomous driving system according to an embodiment of this application.
The autonomous driving system shown in FIG. 2 includes a computer system 101. The computer system 101 includes a processor 103, and the processor 103 is coupled to a system bus 105. The processor 103 may be one or more processors, each of which may include one or more processor cores. A display adapter (video adapter) 107 can drive a display 109, and the display 109 is coupled to the system bus 105. The system bus 105 is coupled to an input/output (I/O) bus 113 through a bus bridge 111. An I/O interface 115 is coupled to the I/O bus 113. The I/O interface 115 communicates with a variety of I/O devices, such as an input device 117 (for example, a keyboard, a mouse, or a touchscreen), a media tray 121 (for example, a compact disc read-only memory (CD-ROM) or a multimedia interface), a transceiver 123 (which can send and/or receive radio communication signals), a camera 155 (which can capture still and moving digital video images), and an external universal serial bus (USB) interface 125. Optionally, the interface connected to the I/O interface 115 may be a USB interface.
The processor 103 may be any conventional processor, including a reduced instruction set computer (RISC) processor, a complex instruction set computer (CISC) processor, or a combination of the foregoing. Optionally, the processor may be a dedicated device such as an application-specific integrated circuit (ASIC). Optionally, the processor 103 may be a neural network processor, or a combination of a neural network processor and the foregoing conventional processors.
Optionally, in various embodiments described herein, the computer system 101 may be located remotely from the autonomous vehicle and may communicate wirelessly with the autonomous vehicle. In other aspects, some of the processes described herein are executed on a processor disposed in the autonomous vehicle, and others are executed by a remote processor, including taking the actions required to perform a single maneuver.
The computer system 101 may communicate with a software deployment server 149 through a network interface 129. The network interface 129 is a hardware network interface, such as a network interface card. A network 127 may be an external network, such as the Internet, or an internal network, such as an Ethernet or a virtual private network (VPN). Optionally, the network 127 may also be a wireless network, such as a WiFi network or a cellular network.
A hard disk drive interface is coupled to the system bus 105 and is connected to a hard disk drive. A system memory 135 is coupled to the system bus 105. Data running in the system memory 135 may include an operating system 137 and an application 143 of the computer system 101.
The operating system includes a shell 139 and a kernel 141. The shell 139 is an interface between the user and the kernel of the operating system. The shell is the outermost layer of the operating system. The shell manages the interaction between the user and the operating system: it waits for user input, interprets the user input for the operating system, and handles the various outputs of the operating system.
The kernel 141 consists of those parts of the operating system that manage memory, files, peripherals, and system resources. Interacting directly with the hardware, the operating system kernel typically runs processes, provides inter-process communication, and provides CPU time slice management, interrupt handling, memory management, I/O management, and the like.
The application 143 includes programs related to controlling the autonomous driving of the vehicle, for example, a program that manages the interaction between the autonomous vehicle and obstacles on the road, a program that controls the route or speed of the autonomous vehicle, and a program that controls the interaction between the autonomous vehicle and other autonomous vehicles on the road. The application 143 also exists on the system of the software deployment server (deploying server) 149. In one embodiment, when the application 143 needs to be executed, the computer system 101 may download the application 143 from the deploying server 149.
For example, the application 143 may be a program that controls the autonomous vehicle to enable or disable an assisted autonomous driving function.
A sensor 153 is associated with the computer system 101. The sensor 153 is configured to detect the environment around the computer system 101. For example, the sensor 153 may detect animals, cars, obstacles, pedestrian crossings, and the like. The sensor may further detect the environment around such objects, for example, the environment around an animal: other animals that appear around it, weather conditions, and the brightness of the surroundings. Optionally, if the computer system 101 is located on an autonomous vehicle, the sensor may be a camera, an infrared sensor, a chemical detector, a microphone, or the like.
The computer system 112 in FIG. 1 may also receive information from, or transfer information to, other computer systems. Alternatively, sensor data collected from the sensor system 104 of the vehicle 100 may be transferred to another computer, which processes the data.
For example, as shown in FIG. 3, data from a computer system 312 may be transmitted via a network to a cloud-side server 320 (which may also be referred to as the cloud) for further processing. The network and intermediate nodes may include various configurations and protocols, including the Internet, the World Wide Web, intranets, virtual private networks, wide area networks, local area networks, private networks using proprietary communication protocols of one or more companies, Ethernet, WiFi, the hypertext transfer protocol (HTTP), and various combinations of the foregoing. Such communication may be carried out by any device capable of transferring data to and from other computers, such as a modem or a wireless interface. For example, data such as the state of the vehicle and environmental information is transmitted to the cloud-side server 320 for further processing. The cloud-side server may use a variety of neural network models to recognize and process the data, and feed the recognition result back to the computer system 312, so that the computer system 312 can determine whether to enable or disable the assisted autonomous driving function.
In one example, the server 320 may include a server having multiple computers, such as a load-balancing server farm, which exchanges information with different nodes of the network for the purpose of receiving, processing, and transmitting data from the computer system 312. The server may be configured similarly to the computer system 312, with a processor 330, a memory 340, instructions 350, and data 360.
The autonomous driving system may include several assisted autonomous driving functions, for example, pre-collision safety braking (pre-collision system, PCS), adaptive cruise control (ACC), lane keeping aid (LKA), cross traffic alert (CTA), rear cross traffic alert (RCTA), blind spot warning (BSW), close vehicle warning, and traffic jam assist (TJA).
Currently, an autonomous vehicle drives according to a preset destination and the surroundings of the vehicle obtained through its sensors, and ultimately delivers the user to the destination along a planned route. However, while the vehicle is actually driving, the user may, based on visual information around the vehicle, form temporary intentions that differ from driving to the destination. For example, the user sees an acquaintance at the roadside and needs to stop temporarily to greet them, or the user feels that the vehicle is too close to the vehicle in front and wants to increase the distance.
However, with existing autonomous driving technology, a user who forms such a temporary intention can only take over control of the vehicle temporarily through manual intervention and then carry out the temporary intention. Because the vehicle has switched to the manual driving mode at this point, the user can no longer enjoy the more worry-free and safer driving experience that autonomous driving technology provides. In addition, when the autonomous driving level is Level 5 (L5), as defined by the Society of Automotive Engineers (SAE) for levels of automation, the manual intervention function of the vehicle may be removed. The driver then cannot carry out the temporary intention at all, which degrades the user experience.
Therefore, how to improve the user experience during autonomous driving is a problem that urgently needs to be solved.
To address the foregoing problem, this application provides a method for controlling vehicle driving. While an autonomous vehicle is driving in the autonomous driving mode, if the user forms a temporary intention, multimodal understanding may be performed on the user instruction and the environmental information around the vehicle to determine the user's driving intention, and the motion of the vehicle is controlled according to that driving intention. The user's temporary intention can thus be executed in the autonomous driving mode, which further improves the user experience during autonomous driving.
FIG. 4 is an example diagram of a method for controlling vehicle driving according to an embodiment of this application. It should be understood that the method shown in FIG. 4 may be applied to the vehicle shown in FIG. 1 or the autonomous driving system shown in FIG. 2, and that the method is performed in the autonomous driving mode.
As shown in FIG. 4, the method 400 includes steps S410 to S440, which are described in detail below.
S410: Obtain a user instruction in the autonomous driving mode of the vehicle.
Optionally, the user instruction includes any one or more of a user natural-speech instruction (that is, a user voice instruction), a user text instruction, a user air gesture instruction, and the like, which is not limited in this application.
It should be understood that, while the vehicle is driving in the autonomous driving mode, the user may form a temporary intention, for example, seeing an acquaintance at the roadside and needing to stop temporarily to greet them, or feeling that the vehicle is too close to the vehicle in front and wanting to increase the distance. The temporary intention may be input to a related in-vehicle device by means of a user instruction. For example, the temporary intention is input into a microphone as a natural-speech instruction; for another example, it is input into a related user-action acquisition apparatus as an air gesture instruction; for still another example, it is input directly into a related text entry apparatus as a text instruction. This is not limited in this application.
Optionally, if obtaining the user instruction in step S410 is limited to obtaining a user text instruction, then in practice the user text instruction may be obtained directly from the user through a related text entry apparatus, or a user voice instruction or air gesture instruction may first be obtained from another apparatus and then converted into a text instruction by a related apparatus. This application does not limit how the text instruction is obtained. For example, a user who forms a temporary intention may speak the intention in natural speech to a related in-vehicle device (for example, a microphone). Optionally, the natural-speech instruction may be converted into a text instruction by automatic speech recognition (ASR). In this case, obtaining the user's text instruction may specifically mean obtaining the text instruction from the ASR. For example, an air gesture instruction may be converted into a text instruction by a related gesture recognition apparatus.
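The conversion of the different instruction types into a text instruction could be sketched as follows. This is an illustrative assumption rather than the implementation of this application: `UserInstruction`, `to_text_instruction`, and the two converter callables are hypothetical names, and the ASR and gesture recognition components are passed in as plain functions so that no particular library is implied.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class UserInstruction:
    """A raw user instruction (hypothetical structure)."""
    modality: str  # "text", "voice", or "gesture"
    payload: Any   # typed text, audio data, or gesture data


def to_text_instruction(instruction: UserInstruction,
                        asr: Callable[[Any], str],
                        gesture_recognizer: Callable[[Any], str]) -> str:
    """Normalize any supported instruction into a text instruction."""
    if instruction.modality == "text":
        # Text is taken directly from the text entry apparatus.
        return str(instruction.payload).strip()
    if instruction.modality == "voice":
        # Voice goes through automatic speech recognition (ASR).
        return asr(instruction.payload).strip()
    if instruction.modality == "gesture":
        # Air gestures go through a gesture recognition apparatus.
        return gesture_recognizer(instruction.payload).strip()
    raise ValueError(f"unsupported modality: {instruction.modality!r}")
```

Injecting the converters keeps the later steps independent of how the instruction was originally given, which matches the text's choice to describe everything in terms of the text instruction.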
For ease of description, the following embodiments use the user text instruction as an example. It should be understood that this does not limit the solution of this application.
S420: Obtain environmental information around the vehicle.
It should be understood that the environmental information around the vehicle may be obtained through a photographing apparatus. Specifically, an image or a video is obtained through the photographing apparatus, and the information in the image or video reflects the environmental information. The environmental information may also be obtained through a lidar, in-vehicle sensors, and/or the Internet of Vehicles, which is not limited in this application. For ease of description, this application describes the solution by using the example in which the photographing apparatus obtains the environmental information.
It should be understood that, in practice, the photographing apparatus may obtain video information or image information, or may first obtain video information around the vehicle and then obtain image information from the video, which is not limited in this application. For ease of description, the following embodiments all use the example of obtaining image information captured by the photographing apparatus. It should be understood that this does not limit this application.
Optionally, after the user instruction is obtained, a photographing activation signal may be sent to the photographing apparatus to activate it to capture image information (that is, environmental information) around the vehicle. After the photographing apparatus has captured the surrounding image information, the captured image information is obtained.
Optionally, the photographing apparatus may capture image information around the vehicle periodically. In this case, obtaining the image information around the vehicle may include obtaining the image information around the vehicle that is periodically captured by the photographing apparatus.
In this case, when the multimodal understanding described below is performed, suitable image information needs to be selected from the periodically captured image information around the vehicle.
The suitable image information may be the image information most recently captured by the photographing apparatus, or the image information corresponding to a specific time offset estimated from the recognition time of the natural-speech instruction, the air gesture instruction, or the like. It may also be the image information corresponding to the moment at which the text instruction is obtained. The image information should be selected according to the actual situation, which is not limited in this application.
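The three selection options above can be sketched as a small helper. The function name, the strategy labels, and the `(timestamp, image)` frame representation are assumptions for illustration; the application does not prescribe how frames are stored or chosen.

```python
from typing import Any, List, Tuple

Frame = Tuple[float, Any]  # (capture time in seconds, image data)


def select_frame(frames: List[Frame],
                 instruction_time_s: float,
                 strategy: str = "latest",
                 recognition_delay_s: float = 0.0) -> Frame:
    """Pick one frame from periodically captured frames.

    "latest":             the most recently captured frame.
    "at_instruction":     the frame closest to when the text instruction
                          was obtained.
    "before_recognition": the frame closest to the instruction time minus
                          the estimated recognition time (ASR or gesture).
    """
    if not frames:
        raise ValueError("no frames have been captured")
    if strategy == "latest":
        return max(frames, key=lambda frame: frame[0])
    if strategy == "at_instruction":
        reference = instruction_time_s
    elif strategy == "before_recognition":
        reference = instruction_time_s - recognition_delay_s
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Choose the frame whose timestamp is nearest to the reference time.
    return min(frames, key=lambda frame: abs(frame[0] - reference))
```

Compensating for the recognition delay matters when the scene changes quickly: the frame captured while the user was speaking may show the object they referred to, whereas the latest frame may not.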
S430: Perform multimodal understanding on the user instruction and the environmental information around the vehicle to determine the user's driving intention.
Alternatively, step S430 may be: determine the user's driving intention according to the user instruction and the environmental information around the vehicle. This means that the solution of this application does not limit how the user's driving intention is determined from the user instruction and the environmental information around the vehicle: the driving intention may be determined by performing multimodal understanding on them, or in other ways, which is not limited in this application. As a preferred solution, however, the description below uses the example of performing multimodal understanding on the user instruction and the environmental information around the vehicle to determine the user's driving intention.
In this application, therefore, after the user instruction and the environmental information around the vehicle are obtained, multimodal understanding can be performed to determine the user's driving intention. This means that step S430 may be completed in a multimodal processing module (that is, the multimodal processing module 540 in FIG. 5). The module is described below with reference to FIG. 5, and the multimodal processing procedure is described with reference to FIG. 8 and FIG. 9; details are not given here.
Optionally, the driving intention includes at least one intent. Each of the at least one intent includes n slots, and each of the n slots includes a slot name, a slot value, and a classification of the slot value, where n is an integer greater than or equal to 0.
Optionally, the intent may include at least one of stopping, overtaking, decelerating, following, turning, and the like. It should be understood that other intents may also be included in practice, which is not limited in this application.
Optionally, the slot name may include at least one of the stopping position, the speed value, the object to overtake or follow, the turning direction, and the like. It should be understood that other slot names may also be included in practice, which is not limited in this application.
Optionally, the classification of the slot value may be an enumeration-type slot value, a text-type slot value, or an environment-type slot value.
An enumeration-type slot value indicates that the slot value is a predefined enumerated value. For example, when the user instruction is "turn right at the next intersection", there is a slot corresponding to the turning direction. The turning direction can be enumerated; for example, it has only four options: left, right, straight ahead, and U-turn. In this case, the slot value of the slot "turning direction" is "right", and this slot value can be understood as an enumeration-type slot value.
A text-type slot value indicates that the slot value is a substring of the user instruction or text generated according to the user instruction. It should be understood that such a slot value is not enumerable. For example, when the user instruction is "stop next to the gas station", there is a slot corresponding to the stopping position. Because stopping positions cannot be enumerated, the substring "next to the gas station" of the instruction may be used as the slot value, and this slot value can be understood as a text-type slot value. For another example, when the user instruction is "stop at the luxurious hotel ahead", there is again a slot corresponding to the stopping position. Because stopping positions cannot be enumerated, the text "high-grade hotel" generated according to the instruction may be used as the slot value, and this slot value can also be understood as a text-type slot value. As mentioned above, the description below uses the user text instruction as an example. Accordingly, a text-type slot value may be a substring of the user text instruction or text generated according to the user text instruction, and the following embodiments all use this as an example.
An environment-type slot value indicates that the slot value is a marker made in the environmental information according to the content mentioned in the user instruction. Optionally, when the environmental information is obtained through the photographing apparatus, the environmental information may be image information, and the environment-type slot value may then be an image-type slot value, where the image reflects the environment around the vehicle. An image-type slot value thus indicates that the slot value is a marker made in the image information according to the content mentioned in the user instruction. For example, in the scenario shown in FIG. 11 below, when the user instruction is "drive to where the blue car is and pull over", there is a slot corresponding to the stopping position. Because the stopping position is "where the blue car is", a rectangular box may be used to mark the "blue car" in the image information (as shown in FIG. 11). The rectangular box is then the slot value, and this slot value can be understood as an image-type slot value. The description below uses image-type slot values as an example, which is not limited in this application.
It should be understood that the statement above that "the driving intention includes at least one intent" means that the driving intention may include one intent or several intents at the same time. For example, the user instruction "turn right at the next intersection" includes one turning intent, whereas the user instruction "turn right at the next intersection and stop" includes one turning intent and one stopping intent.
The statement above that "each of the at least one intent includes n slots, and each of the n slots includes a slot name, a slot value, and a classification of the slot value, where n is an integer greater than or equal to 0" means that an intent may include one or more slots that describe the intent, or may include no slot. If the intent includes slots that describe it, each such slot includes a slot name, a slot value, and a classification of the slot value.
For example, when the user instruction is "stop", the intent is stopping. There is no slot describing the intent, and subsequent operations can be performed directly on the basis of the intent.
For another example, when the user instruction is "stop at the gas station ahead", there are multiple slots describing the intent (stopping), and the slot name, slot value, and classification of the slot value of each slot can be listed according to the user instruction. For example, for the first slot of the stopping intent, the slot name, slot value, and classification of the slot value may be, respectively, the stopping position, "gas station ahead", and text-type slot value. For the second slot of the stopping intent, they may be, respectively, the stopping position, a rectangular box (marking the gas station ahead in the image information), and image-type slot value.
It can also be seen from this that a single driving intention may contain slot values of one classification or of several classifications. This has to be analyzed case by case and is not listed exhaustively in this application.
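The intent and slot structure described above can be written down as a small data model. The sketch below is one possible representation, not the data format of this application: the class names, the string labels, and the bounding-box tuple `(x, y, width, height)` with its pixel values are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, List


class SlotValueType(Enum):
    ENUMERATED = "enumerated"    # predefined value, e.g. left/right/straight/U-turn
    TEXT = "text"                # substring of, or text generated from, the instruction
    ENVIRONMENT = "environment"  # marker in the environmental (image) information


@dataclass
class Slot:
    name: str                  # slot name, e.g. "stopping position"
    value: Any                 # slot value
    value_type: SlotValueType  # classification of the slot value


@dataclass
class Intent:
    name: str                                        # e.g. "stop", "turn"
    slots: List[Slot] = field(default_factory=list)  # n >= 0 slots


@dataclass
class DrivingIntention:
    intents: List[Intent]

    def __post_init__(self) -> None:
        if not self.intents:
            raise ValueError("a driving intention includes at least one intent")


# "stop": one intent with no slots.
stop = DrivingIntention([Intent("stop")])

# "turn right at the next intersection and stop": two intents.
turn_then_stop = DrivingIntention([
    Intent("turn", [Slot("turning direction", "right", SlotValueType.ENUMERATED)]),
    Intent("stop"),
])

# "stop at the gas station ahead": one intent, two slots of different types.
gas_station = DrivingIntention([
    Intent("stop", [
        Slot("stopping position", "gas station ahead", SlotValueType.TEXT),
        # Made-up pixel coordinates of the rectangular box in the image.
        Slot("stopping position", (412, 180, 96, 64), SlotValueType.ENVIRONMENT),
    ]),
])
```

The third example shows the point made in the text: one driving intention can carry slot values of more than one classification for the same slot name.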
Optionally, the user's driving intention may be presented to the user through an augmented reality head-up display (AR-HUD), a central control screen, or the like, so that the user can promptly judge whether the multimodal understanding result is correct.
For example, when the driving intention contains an environment-type slot value, the object mentioned by the user may be presented on the windshield through the AR-HUD (for example, the rectangular box shown in (a) of FIG. 11), or displayed on the central control screen or the like.
S440: Generate an autonomous driving control instruction for the vehicle according to the user's driving intention.
Optionally, an autonomous driving control instruction for the vehicle may be generated according to the driving intention obtained above, so that the vehicle can be controlled according to that instruction in the autonomous driving mode.
During autonomous driving, the rules of autonomous driving should be observed: the vehicle should be driven with regard to the surrounding environment and must not violate traffic regulations.
Therefore, optionally, whether the driving intention is feasible may first be determined according to the driving intention, the surrounding environment, and traffic regulations; if the driving intention is feasible, the autonomous driving control instruction for the vehicle is then generated. For details, refer to the descriptions of step 10 and step 11 in FIG. 6 below.
在本申请实施例中,在确定行驶意图之后,根据行驶意图、周围环境和交通法规,判断行驶意图是否可行;若行驶意图可行,再生成对车辆的自动驾驶控制指令。从而能够避免在自动驾驶模式下执行用户的行驶意图时违反交通法规或出现其他问题,保证了自动驾驶过程中的用户体验和自动驾驶的安全性。In the embodiment of the present application, after the driving intention is determined, it is judged whether the driving intention is feasible according to the driving intention, the surrounding environment and traffic regulations; if the driving intention is feasible, the automatic driving control instruction for the vehicle is regenerated. Therefore, it is possible to avoid violation of traffic laws or other problems when executing the user's driving intention in the automatic driving mode, thereby ensuring the user experience in the automatic driving process and the safety of automatic driving.
可选地,若行驶意图不可行,可以生成提示信息发送给用户。可选地,该提示信息中还可以包括行驶意图不可行的原因。Optionally, if the driving intention is not feasible, prompt information may be generated and sent to the user. Optionally, the prompt information may also include the reason why the driving intention is not feasible.
可选地,若行驶意图可行,车辆还可以通过语音播报的方式提示用户,如“正在为您执行停车”;还可以通过AR-HUD或中控屏等的方式将车辆即将行驶的目标路径和目标位置展示给用户(如:图11中的(b)所示的动态箭头及方框)。Optionally, if the driving intention is feasible, the vehicle can also prompt the user through a voice broadcast, such as "parking for you"; it can also use AR-HUD or the central control screen to display the target path and the target path of the vehicle to be driven. The target position is displayed to the user (eg, dynamic arrows and boxes shown in (b) of FIG. 11 ).
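The feasibility gate described above (judge first, then either generate a control instruction or prompt the user with a reason) can be sketched as follows. The rule interface, the environment keys, and the instruction string format are hypothetical names chosen for illustration; the application does not specify how feasibility rules are represented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical rule interface: a rule returns a reason string when it objects
# to the intent, or None when it has no objection.
Rule = Callable[[str, Dict[str, bool]], Optional[str]]


@dataclass
class Decision:
    feasible: bool
    control_instruction: Optional[str] = None  # produced only when feasible
    prompt: str = ""                            # message shown or spoken to the user


def no_stopping_rule(intent: str, environment: Dict[str, bool]) -> Optional[str]:
    # Stands in for a traffic-regulation check.
    if intent == "stop" and environment.get("no_stopping_zone", False):
        return "stopping is not permitted at this location"
    return None


def blocked_roadside_rule(intent: str, environment: Dict[str, bool]) -> Optional[str]:
    # Stands in for a surrounding-environment check.
    if intent == "stop" and environment.get("roadside_blocked", False):
        return "the roadside is blocked"
    return None


def decide(intent: str, environment: Dict[str, bool], rules: List[Rule]) -> Decision:
    for rule in rules:
        reason = rule(intent, environment)
        if reason is not None:
            # Not feasible: no control instruction, and the prompt carries the reason.
            return Decision(False, prompt=f"Cannot {intent}: {reason}")
    return Decision(True, control_instruction=f"EXECUTE:{intent}",
                    prompt=f"Executing '{intent}' for you")


rules = [no_stopping_rule, blocked_roadside_rule]
ok = decide("stop", {"no_stopping_zone": False}, rules)
rejected = decide("stop", {"no_stopping_zone": True}, rules)
print(ok.feasible, ok.control_instruction)
print(rejected.feasible, rejected.prompt)
```

Keeping each rule as a separate function is one possible design; it lets the reason reported to the user come directly from the rule that objected.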
Optionally, the above method 400 may be executed on a cloud server or an edge cloud server, or in the computer system of the vehicle; this application does not limit this.
In this embodiment of the application, in the automatic driving mode of the vehicle, the user instruction and the environmental information around the vehicle are acquired and subjected to multimodal understanding to determine the user's driving intention; an automatic driving control instruction for the vehicle is then generated according to that driving intention. As a result, a vehicle driving in the automatic driving mode can carry out the user's temporary driving intention without the user having to take over control manually, which improves the user's experience during automatic driving.
FIG. 5 is an example diagram of a system architecture provided by an embodiment of this application. It should be understood that the system architecture is only an example and does not limit this application. As shown in FIG. 5, the system architecture 500 includes a microphone 510, an automatic speech recognition (ASR) module 520, a camera 530 (that is, a photographing device), a multimodal processing module 540, a decision planning calculation module 550, and a vehicle motion control module 560. These modules are described in turn below.
Microphone 510: a microphone or microphone group deployed in the vehicle cabin, used to collect audio information from the user in the cabin, that is, the user voice instruction referred to in this application, which may also be called the user's natural-speech instruction.
ASR module 520: used to recognize the user's natural language instruction collected by the microphone 510 and convert it into a text instruction.
Camera 530: a camera or camera group deployed on the vehicle, used to collect image information around the vehicle.
Multimodal processing module 540: mainly contains a multimodal intent recognition engine. It receives the text instruction recognized by the ASR module 520 and the image information collected by the camera 530, and generates the corresponding driving intention from the text instruction and the image information. In some cases, the multimodal processing module 540 may also be used to control the camera 530 to collect image information, as shown in Embodiment 1 below.
Decision planning calculation module 550: used to judge, in light of traffic regulations, the surrounding environment, and other conditions, whether the driving intention generated by the multimodal processing module 540 is feasible. It adjusts the driving intention where adjustment is needed and generates vehicle control commands.
Vehicle motion control module 560: used to control vehicle motion according to the vehicle control commands from the decision planning calculation module 550.
It should be understood that the above components or modules may be physically deployed separately or in any combination. In a combined deployment, forwarding information between the combined modules may no longer be necessary.
It should be understood that all components or modules in the above system architecture may be deployed in the vehicle. Alternatively, some of them, for example some or all of the ASR module 520, the multimodal processing module 540, and the decision planning calculation module 550, may be deployed on a cloud server or an edge cloud server, with the rest deployed on the vehicle, and the solution of this application is implemented through vehicle-cloud interaction; this application does not limit this.
Based on the above system architecture 500, specific implementations of this application are described in detail below with reference to FIG. 6 to FIG. 9.
FIG. 6 is an example diagram of a specific implementation provided by an embodiment of this application. As shown in FIG. 6, the implementation includes step 1 to step 11, which are described in detail below.
Step 1. The user issues a voice instruction.
While the vehicle is driving automatically toward a previously entered destination, if a user in the vehicle forms a new, temporary driving intention, the user can state that intention by voice to the microphone 510 in the vehicle.
Step 2. Send the natural-speech instruction.
The microphone 510 sends the received natural-speech instruction to the ASR module 520.
Step 3. Speech recognition.
The ASR module 520 performs speech recognition on the received voice instruction and obtains the text instruction corresponding to it.
Step 4. Transmit the user text instruction.
The ASR module 520 transmits the recognized text instruction to the multimodal processing module 540.
Step 5. Send a capture activation signal.
After receiving the text instruction, the multimodal processing module 540 sends a capture activation signal to the camera 530 to activate the camera 530 to collect image information of the surroundings.
Step 6. Capture image information around the vehicle.
After receiving the capture activation signal, the camera 530 captures image information around the vehicle.
Step 7. Send the image information around the vehicle.
The camera 530 sends the captured image information around the vehicle to the multimodal processing module 540.
Step 8. Perform multimodal understanding based on the text instruction and the image information.
The multimodal processing module 540 performs multimodal understanding based on the text instruction and the image information to obtain the user's driving intention.
It should be understood that the driving intention has been described in detail above and is not described again here. The process by which the multimodal processing module 540 performs multimodal understanding is described below with reference to FIG. 8 and FIG. 9.
Step 9. Send the driving intention.
The multimodal processing module 540 sends the driving intention identified in step 8 to the decision planning calculation module 550.
Step 10. Judge whether the intention is feasible.
The user's driving intention may not comply with traffic regulations (for example, the user asks to drive the wrong way down a one-way street, or to stop at an intersection where stopping is prohibited); it may not be achievable in the current surroundings; or some other circumstance may prevent it from being carried out.
Therefore, the decision planning calculation module 550 needs to judge whether the driving intention is feasible based on the driving intention together with necessary information such as the surrounding environment and traffic regulations, generate prompt information according to the result, and notify the user. For example, if the result is that the driving intention is not feasible, the user is notified that the driving intention cannot be executed and may also be told why. If the result is that the driving intention is feasible, step 11 is performed.
Step 11. Adjust the vehicle's driving parameters according to the driving intention and information such as the surrounding environment and traffic regulations.
Specifically, if the result of step 10 is that the driving intention is feasible, the decision planning calculation module 550 determines specific vehicle motion control instructions according to the driving intention and necessary information such as the surrounding environment and traffic regulations, and sends them to the vehicle motion control module 560. The vehicle motion control module 560 then carries out the vehicle motion control instructions.
It should be understood that, after the driving intention has been carried out, the vehicle motion control instructions may be modified according to the actual situation so that the vehicle continues in the automatic driving mode to the user's final destination.
FIG. 7 is an example diagram of another specific implementation provided by an embodiment of this application. As shown in FIG. 7, the implementation includes step 1 to step 10, which are described in detail below.
Step 1 to step 4. See step 1 to step 4 of the previous implementation (FIG. 6); details are not repeated here.
Step 5. Periodically capture image information around the vehicle.
The camera 530 periodically captures image information around the vehicle.
Step 6. Send the image information around the vehicle.
The camera 530 periodically sends the captured image information around the vehicle to the multimodal processing module 540.
Step 7. Perform multimodal understanding based on the text instruction and the image information.
The multimodal processing module 540 performs multimodal understanding based on the text instruction and the image information from a suitable time to obtain the user's driving intention.
The image information from a suitable time may be the most recent image information, or the image information corresponding to a specific time interval estimated from the recognition time of the natural language instruction.
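The two ways of choosing image information "from a suitable time" can be sketched with a small frame buffer. This is a sketch under stated assumptions: the `FrameBuffer` class, the string payloads, and the fixed recognition latency are all hypothetical, and subtracting an assumed latency from the recognition time is only one possible reading of how the time interval is estimated.

```python
from bisect import bisect_right
from collections import deque
from typing import Deque, Optional, Tuple

Frame = Tuple[float, str]  # (capture timestamp in seconds, frame payload)


class FrameBuffer:
    """Keeps the most recent periodically captured frames (hypothetical helper)."""

    def __init__(self, max_frames: int = 50) -> None:
        self._frames: Deque[Frame] = deque(maxlen=max_frames)

    def add(self, timestamp: float, payload: str) -> None:
        # Frames are assumed to arrive in increasing timestamp order.
        self._frames.append((timestamp, payload))

    def latest(self) -> Optional[Frame]:
        # Option 1: the most recent image information.
        return self._frames[-1] if self._frames else None

    def at_or_before(self, timestamp: float) -> Optional[Frame]:
        # Option 2: the newest frame captured no later than the given time.
        times = [t for t, _ in self._frames]
        index = bisect_right(times, timestamp)
        return self._frames[index - 1] if index > 0 else None


buffer = FrameBuffer()
for i in range(5):
    buffer.add(timestamp=i * 0.5, payload=f"frame-{i}")  # 0.0, 0.5, 1.0, 1.5, 2.0

recognized_at = 2.0          # time at which the text instruction became available
assumed_asr_latency = 0.8    # assumption: estimated delay between speech and recognition
spoken_at = recognized_at - assumed_asr_latency

print(buffer.latest())
print(buffer.at_or_before(spoken_at))
```

Selecting the frame nearest the moment the user spoke can matter when the scene changes quickly, since the object the user referred to may have moved by the time recognition finishes.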
Likewise, the driving intention has been described in detail above and is not described again here. The process by which the multimodal processing module 540 performs multimodal understanding is described below with reference to FIG. 8 and FIG. 9.
Step 8 to step 10. See step 9 to step 11 of the previous implementation (FIG. 6); details are not repeated here.
FIG. 8 is an example diagram of a multimodal processing process provided by an embodiment of this application.
As shown in FIG. 8, multimodal processing mainly consists of inputting the user instruction and the environmental information into the multimodal processing module, which performs multimodal understanding and finally outputs the driving intention.
It should be understood that the multimodal processing module is obtained through pre-training. Specifically, during training, user instructions (such as user voice instructions, user text instructions, or user air-gesture instructions), environmental information (such as image information), and the corresponding driving intentions may be used as training data to train the multimodal processing module, as shown in FIG. 10. In the application stage, the multimodal processing module can then output the corresponding driving intention once a user instruction and environmental information are input.
FIG. 9 is an example diagram of another multimodal processing process provided by an embodiment of this application. In FIG. 9, a text instruction serves as the user instruction and image information serves as the environmental information. It should be understood that FIG. 9 is only one structural example of the multimodal processing module shown in FIG. 8 and does not limit this application. In practice, the multimodal processing module may take other forms and may be composed of other processing models, networks, or modules, as long as it can output the driving intention from the input text instruction and image information. The multimodal processing process in this example is described below with reference to FIG. 9.
As shown in FIG. 9, the multimodal processing module may include a text processing model, a convolutional neural network (CNN), an attention module att.1, and an attention module att.2. The text processing model may be the BERT model commonly used in text processing, or another model that can be used for text processing; this application does not limit this. The CNN may be a deep residual network (ResNet) or the like, without limitation.
In this example, the multimodal processing module may understand the driving intention as follows:
After the multimodal processing module obtains the text instruction and the image information, the corresponding text features are extracted from the text instruction by the BERT model, and the corresponding image features are extracted from the image information by the CNN (for example, ResNet).
The attention module att.1 fuses the image features into the text features to obtain at least one intent and, for each of the at least one intent, its n slots, where n is an integer greater than or equal to 0. Each of the n slots includes a slot name, a slot value, and a slot-value classification, where the slot-value classification is an enumeration-class slot value, a text-class slot value, or an image-class slot value (see the description of the driving intention in the FIG. 4 section).
If the slot-value classification of a slot of an intent obtained by the attention module att.1 is an image-class slot value, the attention module att.2 then fuses the text features into the image features to obtain the slot value of that slot, that is, the rectangular box of the object mentioned in the user's text instruction, for example, the rectangular box corresponding to the blue car in FIG. 11.
In summary, the information obtained by att.1 and att.2 constitutes the driving intention.
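The two fusion directions of att.1 and att.2 can be illustrated with plain scaled dot-product cross-attention. The application does not specify the internal form of either attention module, so scaled dot-product attention without learned projections is substituted here purely as a common, simple choice; the tiny two-dimensional feature vectors are invented for the example and merely stand in for BERT and CNN outputs.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries: List[Vector], context: List[Vector]) -> List[Vector]:
    """Each query becomes a weighted average of the context vectors.

    Simplification: the context vectors serve as both keys and values.
    """
    dim = len(context[0])
    scale = math.sqrt(dim)
    fused = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in context])
        fused.append([sum(w * v[d] for w, v in zip(weights, context))
                      for d in range(dim)])
    return fused


text_features = [[1.0, 0.0], [0.0, 1.0]]               # stand-in for two token features
image_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-in for three region features

# Direction of att.1: text features take in image features (used for intents and slots).
text_with_image = cross_attention(text_features, image_features)
# Direction of att.2: image features take in text features (used to locate the object).
image_with_text = cross_attention(image_features, text_features)

print(len(text_with_image), len(image_with_text))
```

In a trained module, prediction heads would follow each attention output (for example, an intent and slot classifier after att.1 and a box predictor after att.2); those heads are outside the scope of this sketch.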
FIG. 10 is an example diagram of a training method for a multimodal processing module provided by an embodiment of this application. As shown in FIG. 10, the training method 1000 includes steps S1010 and S1020, which are described below.
S1010: acquire training data.
The training data includes training input data and training target data. The training input data includes user instructions and environmental information around the vehicle, and the training target data includes the driving intention corresponding to the training input data.
The driving intention includes at least one intent, each of the at least one intent includes n slots, and each of the n slots includes a slot name, a slot value, and a slot-value classification, where n is an integer greater than or equal to 0.
The intent includes at least one of stopping, overtaking, decelerating, following, turning, and the like.
The slot name includes at least one of the stopping location, the speed value, the object to overtake or follow, the turning direction, and the like.
The slot-value classification is an enumeration-class slot value, a text-class slot value, or an environment-class slot value.
An enumeration-class slot value indicates that the slot value is a predefined enumeration value; a text-class slot value indicates that the slot value is a substring of the user instruction or text generated from the user instruction; an environment-class slot value indicates that the slot value is a marker made in the environmental information according to the content mentioned in the user instruction.
S1020: train the multimodal processing module according to the training input data and the training target data.
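The shape of one training example, as described in S1010, can be sketched together with a basic structural check. The application does not describe the training algorithm itself, so no training loop is shown; the `TrainingSample` class, the dictionary keys, the image identifiers, and the box coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

ALLOWED_VALUE_CLASSES = {"enumeration", "text", "environment"}
REQUIRED_SLOT_FIELDS = {"name", "value", "value_class"}


@dataclass
class TrainingSample:
    user_instruction: str                 # training input
    environment_info: str                 # training input (hypothetical image identifier)
    target_intents: List[Dict[str, Any]]  # training target: the driving intention


def validate(sample: TrainingSample) -> List[str]:
    """Return a list of problems; an empty list means the sample is well formed."""
    problems = []
    if not sample.target_intents:
        problems.append("driving intention must contain at least one intent")
    for intent in sample.target_intents:
        for slot in intent.get("slots", []):  # an intent may have n >= 0 slots
            missing = REQUIRED_SLOT_FIELDS - set(slot)
            if missing:
                problems.append(f"slot missing fields: {sorted(missing)}")
            elif slot["value_class"] not in ALLOWED_VALUE_CLASSES:
                problems.append(f"unknown value class: {slot['value_class']}")
    return problems


good = TrainingSample(
    user_instruction="stop next to the blue car",
    environment_info="frame_0001",
    target_intents=[{
        "intent": "stop",
        "slots": [{
            "name": "stopping location",
            "value": (120, 80, 260, 190),  # made-up box marking the blue car
            "value_class": "environment",
        }],
    }],
)
bad = TrainingSample("slow down", "frame_0002", target_intents=[])

print(validate(good), validate(bad))
```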
FIG. 11 is an example diagram of an application scenario provided by an embodiment of this application. It should be understood that the application scenario shown in FIG. 11 is only an example and does not limit this application. The application scenario is described below with reference to FIG. 11.
As shown in (a) of FIG. 11, while the vehicle is driving in the automatic driving mode toward a preset destination, the user of the autonomous vehicle forms a new, temporary driving intention and issues a natural-speech instruction by voice to the vehicle (for example, to a microphone in the vehicle), such as "drive to where the blue car is and pull over". A relevant on-board device, such as the ASR module, then recognizes the natural language instruction and converts it into a text instruction. Next, the device or related module that controls the driving of the vehicle determines the user's temporary intention through the above method 400 (that is, the user needs to stop at the roadside next to the blue car ahead), generates suitable vehicle control instructions according to that temporary driving intention, and issues them to the vehicle. In addition, the vehicle may give the user feedback by voice announcement and/or through an augmented reality head-up display (AR-HUD). As shown in (b) of FIG. 11, the vehicle may prompt the user by voice announcement, for example "Stopping for you"; it may also show the user the target path the vehicle is about to follow and the target location through the AR-HUD.
It should be understood that this application scenario can also be understood as a user display interface. The interface can present the driving intention to the user, as with the rectangular box shown in (a) of FIG. 11, and can also present the path about to be followed and the target location, as with the arrow and box shown in (b) of FIG. 11.
FIG. 12 is an example diagram of an apparatus for controlling the driving of a vehicle provided by an embodiment of this application. As shown in FIG. 12, the apparatus 1200 includes an acquiring unit 1210 and a processing unit 1220.
In the automatic driving mode of the vehicle, the acquiring unit 1210 is configured to acquire a user instruction.
The acquiring unit 1210 is further configured to acquire environmental information around the vehicle.
The processing unit 1220 is configured to perform multimodal understanding on the user instruction and the environmental information around the vehicle to determine the user's driving intention.
The processing unit 1220 is further configured to generate an automatic driving control instruction for the vehicle according to the user's driving intention.
Optionally, the driving intention may include at least one intent, each of the at least one intent includes n slots, and each of the n slots includes a slot name, a slot value, and a slot-value classification, where n is an integer greater than or equal to 0.
Optionally, the intent may include at least one of stopping, overtaking, decelerating, following, turning, and the like.
Optionally, the slot name may include at least one of the stopping location, the speed value, the object to overtake or follow, the turning direction, and the like.
Optionally, the slot-value classification may be an enumeration-class slot value, a text-class slot value, or an environment-class slot value, where an enumeration-class slot value indicates that the slot value is a predefined enumeration value, a text-class slot value indicates that the slot value is a substring of the user instruction or text generated from the user instruction, and an environment-class slot value indicates that the slot value is a marker made in the environmental information according to the content mentioned in the user instruction.
Optionally, the processing unit 1220 may be further configured to: judge whether the driving intention is feasible according to the driving intention, the surrounding environment, and traffic regulations; and, if the driving intention is feasible, generate the automatic driving control instruction for the vehicle.
Optionally, the user instruction includes any one or more of a user voice instruction, a user text instruction, and a user air-gesture instruction.
Optionally, the apparatus 1200 may further include a sending unit 1230, which may be configured to send a capture activation signal to the photographing device to activate the photographing device to capture the environmental information around the vehicle;
the acquiring unit 1210 may be further configured to acquire the environmental information around the vehicle captured by the photographing device in response to the capture activation signal.
Optionally, the acquiring unit 1210 may be further configured to acquire environmental information around the vehicle captured periodically by the photographing device.
Optionally, the user's driving intention may be presented to the user through an augmented reality head-up display (AR-HUD) or a central control screen.
FIG. 13 shows a training apparatus for a multimodal processing module provided by an embodiment of this application. As shown in FIG. 13, the apparatus 1300 includes an acquiring unit 1310 and a processing unit 1320.
The acquiring unit 1310 is configured to acquire training data. The training data includes training input data and training target data; the training input data includes user instructions and environmental information around the vehicle, and the training target data includes the driving intention corresponding to the training input data.
The processing unit 1320 is configured to train the multimodal processing module according to the training input data and the training target data.
Optionally, the driving intention may include at least one intent, each of the at least one intent includes n slots, and each of the n slots includes a slot name, a slot value, and a slot-value classification, where n is an integer greater than or equal to 0.
Optionally, the intent may include at least one of stopping, overtaking, decelerating, following, turning, and the like.
Optionally, the slot name may include at least one of the stopping location, the speed value, the object to overtake or follow, the turning direction, and the like.
Optionally, the slot-value classification may be an enumeration-class slot value, a text-class slot value, or an environment-class slot value, where an enumeration-class slot value indicates that the slot value is a predefined enumeration value, a text-class slot value indicates that the slot value is a substring of the user instruction or text generated from the user instruction, and an environment-class slot value indicates that the slot value is a marker made in the environmental information according to the content mentioned in the user instruction.
FIG. 14 is a structural example diagram of an apparatus provided by an embodiment of this application. The apparatus 1400 includes a processor 1402, a communication interface 1403, and a memory 1404.
Optionally, one example of the apparatus 1400 may be a chip. Another example of the apparatus 1400 may be a computing device.
The processor 1402, the memory 1404, and the communication interface 1403 may communicate with one another through a bus. The memory 1404 stores executable code, and the processor 1402 reads the executable code in the memory 1404 to perform the corresponding method. The memory 1404 may also include other software modules required to run processes, such as an operating system. The operating system may be LINUX™, UNIX™, WINDOWS™, or the like.
For example, the executable code in the memory 1404 is used to implement the method shown in FIG. 4 or FIG. 10, and the processor 1402 reads that executable code in the memory 1404 to perform the method shown in FIG. 4 or FIG. 10.
The processor 1402 may be a CPU. The memory 1404 may include volatile memory, such as random access memory (RAM). The memory 1404 may also include non-volatile memory (NVM), such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD).
In some embodiments of the present application, the disclosed methods may be implemented as computer program instructions encoded in a machine-readable format on a computer-readable storage medium, or encoded on other non-transitory media or articles of manufacture. FIG. 15 schematically illustrates a conceptual partial view of an example computer program product arranged in accordance with at least some embodiments presented herein, the example computer program product including a computer program for executing a computer process on a computing device. In one embodiment, the example computer program product 1500 is provided using a signal bearing medium 1501. The signal bearing medium 1501 may include one or more program instructions 1502 that, when executed by one or more processors, may provide all or part of the functions described above for the methods shown in FIG. 4 or FIG. 10. Thus, for example, with reference to the embodiment shown in FIG. 4, one or more features of S410 to S440 may be carried out by one or more instructions associated with the signal bearing medium 1501.
In some examples, the signal bearing medium 1501 may include a computer-readable medium 1503, such as, but not limited to, a hard disk drive, a compact disc (CD), a digital video disc (DVD), a digital tape, a memory, a read-only memory (ROM) or a random access memory (RAM). In some implementations, the signal bearing medium 1501 may include a computer-recordable medium 1504, such as, but not limited to, a memory, a read/write (R/W) CD or an R/W DVD. In some implementations, the signal bearing medium 1501 may include a communication medium 1505, such as, but not limited to, a digital and/or analog communication medium (for example, a fiber optic cable, a waveguide, a wired communication link or a wireless communication link). Thus, for example, the signal bearing medium 1501 may be conveyed by a wireless form of the communication medium 1505 (for example, a wireless communication medium conforming to the IEEE 802.11 standard or another transmission protocol). The one or more program instructions 1502 may be, for example, computer-executable instructions or logic-implemented instructions. In some examples, the aforementioned computing device may be configured to provide various operations, functions or actions in response to program instructions 1502 conveyed to the computing device through one or more of the computer-readable medium 1503, the computer-recordable medium 1504 and/or the communication medium 1505. It should be understood that the arrangements described herein are for illustrative purposes only. Thus, those skilled in the art will understand that other arrangements and other elements (for example, machines, interfaces, functions, sequences and groups of functions) can be used instead, and that some elements may be omitted altogether depending on the desired results. In addition, many of the described elements are functional entities that may be implemented as discrete or distributed components, or in conjunction with other components in any suitable combination and location.
The terms "component", "module", "system" and the like used in this specification refer to a computer-related entity, hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, a program and/or a computer. By way of illustration, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a process and/or a thread of execution, and a component may be located on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer-readable media having various data structures stored thereon. Components may communicate through local and/or remote processes, for example according to a signal having one or more data packets (for example, data from two components interacting with another component in a local system, in a distributed system, and/or across a network, such as the Internet, which interacts with other systems by way of the signal).
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative. The division of the units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (36)
- 1. A method for controlling the driving of a vehicle, comprising: in an automatic driving mode of the vehicle, obtaining a user instruction; obtaining environmental information around the vehicle; performing multimodal understanding on the user instruction and the environmental information around the vehicle to determine a driving intention of the user; and generating an automatic driving control instruction for the vehicle according to the driving intention of the user.
- 2. The method according to claim 1, wherein the driving intention includes at least one intent, each intent of the at least one intent includes n slots, each slot of the n slots includes a slot name, a slot value and a classification of the slot value, and n is an integer greater than or equal to 0.
- 3. The method according to claim 2, wherein the intent includes at least one of: stopping, overtaking, decelerating, following, and turning.
- 4. The method according to claim 2 or 3, wherein the slot name includes at least one of: a parking position, a speed value, an object to overtake or follow, and a turning direction.
- 5. The method according to any one of claims 2 to 4, wherein the classification of the slot value is: an enumeration-type slot value, a text-type slot value or an environment-type slot value, wherein the enumeration-type slot value indicates that the slot value is a predefined enumeration value, the text-type slot value indicates that the slot value is a substring of the user instruction or text generated according to the user instruction, and the environment-type slot value indicates that the slot value is an identifier marked in the environmental information according to the content mentioned in the user instruction.
- 6. The method according to any one of claims 1 to 5, wherein the generating an automatic driving control instruction for the vehicle according to the driving intention of the user comprises: judging whether the driving intention is feasible according to the driving intention, the surrounding environment and traffic regulations; and if the driving intention is feasible, generating the automatic driving control instruction for the vehicle.
- 7. The method according to any one of claims 1 to 6, wherein the user instruction comprises any one or more of: a user voice instruction, a user text instruction, and a user air gesture instruction.
- 8. The method according to any one of claims 1 to 7, wherein the method further comprises: sending a photographing activation signal to a photographing device to activate the photographing device to photograph the environmental information around the vehicle; and the obtaining environmental information around the vehicle comprises: obtaining the environmental information around the vehicle photographed by the photographing device according to the photographing activation signal.
- 9. The method according to any one of claims 1 to 7, wherein the obtaining environmental information around the vehicle comprises: obtaining the environmental information around the vehicle periodically photographed by a photographing device.
- 10. The method according to any one of claims 1 to 9, wherein the driving intention of the user is presented to the user through an augmented reality head-up display (AR-HUD) or a central control screen.
- 11. An apparatus for controlling the driving of a vehicle, comprising an acquisition unit and a processing unit, wherein, in an automatic driving mode of the vehicle: the acquisition unit is configured to obtain a user instruction; the acquisition unit is further configured to obtain environmental information around the vehicle; the processing unit is configured to perform multimodal understanding on the user instruction and the environmental information around the vehicle to determine a driving intention of the user; and the processing unit is further configured to generate an automatic driving control instruction for the vehicle according to the driving intention of the user.
- 12. The apparatus according to claim 11, wherein the driving intention includes at least one intent, each intent of the at least one intent includes n slots, each slot of the n slots includes a slot name, a slot value and a classification of the slot value, and n is an integer greater than or equal to 0.
- 13. The apparatus according to claim 12, wherein the intent includes at least one of: stopping, overtaking, decelerating, following, and turning.
- 14. The apparatus according to claim 12 or 13, wherein the slot name includes at least one of: a parking position, a speed value, an object to overtake or follow, and a turning direction.
- 15. The apparatus according to any one of claims 12 to 14, wherein the classification of the slot value is: an enumeration-type slot value, a text-type slot value or an environment-type slot value, wherein the enumeration-type slot value indicates that the slot value is a predefined enumeration value, the text-type slot value indicates that the slot value is a substring of the user instruction or text generated according to the user instruction, and the environment-type slot value indicates that the slot value is an identifier marked in the environmental information according to the content mentioned in the text user instruction.
- 16. The apparatus according to any one of claims 11 to 15, wherein the processing unit is further configured to: judge whether the driving intention is feasible according to the driving intention, the surrounding environment and traffic regulations; and if the driving intention is feasible, generate the automatic driving control instruction for the vehicle.
- 17. The apparatus according to any one of claims 11 to 16, wherein the user instruction comprises any one or more of: a user voice instruction, a user text instruction, and a user air gesture instruction.
- 18. The apparatus according to any one of claims 11 to 17, wherein the apparatus further comprises a sending unit, the sending unit being configured to send a photographing activation signal to a photographing device to activate the photographing device to photograph the environmental information around the vehicle; and the acquisition unit is further configured to obtain the environmental information around the vehicle photographed by the photographing device according to the photographing activation signal.
- 19. The apparatus according to any one of claims 11 to 17, wherein the acquisition unit is further configured to obtain the environmental information around the vehicle periodically photographed by a photographing device.
- 20. The apparatus according to any one of claims 11 to 19, wherein the driving intention of the user is presented to the user through an augmented reality head-up display (AR-HUD) or a central control screen.
- 21. A training method for a multimodal processing module, comprising: obtaining training data, wherein the training data includes training input data and training target data, the training input data includes a user instruction and environmental information around a vehicle, and the training target data includes a driving intention corresponding to the training input data; and training the multimodal processing module according to the training input data and the training target data.
- 22. The method according to claim 21, wherein the driving intention includes at least one intent, each intent of the at least one intent includes n slots, each slot of the n slots includes a slot name, a slot value and a classification of the slot value, and n is an integer greater than or equal to 0.
- 23. The method according to claim 22, wherein the intent includes at least one of: stopping, overtaking, decelerating, following, and turning.
- 24. The method according to claim 22 or 23, wherein the slot name includes at least one of: a parking position, a speed value, an object to overtake or follow, and a turning direction.
- 25. The method according to any one of claims 22 to 24, wherein the classification of the slot value is: an enumeration-type slot value, a text-type slot value or an environment-type slot value, wherein the enumeration-type slot value indicates that the slot value is a predefined enumeration value, the text-type slot value indicates that the slot value is a substring of the user instruction or text generated according to the user instruction, and the environment-type slot value indicates that the slot value is an identifier marked in the environmental information according to the content mentioned in the user instruction.
- 26. A training apparatus for a multimodal processing module, comprising an acquisition unit and a processing unit, wherein the acquisition unit is configured to obtain training data, the training data includes training input data and training target data, the training input data includes a user instruction and environmental information around a vehicle, and the training target data includes a driving intention corresponding to the training input data; and the processing unit is configured to train the multimodal processing module according to the training input data and the training target data.
- 27. The apparatus according to claim 26, wherein the driving intention includes at least one intent, each intent of the at least one intent includes n slots, each slot of the n slots includes a slot name, a slot value and a classification of the slot value, and n is an integer greater than or equal to 0.
- 28. The apparatus according to claim 27, wherein the intent includes at least one of: stopping, overtaking, decelerating, following, and turning.
- 29. The apparatus according to claim 27 or 28, wherein the slot name includes at least one of: a parking position, a speed value, an object to overtake or follow, and a turning direction.
- 30. The apparatus according to any one of claims 27 to 29, wherein the classification of the slot value is: an enumeration-type slot value, a text-type slot value or an environment-type slot value, wherein the enumeration-type slot value indicates that the slot value is a predefined enumeration value, the text-type slot value indicates that the slot value is a substring of the user instruction or text generated according to the user instruction, and the environment-type slot value indicates that the slot value is an identifier marked in the environmental information according to the content mentioned in the text user instruction.
- 31. A processing method for a multimodal processing module, wherein the multimodal processing module is obtained by training according to the training method of any one of claims 21 to 25, and the processing method comprises: obtaining, by the multimodal processing module, input data, wherein the input data includes a user instruction and environmental information around the vehicle; and outputting, by the multimodal processing module, a driving intention according to the input data.
- 32. A multimodal processing module, wherein the multimodal processing module is obtained by training according to the training method of any one of claims 21 to 25, and the multimodal processing module comprises: an acquisition unit, configured to obtain input data, wherein the input data includes a user instruction and environmental information around the vehicle; and a processing unit, configured to output a driving intention according to the input data.
- 33. An apparatus for controlling the driving of a vehicle, comprising a processor and a memory, wherein the memory is configured to store program instructions, and the processor is configured to call the program instructions to execute the method for controlling the driving of a vehicle according to any one of claims 1 to 10.
- 34. A training apparatus for a multimodal processing module, comprising a processor and a memory, wherein the memory is configured to store program instructions, and the processor is configured to call the program instructions to execute the training method for a multimodal processing module according to any one of claims 21 to 25.
- 35. An automatic driving vehicle, comprising the apparatus for controlling the driving of a vehicle according to any one of claims 11 to 20.
- 36. A computer-readable storage medium, wherein program instructions are stored in the computer-readable storage medium, and when the program instructions are run by a processor, the method for controlling the driving of a vehicle according to any one of claims 1 to 10 is implemented; and/or the training method for a multimodal processing module according to any one of claims 21 to 25 is implemented.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202180001475.0A CN113226886A (en) | 2021-03-31 | 2021-03-31 | Method and device for controlling vehicle to run and vehicle |
PCT/CN2021/084731 WO2022205211A1 (en) | 2021-03-31 | 2021-03-31 | Method and apparatus for controlling vehicle running and vehicle |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/084731 WO2022205211A1 (en) | 2021-03-31 | 2021-03-31 | Method and apparatus for controlling vehicle running and vehicle |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022205211A1 (en) | 2022-10-06 |
Family
ID=77081297
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/084731 WO2022205211A1 (en) | 2021-03-31 | 2021-03-31 | Method and apparatus for controlling vehicle running and vehicle |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113226886A (en) |
WO (1) | WO2022205211A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024193983A1 (en) * | 2023-03-17 | 2024-09-26 | Jaguar Land Rover Limited | Method and apparatus for selecting a driving mode of a vehicle |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113460092A (en) * | 2021-09-01 | 2021-10-01 | 国汽智控(北京)科技有限公司 | Method, device, equipment, storage medium and product for controlling vehicle |
CN114043987B (en) * | 2021-10-13 | 2024-07-09 | 北京集度科技有限公司 | Instruction processing method, device, terminal and storage medium |
CN114171025A (en) * | 2021-12-09 | 2022-03-11 | 阿维塔科技(重庆)有限公司 | Automatic driving method, device, electronic equipment and computer readable storage medium |
CN114283601A (en) * | 2021-12-23 | 2022-04-05 | 深圳创维-Rgb电子有限公司 | Vehicle driving method, system, television and storage medium |
CN114475632B (en) * | 2022-03-11 | 2022-11-01 | 阿波罗智能技术(北京)有限公司 | Automatic driving control data determination method, device, equipment and storage medium |
CN115457959B (en) * | 2022-11-08 | 2023-02-10 | 广州小鹏汽车科技有限公司 | Voice interaction method, server and computer readable storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140365228A1 (en) * | 2013-03-15 | 2014-12-11 | Honda Motor Co., Ltd. | Interpretation of ambiguous vehicle instructions |
US20150062168A1 (en) * | 2013-03-15 | 2015-03-05 | Honda Motor Co., Ltd. | System and method for providing augmented reality based directions based on verbal and gestural cues |
CN109426256A (en) * | 2017-09-05 | 2019-03-05 | 百度(美国)有限责任公司 | The lane auxiliary system based on driver intention of automatic driving vehicle |
CN110023178A (en) * | 2016-12-12 | 2019-07-16 | 苹果公司 | The autonomous vehicle near destination is instructed using signal of intent |
CN111008532A (en) * | 2019-12-12 | 2020-04-14 | 广州小鹏汽车科技有限公司 | Voice interaction method, vehicle and computer-readable storage medium |
CN111026873A (en) * | 2019-10-24 | 2020-04-17 | 中国人民解放军军事科学院国防科技创新研究院 | Unmanned vehicle and navigation method and device thereof |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190163331A1 (en) * | 2017-11-28 | 2019-05-30 | International Business Machines Corporation | Multi-Modal Dialog Broker |
US11455982B2 (en) * | 2019-01-07 | 2022-09-27 | Cerence Operating Company | Contextual utterance resolution in multimodal systems |
2021
- 2021-03-31 CN CN202180001475.0A patent/CN113226886A/en active Pending
- 2021-03-31 WO PCT/CN2021/084731 patent/WO2022205211A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN113226886A (en) | 2021-08-06 |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21933874 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 21933874 Country of ref document: EP Kind code of ref document: A1 |