WO2024093321A1 - Vehicle position acquiring method, model training method, and related device - Google Patents


Info

Publication number
WO2024093321A1
WO2024093321A1 (PCT/CN2023/104695)
Authority
WO
WIPO (PCT)
Prior art keywords: information, vehicle, lane, around, time
Application number
PCT/CN2023/104695
Other languages
French (fr)
Chinese (zh)
Inventor
李姗
邓乃铭
邢国成
朱丽
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司
Publication of WO2024093321A1


Classifications

    • G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING; G06V10/00 Arrangements for image or video recognition or understanding; G06V10/70 Arrangements using pattern recognition or machine learning
    • G06V10/764 Arrangements using classification, e.g. of video objects
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Arrangements using neural networks
    • G06V20/00 Scenes; Scene-specific elements; G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle, by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads

Definitions

  • the present application relates to the field of artificial intelligence, and in particular to a vehicle position acquisition method, a model training method, and related equipment.
  • Kalman filtering is mainly used to predict the position of the vehicle.
  • this method only relies on the historical trajectory of the vehicle and does not consider the lane information in the environment.
  • the present application provides a vehicle position acquisition method, a model training method and related equipment, which can predict the positions of vehicles around the vehicle.
  • the present application provides a method for obtaining the position of a vehicle, which can be used in the field of artificial intelligence.
  • the method includes:
  • first information and second information are obtained, wherein the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle; then, the first information and the second information are input into the first model to obtain prediction information generated by the first model, wherein the prediction information includes predicted position information of vehicles around the vehicle within the first time.
  • the ego vehicle and the vehicles around it form an interdependent whole, and their respective behaviors affect each other's decisions.
  • Previous studies often rely solely on a vehicle's historical trajectory to predict its future trajectory, so the prediction results are notably inaccurate.
  • the predicted position information of the vehicles around the ego vehicle is associated with the lane, thereby further improving the accuracy of the prediction results, providing a basis for the decision-making planning of the autonomous driving vehicle, and also improving the riding experience of the autonomous driving vehicle.
  • the prediction information includes predicted trajectory information of vehicles around the vehicle within a first time and third information, where the third information indicates the lanes in which the vehicles around the vehicle are located within the first time.
  • the vehicle's future driving intention is bound to the lane by outputting the information of the lanes where the vehicles around the vehicle are located, which effectively utilizes the relationship between the vehicles and lanes around the vehicle and improves the prediction accuracy of the vehicle's position.
  • the third information includes a correlation between a target vehicle around the vehicle and at least one lane around the vehicle within a first time, the target vehicle is a vehicle around the vehicle, and the method further includes:
  • the first lane is determined as the lane where the target vehicle is located within the first time, wherein the first lane is a lane with the highest correlation with the target vehicle within the first time among at least one lane around the vehicle.
  • the future driving position of the vehicle is bound to the lane.
  • the correlation between the target vehicle and the lanes around the vehicle is output to give the probability of the target vehicle traveling in each lane in the future.
  • the lane with the highest correlation is determined as the lane where the target vehicle is located in the first time, thereby improving the accuracy of the prediction results.
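As a small illustrative sketch of this selection rule (the lane names and correlation scores below are hypothetical, not values from the application), choosing the first lane amounts to an argmax over the per-lane correlations:

```python
def select_lane(correlations):
    """Return the lane with the highest correlation to the target vehicle
    (the 'first lane' described above)."""
    return max(correlations, key=correlations.get)

# Hypothetical correlation scores between a target vehicle and nearby lanes.
scores = {"lane_0": 0.12, "lane_1": 0.71, "lane_2": 0.17}
print(select_lane(scores))  # lane_1
```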
  • the first model is constructed based on an attention mechanism, and the first information and the second information are input into the first model to obtain prediction information generated by the first model, including: inputting the first information and the second information into the first model, and generating fourth information based on the attention mechanism, the fourth information including a correlation between a target vehicle around the ego vehicle and the first lane set within a first time, the target vehicle is a vehicle around the ego vehicle, and the first lane set includes all lanes around the ego vehicle included in the second information;
  • the categories of the road scene include intersection scenes and non-intersection scenes
  • a second lane set is selected from the first lane set, where the second lane set includes lanes where vehicles around the ego vehicle are located at the first time;
  • the fifth information is obtained from the fourth information, and the third information is generated according to the fifth information, wherein the fifth information includes the target vehicle and the second lane set.
  • the predicted trajectory information of the vehicles around the vehicle within the first time is generated.
  • the correlation between the target vehicle and the first lane set in the first time is obtained.
  • the predicted intention is made more stable;
  • the lane to be retained is selected according to the road scene in which the vehicle is located, thereby further improving the accuracy of the prediction.
  • acquiring fifth information from fourth information, and generating third information according to the fifth information includes:
  • the normalized fifth information is input into the multi-layer perceptron to obtain the third information.
  • the fourth information includes the correlation between the target vehicle around the vehicle and the first lane set in the first time, that is, the attention score of the target vehicle relative to each lane in the first lane set.
  • the second lane set to which the target vehicle belongs can be determined, and the corresponding fifth information can be screened out from the fourth information according to the second lane set, so that the lane prediction can be carried out in a targeted manner according to the specific road scene to which the target vehicle belongs, so as to further improve the accuracy of the prediction.
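A minimal sketch of this screening step, with hypothetical lane names and scores (the application does not specify the data layout): the scores kept for the second lane set play the role of the fifth information.

```python
def screen_scores(attn_row, scene_lanes):
    """Keep only the attention scores of lanes retained for the current
    road scene (the second lane set)."""
    return {lane: score for lane, score in attn_row.items() if lane in scene_lanes}

# Hypothetical fourth information for one target vehicle.
attn_row = {"lane_0": 0.1, "lane_1": 0.6, "lane_2": 0.3}
# In an intersection scene, suppose only lane_1 and lane_2 remain candidates.
fifth_info = screen_scores(attn_row, {"lane_1", "lane_2"})
print(fifth_info)  # {'lane_1': 0.6, 'lane_2': 0.3}
```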
  • generating fourth information according to the first information and the second information includes:
  • a normalization operation is performed on the matrix product of the first linear matrix and the second linear matrix to obtain fourth information.
  • the first information and the second information can be fused based on the attention mechanism to obtain an attention score of the target vehicle relative to each lane.
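This fusion can be sketched as a standard scaled dot-product attention score; the feature dimensions and random projection matrices below are illustrative assumptions, not values from the application:

```python
import numpy as np

def attention_scores(vehicle_feats, lane_feats, wq, wk):
    """Project vehicle and lane features with two linear matrices, take
    their matrix product, and normalize with softmax; each row gives one
    vehicle's attention score over every lane (the fourth information)."""
    q = vehicle_feats @ wq                         # first linear matrix
    k = lane_feats @ wk                            # second linear matrix
    logits = q @ k.T / np.sqrt(q.shape[-1])        # matrix product, scaled
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)       # normalization operation

rng = np.random.default_rng(0)
scores = attention_scores(rng.normal(size=(3, 8)),   # 3 vehicles, 8-dim features
                          rng.normal(size=(5, 8)),   # 5 lanes, 8-dim features
                          rng.normal(size=(8, 16)),
                          rng.normal(size=(8, 16)))
print(scores.shape)  # (3, 5): one row of lane scores per vehicle
```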
  • generating predicted trajectory information of vehicles around the vehicle within a first time according to the second information and the fourth information includes:
  • the sixth information is input into the multi-layer perceptron to obtain the predicted trajectory information of vehicles around the vehicle in the first time.
  • the data of the second linear matrix and the fourth information can be fused based on the attention mechanism, and the obtained third information is input into a multi-layer perceptron. Under the action of the multi-layer perceptron, the predicted trajectory information of the vehicles around the vehicle in the first time is obtained.
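A minimal sketch of this decode step (the layer sizes, the six-step prediction horizon, and the uniform attention weights are assumptions for illustration only):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def decode_trajectories(lane_feats, attn, w1, b1, w2, b2):
    """Fuse lane features with the attention scores (the fourth information),
    then run a two-layer perceptron to emit per-vehicle (x, y) waypoints."""
    fused = attn @ lane_feats                  # (n_vehicles, feat_dim)
    hidden = relu(fused @ w1 + b1)             # multi-layer perceptron, layer 1
    out = hidden @ w2 + b2                     # layer 2: horizon * 2 outputs
    return out.reshape(out.shape[0], -1, 2)    # (n_vehicles, horizon, 2)

rng = np.random.default_rng(0)
traj = decode_trajectories(rng.normal(size=(5, 8)),   # 5 lanes, 8-dim features
                           np.full((3, 5), 0.2),      # 3 vehicles, uniform attention
                           rng.normal(size=(8, 32)), np.zeros(32),
                           rng.normal(size=(32, 12)), np.zeros(12))
print(traj.shape)  # (3, 6, 2): 3 vehicles, 6 future steps, (x, y) each
```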
  • the present application provides a model training method that can be used in the field of artificial intelligence.
  • the method includes:
  • first information and second information are obtained, the first information including information about vehicles around the vehicle, and the second information including information about lanes around the vehicle; then, the first information and the second information are input into the first model to obtain prediction information generated by the first model, the prediction information including predicted position information of vehicles around the vehicle within the first time; finally, the first model is trained according to the loss function, the loss function indicates the similarity between the prediction information and the correct information, and the correct information includes the correct position information of vehicles around the vehicle within the first time.
  • the training samples used include complete information about vehicles around the vehicle and complete information about lanes around the vehicle, so that the position information output by the first model is more accurate. It is understandable that the first model can be used to perform the steps in the aforementioned first aspect or the optional implementation of the first aspect.
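The training loop can be sketched as gradient descent on an L2 loss between predicted and correct positions; the single linear layer below is a toy stand-in for the attention-based first model, and all data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
feats = rng.normal(size=(64, 4))        # stand-in for fused vehicle/lane features
true_w = rng.normal(size=(4, 2))
targets = feats @ true_w                # "correct" future (x, y) positions

w = np.zeros((4, 2))                    # toy first model: one linear layer
for _ in range(200):
    pred = feats @ w
    # gradient of the mean-squared (L2) loss w.r.t. w
    grad = 2.0 * feats.T @ (pred - targets) / feats.shape[0]
    w -= 0.1 * grad                     # gradient-descent update

final_loss = float(np.mean((feats @ w - targets) ** 2))
print(final_loss < 1e-6)  # True: the loss is driven toward zero
```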
  • the prediction information includes predicted trajectory information of vehicles around the vehicle within a first time and third information, where the third information indicates the lanes in which the vehicles around the vehicle are located within the first time.
  • the third information includes a correlation between a target vehicle around the vehicle and at least one lane around the vehicle within a first time, the target vehicle is a vehicle around the vehicle, and the method further includes:
  • the first lane is determined as the lane where the target vehicle is located within the first time, wherein the first lane is a lane with the highest correlation with the target vehicle within the first time among at least one lane around the vehicle.
  • the first model is constructed based on an attention mechanism, and the first information and the second information are input into the first model to obtain prediction information generated by the first model, including:
  • the categories of the road scene include intersection scenes and non-intersection scenes
  • a second lane set is selected from the first lane set, where the second lane set includes lanes where vehicles around the ego vehicle are located at the first time;
  • the predicted trajectory information of the vehicles around the vehicle within the first time is generated.
  • acquiring fifth information from fourth information, and generating third information according to the fifth information includes:
  • the normalized fifth information is input into the multi-layer perceptron to obtain the third information.
  • generating fourth information according to the first information and the second information includes:
  • a normalization operation is performed on the matrix product of the first linear matrix and the second linear matrix to obtain fourth information.
  • generating predicted trajectory information of vehicles around the vehicle within a first time according to the second information and the fourth information includes:
  • the sixth information is input into the multi-layer perceptron to obtain the predicted trajectory information of vehicles around the vehicle in the first time.
  • the present application provides a vehicle position acquisition device that can be used in the field of artificial intelligence.
  • the device includes an acquisition module and a position prediction module.
  • the acquisition module is used to acquire first information and second information, the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle;
  • the position prediction module is used to input the first information and the second information into a first model to obtain prediction information generated by the first model, and the prediction information includes predicted position information of vehicles around the vehicle within the first time.
  • the prediction information includes predicted trajectory information of vehicles around the vehicle within a first time and third information, where the third information indicates the lanes in which the vehicles around the vehicle are located within the first time.
  • the third information includes a correlation between a target vehicle around the vehicle and at least one lane around the vehicle within a first time, the target vehicle is a vehicle around the vehicle, and the device further includes:
  • the lane determination module is used to determine the first lane as the lane where the target vehicle is located within the first time, wherein the first lane is a lane with the highest correlation with the target vehicle within the first time among at least one lane around the vehicle.
  • the first model is constructed based on the attention mechanism, and the position prediction module is specifically used for:
  • the categories of the road scene include intersection scenes and non-intersection scenes
  • a second lane set is selected from the first lane set, where the second lane set includes lanes where vehicles around the ego vehicle are located at the first time;
  • the predicted trajectory information of the vehicles around the vehicle within the first time is generated.
  • the location prediction module is specifically used to:
  • the normalized fifth information is input into the multi-layer perceptron to obtain the third information.
  • the location prediction module is specifically used to:
  • a normalization operation is performed on the matrix product of the first linear matrix and the second linear matrix to obtain fourth information.
  • the location prediction module is specifically used to:
  • the sixth information is input into the multi-layer perceptron to obtain the predicted trajectory information of vehicles around the vehicle in the first time.
  • the various modules included in the vehicle position acquisition device can also be used to implement the steps in the various possible implementation methods of the first aspect.
  • for the specific implementations of the third aspect of the embodiment of the present application, certain steps in its possible implementations, and the beneficial effects of each possible implementation, refer to the description of the corresponding implementations of the first aspect; they are not repeated here one by one.
  • the present application provides a model training device that can be used in the field of artificial intelligence.
  • the device includes an acquisition module, a position prediction module, and a model training module.
  • the acquisition module is used to acquire first information and second information, the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle;
  • the position prediction module is used to input the first information and the second information into the first model to obtain the prediction information generated by the first model, and the prediction information includes the predicted position information of the vehicles around the vehicle within the first time;
  • the model training module is used to train the first model according to the loss function, and the loss function indicates the similarity between the prediction information and the correct information, and the correct information includes the correct position information of the vehicles around the vehicle within the first time.
  • the prediction information includes predicted trajectory information of vehicles around the vehicle within a first time and third information, where the third information indicates the lanes in which the vehicles around the vehicle are located within the first time.
  • the third information includes a correlation between a target vehicle around the vehicle and at least one lane around the vehicle within a first time, the target vehicle is a vehicle around the vehicle, and the device further includes:
  • the lane determination module is used to determine the first lane as the lane where the target vehicle is located within the first time, wherein the first lane is a lane with the highest correlation with the target vehicle within the first time among at least one lane around the vehicle.
  • the first model is constructed based on the attention mechanism, and the position prediction module is specifically used for:
  • the categories of the road scene include intersection scenes and non-intersection scenes
  • a second lane set is selected from the first lane set, where the second lane set includes lanes where vehicles around the ego vehicle are located at the first time;
  • the predicted trajectory information of the vehicles around the vehicle within the first time is generated.
  • the location prediction module is specifically used to:
  • the normalized fifth information is input into the multi-layer perceptron to obtain the third information.
  • the location prediction module is specifically used to:
  • a normalization operation is performed on the matrix product of the first linear matrix and the second linear matrix to obtain fourth information.
  • the location prediction module is specifically used to:
  • the sixth information is input into the multi-layer perceptron to obtain the predicted trajectory information of vehicles around the vehicle in the first time.
  • the modules included in the training device of the model can also be used to implement the steps in the various possible implementation methods of the second aspect.
  • for the specific implementations of the fourth aspect of the embodiment of the present application, certain steps in its possible implementations, and the beneficial effects of each possible implementation, refer to the description of the corresponding implementations of the second aspect; they are not repeated here one by one.
  • an embodiment of the present application provides an execution device, which may include a processor, the processor and a memory are coupled, the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the vehicle position acquisition method described in the first aspect is implemented.
  • an embodiment of the present application provides an autonomous driving vehicle, which may include a processor, the processor and a memory are coupled, the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the vehicle position acquisition method described in the first aspect is implemented.
  • for details of the processor in the execution device performing each possible implementation of the first aspect, refer to the first aspect above; they are not repeated here.
  • an embodiment of the present application provides a training device, which may include a processor, the processor and a memory are coupled, the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the training method of the model described in the second aspect is implemented.
  • an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, which, when running on a computer, enables the computer to execute the method described in the first aspect or any possible implementation of the first aspect, or enables the computer to execute the method described in the second aspect or any possible implementation of the second aspect.
  • an embodiment of the present application provides a circuit system, which includes a processing circuit, and the processing circuit is configured to execute the method described in the first aspect or any possible implementation of the first aspect, or the processing circuit is configured to execute the method described in the second aspect or any possible implementation of the second aspect.
  • an embodiment of the present application provides a computer program product, which, when running on a computer, enables the computer to execute the method described in the first aspect or any possible implementation of the first aspect, or enables the computer to execute the method described in the second aspect or any possible implementation of the second aspect.
  • the present application provides a chip system, including a processor and a memory, the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method as described in the first aspect or any possible implementation of the first aspect, or to enable the computer to execute the method as described in the second aspect or any possible implementation of the second aspect.
  • the chip system can be composed of a chip, or it can include a chip and other discrete devices.
  • FIG1a is a schematic diagram of a structure of an artificial intelligence main framework
  • FIG1b is a structural schematic diagram of a road condition
  • FIG1c is a schematic diagram of a structure of an automatic driving device with an automatic driving function provided in an embodiment of the present application.
  • FIG2a is a schematic diagram of a system architecture provided by the present application.
  • FIG2b is a schematic diagram of a flow chart of a method for obtaining a vehicle position provided in the present application.
  • FIG3 is a schematic diagram of a structure of a multi-layer perceptron provided in an embodiment of the present application.
  • FIG4 is another schematic diagram of a flow chart of a method for obtaining a vehicle position provided by the present application.
  • FIG5 is a schematic diagram of a structure of a first model provided in an embodiment of the present application.
  • FIG6 is another schematic diagram of the structure of the first model provided in an embodiment of the present application.
  • FIG7 is a schematic diagram of a structure of a first embedded module provided in an embodiment of the present application.
  • FIG8 is a schematic diagram of a structure of a second embedded module provided in an embodiment of the present application.
  • FIG9 is a schematic diagram of a structure of a first decoder module provided in an embodiment of the present application.
  • FIG10 is a flow chart of a method for training a model provided in an embodiment of the present application.
  • FIG11 is a schematic diagram of a structure of a vehicle position acquisition device provided in an embodiment of the present application.
  • FIG12 is a schematic diagram of a structure of a training device for a model provided in an embodiment of the present application.
  • FIG13 is a schematic diagram of a structure of an execution device provided in an embodiment of the present application.
  • FIG14 is a schematic diagram of a structure of a training device provided in an embodiment of the present application.
  • FIG15 is a schematic diagram of the structure of a chip provided in an embodiment of the present application.
  • A and/or B can represent three cases: A alone, both A and B together, or B alone.
  • the character "/" in this application generally indicates that the associated objects before and after are in an "or" relationship.
  • the meaning of "at least one" is one or more, and the meaning of "plurality" is two or more. It is understood that in the present application, "when" and "if" both mean that the device performs the corresponding processing under certain objective circumstances; they do not limit the timing, do not require a judgment action in the implementation, and imply no other limitation.
  • the special word "exemplary" means "serving as an example, embodiment, or illustration". Any embodiment described as "exemplary" is not necessarily to be interpreted as superior to other embodiments.
  • AI artificial intelligence
  • AI is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can respond in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, etc.
  • Figure 1a shows a structural diagram of the main framework of artificial intelligence.
  • the following explains the above artificial intelligence framework from two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
  • the "intelligent information chain" reflects a series of processes from data acquisition to processing, for example a general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a condensation process of "data - information - knowledge - wisdom".
  • the "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (the technologies for providing and processing it) to the industrial ecology of the system.
  • the infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and is supported by the basic platform. It communicates with the outside world through sensors; computing power is provided by smart chips, such as central processing units (CPU), neural-network processing units (NPU), graphics processing units (GPU), application specific integrated circuits (ASIC) or field programmable gate arrays (FPGA) and other hardware acceleration chips; the basic platform includes distributed computing frameworks and networks and other related platform guarantees and support, which can include cloud storage and computing, interconnected networks, etc. For example, sensors communicate with the outside world to obtain data, and these data are provided to the smart chips in the distributed computing system provided by the basic platform for calculation.
  • the data at the layer above the infrastructure represents the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, video, and text, and also involves IoT data of traditional equipment, including business data of existing systems and perception data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formalized information to perform machine thinking and solve problems based on reasoning control strategies. Typical functions are search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.
  • some general capabilities can be further formed based on the results of the data processing, such as an algorithm or a general system, for example, translation, text analysis, computer vision processing (such as image recognition, target detection, etc.), speech recognition, etc.
  • Smart products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of the overall artificial intelligence solution, which productizes intelligent information decision-making and realizes practical applications. Its application areas mainly include: smart manufacturing, smart transportation, smart home, smart medical care, smart security, autonomous driving, smart terminals, etc.
  • the present application can be applied to the field of autonomous driving, and specifically can realize the prediction of driving intention and driving trajectory of other vehicles in the field of autonomous driving.
  • Driving intention refers to the driving strategy that a vehicle will take in the future. Specifically, the driving intention of a vehicle can be estimated based on the vehicle's road condition information and driving status. Vehicle trajectory prediction refers to predicting the location of the vehicle at each time point in the future.
  • the above-mentioned surrounding vehicles can also be referred to as associated vehicles located around the vehicle.
  • driving intention is defined as directional intentions such as going straight, turning left, and turning right.
  • driving intention of a vehicle may include going straight, turning left, and turning right.
  • the above definition of driving intention has limited representation capabilities in complex scenarios, and directional intentions cannot cover all driving intentions in some complex intersections or other complex lane scenarios.
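  • As an illustration of the notions above, a directional driving intention and a per-time-point predicted trajectory could be encoded as follows; the enum values and fields are assumptions for this sketch, not the patent's actual data model:

```python
from dataclasses import dataclass
from enum import Enum, auto

# Illustrative encoding (assumed, not the patent's): a directional driving
# intention plus the predicted position of the vehicle at each future time point.
class DrivingIntention(Enum):
    STRAIGHT = auto()
    TURN_LEFT = auto()
    TURN_RIGHT = auto()

@dataclass
class TrajectoryPoint:
    t: float  # seconds into the future
    x: float  # predicted position in a map frame, metres
    y: float

intention = DrivingIntention.TURN_LEFT
# Four predicted positions at 0.5 s intervals, drifting gently to the left.
trajectory = [TrajectoryPoint(t=0.5 * k, x=2.0 * k, y=0.1 * k * k) for k in range(1, 5)]
```

As the passage notes, such a fixed set of directional labels cannot cover an S-shaped lane; the per-time-point trajectory representation does not have that limitation.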
  • Figure 1b is a structural schematic diagram of a road condition, in which lanes 1 and 2 are left-turn lanes, lanes 3 and 4 are straight lanes, and lane 5 is an S-shaped lane.
  • a vehicle position acquisition method provided in an embodiment of the present application can be applied to an automatic driving prediction system.
  • the prediction system can predict the driving intention and predicted trajectory of other vehicles based on road condition information, the vehicle's historical driving route and other information.
  • the prediction system may include a hardware circuit (such as an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, etc.), or a combination of these hardware circuits.
  • the prediction system may be a hardware system with an instruction execution function, such as a CPU, DSP, etc., or a hardware system without an instruction execution function, such as an ASIC, FPGA, etc., or a combination of the above-mentioned hardware systems without an instruction execution function and hardware systems with an instruction execution function.
  • the vehicle position acquisition method provided in the embodiment of the present application can be software code stored in a memory.
  • the prediction system can obtain the software code from the memory and execute the obtained software code to implement the vehicle position acquisition provided in the embodiment of the present application.
  • the prediction system can be a combination of a hardware system that does not have the function of executing instructions and a hardware system that has the function of executing instructions. Some steps of obtaining the vehicle position provided in the embodiment of the present application can also be implemented by a hardware system in the prediction system that does not have the function of executing instructions, which is not limited here.
  • the prediction system can be deployed on a vehicle or a server on the cloud side.
  • the following describes the process of using the prediction system to predict the driving intention and the predicted trajectory of other vehicles.
  • the vehicles in the embodiments of the present application may refer to internal combustion engine vehicles that use an engine as a power source, hybrid vehicles that use an engine and an electric motor as power sources, electric vehicles that use an electric motor as a power source, and the like.
  • the vehicle may include an automatic driving device 100 with an automatic driving function.
  • FIG. 1c is a functional block diagram of an automatic driving device 100 with an automatic driving function provided in an embodiment of the present application.
  • the automatic driving device 100 may include various subsystems, such as a travel system 102, a sensor system 104, a control system 106, one or more peripheral devices 108, and a power supply 110, a computer system 112, and a user interface 116.
  • the automatic driving device 100 may include more or fewer subsystems, and each subsystem may include multiple elements.
  • each subsystem and element of the automatic driving device 100 may be interconnected by wire or wirelessly.
  • the travel system 102 may include components that provide powered movement for the autonomous driving device 100.
  • the travel system 102 may include an engine 118, an energy source 119, a transmission 120, and wheels/tires 121.
  • the engine 118 may be an internal combustion engine, an electric motor, an air compression engine, or other types of engine combinations, such as a hybrid engine consisting of a gasoline engine and an electric motor, or a hybrid engine consisting of an internal combustion engine and an air compression engine.
  • the engine 118 converts the energy source 119 into mechanical energy.
  • Examples of energy source 119 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas-based fuels, ethanol, solar panels, batteries, and other sources of electricity. Energy source 119 may also provide energy for other systems of autonomous driving device 100.
  • the transmission 120 can transmit mechanical power from the engine 118 to the wheels 121.
  • the transmission 120 may include a gearbox, a differential, and a drive shaft.
  • the transmission 120 may also include other devices, such as a clutch.
  • the drive shaft may include one or more shafts that can be coupled to one or more wheels 121.
  • the sensor system 104 may include several sensors that sense information about the environment surrounding the autonomous driving device 100.
  • the sensor system 104 may include a positioning system 122 (the positioning system may be a global positioning system (GPS) system, or a Beidou system or other positioning systems), an inertial measurement unit (IMU) 124, a radar 126, a laser rangefinder 128, and a camera 130.
  • the sensor system 104 may also include sensors that monitor the internal systems of the autonomous driving device 100 (e.g., an in-vehicle air quality monitor, a fuel gauge, an oil temperature gauge, etc.). Sensor data from one or more of these sensors may be used to detect objects and their corresponding characteristics (position, shape, direction, speed, etc.). Such detection and recognition are key functions for the safe operation of the autonomous driving device 100.
  • Positioning system 122 may be used to estimate the geographic location of autonomous driving device 100.
  • IMU 124 is used to sense position and orientation changes of autonomous driving device 100 based on inertial acceleration.
  • IMU 124 may be a combination of an accelerometer and a gyroscope.
  • the radar 126 may utilize radio signals to sense objects within the surrounding environment of the autonomous driving device 100. In some embodiments, in addition to sensing objects, the radar 126 may also be used to sense the speed and/or heading of the objects.
  • the radar 126 may include an electromagnetic wave transmitting unit and a receiving unit.
  • the radar 126 may be implemented as a pulse radar or a continuous wave radar based on the principle of radio wave transmission.
  • the radar 126 may be implemented as a frequency modulated continuous wave (FMCW) mode or a frequency shift keying (FSK) mode according to the signal waveform.
  • the radar 126 can detect an object based on a time of flight (TOF) method or a phase-shift method using electromagnetic waves as a medium, and detect the position of the detected object, the distance to the detected object, and the relative speed.
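  • The TOF ranging principle mentioned above can be sketched as follows; the helper names and the two-measurement speed estimate are illustrative assumptions, not the actual implementation of the radar 126:

```python
# Sketch of time-of-flight (TOF) ranging: the radar emits a wave, measures the
# round-trip travel time, and halves the distance the wave travelled.
C = 299_792_458.0  # speed of an electromagnetic wave in vacuum, m/s

def tof_distance(round_trip_time_s: float) -> float:
    """Distance to the detected object from the round-trip travel time."""
    return C * round_trip_time_s / 2.0

def relative_speed(d1_m: float, d2_m: float, dt_s: float) -> float:
    """Relative speed estimated from two successive distance measurements."""
    return (d2_m - d1_m) / dt_s

# A wave that returns after 1 microsecond corresponds to roughly 150 m.
d = tof_distance(1e-6)
# An object measured at 150 m and then 140 m half a second later is closing
# at about 20 m/s.
v = relative_speed(150.0, 140.0, 0.5)
```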
  • the radar 126 can be configured at an appropriate position outside the vehicle.
  • the lidar 126 can detect an object based on a TOF method or a phase-shift method using laser light as a medium, and detect the position of the detected object, the distance to the detected object, and the relative speed.
  • the lidar 126 may be configured at a suitable location on the exterior of the vehicle.
  • the laser rangefinder 128 may utilize laser light to sense objects in the environment in which the autonomous driving device 100 is located.
  • the laser rangefinder 128 may include one or more laser sources, a laser scanner, and one or more detectors, among other system components.
  • the camera 130 may be used to capture multiple images of the surrounding environment of the autonomous driving device 100.
  • the camera 130 may be a still camera or a video camera.
  • the camera 130 may be located at an appropriate position outside the vehicle.
  • the camera 130 may be arranged in the interior of the vehicle close to the front windshield.
  • the camera 130 may be arranged around the front bumper or radiator grille.
  • the camera 130 may be arranged in the interior of the vehicle close to the rear window glass.
  • the camera 130 may be arranged around the rear bumper, trunk, or tailgate.
  • the camera 130 may be arranged in the interior of the vehicle close to at least one of the side windows.
  • the camera 130 may be arranged around the side mirrors, fenders, or doors.
  • the target vehicle's road condition information and historical driving route, the historical driving routes of associated vehicles located around the target vehicle, and the like can be acquired based on one or more sensors in the sensor system 104.
  • the control system 106 controls the operation of the autonomous driving device 100 and its components.
  • the control system 106 may include various elements, including a steering system 132 , a throttle 134 , a brake unit 136 , a sensor fusion algorithm 138 , a computer vision system 140 , a path control system 142 , and an obstacle avoidance system 144 .
  • the steering system 132 is operable to adjust the forward direction of the autonomous driving device 100.
  • it may be a steering wheel system.
  • the throttle 134 is used to control the operating speed of the engine 118 and thus the speed of the autonomous driving device 100.
  • the brake unit 136 is used to control the deceleration of the automatic driving device 100.
  • the brake unit 136 can use friction to slow down the wheel 121.
  • the brake unit 136 can convert the kinetic energy of the wheel 121 into electric current.
  • the brake unit 136 can also take other forms to slow down the rotation speed of the wheel 121 to control the speed of the automatic driving device 100.
  • the computer vision system 140 may be operable to process and analyze images captured by the camera 130 in order to identify objects and/or features in the environment surrounding the autonomous driving device 100.
  • the objects and/or features may include traffic signs, road boundaries, and obstacles.
  • the computer vision system 140 may use object recognition algorithms, structure from motion (SFM) algorithms, video tracking, and other computer vision techniques.
  • the computer vision system 140 may be used to map the environment, track objects, estimate the speed of objects, and the like.
  • the route control system 142 is used to determine the driving route of the autonomous driving device 100.
  • the route control system 142 may combine data from the sensor fusion algorithm 138, the positioning system 122, and one or more predetermined maps to determine the driving route for the autonomous driving device 100.
  • the obstacle avoidance system 144 is used to identify, evaluate, and avoid or otherwise negotiate potential obstacles in the environment of the autonomous driving device 100 .
  • control system 106 may include additional or alternative components other than those shown and described, or may omit some of the components shown above.
  • the autonomous driving device 100 interacts with external sensors, other autonomous driving devices, other computer systems, or users through peripheral devices 108.
  • the peripheral devices 108 may include a wireless communication system 146, an onboard computer 148, a microphone 150, and/or a speaker 152.
  • the peripheral device 108 provides a means for the user of the autonomous driving device 100 to interact with the user interface 116.
  • the onboard computer 148 can provide information to the user of the autonomous driving device 100.
  • the onboard computer 148 can also be operated via the user interface 116 to receive input from the user.
  • the onboard computer 148 can be operated through a touch screen.
  • the peripheral device 108 can provide a means for the autonomous driving device 100 to communicate with other devices located in the vehicle.
  • the microphone 150 can receive audio (e.g., voice commands or other audio input) from the user of the autonomous driving device 100.
  • the speaker 152 can output audio to the user of the autonomous driving device 100.
  • the wireless communication system 146 can communicate wirelessly with one or more devices directly or via a communication network.
  • the wireless communication system 146 can use 3G cellular communication, such as code division multiple access (CDMA), EVDO, global system for mobile communications (GSM)/general packet radio service (GPRS), or 4G cellular communication, such as long term evolution (LTE), or 5G cellular communication.
  • the wireless communication system 146 can communicate with a wireless local area network (WLAN) using WiFi.
  • the wireless communication system 146 can communicate directly with the device using an infrared link, Bluetooth, or ZigBee.
  • the wireless communication system 146 may also use other wireless protocols, such as various vehicle communication systems; for example, the wireless communication system 146 may include one or more dedicated short range communications (DSRC) devices, which may include public and/or private data communications between autonomous driving devices and/or roadside stations.
  • the road condition information, historical driving trajectory and other information in the embodiments of the present application can be received by the vehicle from other vehicles or a cloud-side server through the wireless communication system 146.
  • the vehicle can receive driving intention information and the like for the target vehicle transmitted by the server through the wireless communication system 146 .
  • the power source 110 can provide power to the various components of the autonomous driving device 100.
  • the power source 110 can be a rechargeable lithium-ion or lead-acid battery.
  • One or more battery packs of such batteries can be configured as a power source to provide power to the various components of the autonomous driving device 100.
  • the power source 110 and the energy source 119 can be implemented together, such as in some all-electric vehicles.
  • the computer system 112 may include at least one processor 113 that executes instructions 115 stored in a non-transitory computer-readable medium such as a memory 114.
  • the computer system 112 may also be a plurality of computing devices that control individual components or subsystems of the autonomous driving device 100 in a distributed manner.
  • Processor 113 can be any conventional processor, such as a commercially available central processing unit (CPU). Alternatively, the processor can be a dedicated device such as an application specific integrated circuit (ASIC) or other hardware-based processor.
  • although FIG. 1c functionally illustrates the processor, memory, and other elements of the computer 110 in the same block, those skilled in the art will understand that the processor, computer, or memory may actually comprise multiple processors, computers, or memories that may or may not be housed in the same physical enclosure.
  • for example, the memory may be a hard drive or other storage medium located in a housing different from that of the computer 110. Therefore, references to a processor or computer will be understood to include references to a collection of processors, computers, or memories that may or may not operate in parallel.
  • some components such as steering components and deceleration components can each have their own processor that performs only calculations related to the functions specific to the component.
  • the processor may be located remotely from the autonomous driving device and in wireless communication with the autonomous driving device. In other aspects, some of the processes described herein are performed on a processor disposed within the autonomous driving device and others are performed by a remote processor, including taking the necessary steps to perform a single maneuver.
  • the memory 114 may include instructions 115 (e.g., program logic) that may be executed by the processor 113 to perform various functions of the autonomous driving device 100, including those described above.
  • the memory 114 may also include additional instructions, including instructions to send data to, receive data from, interact with, and/or control one or more of the travel system 102, the sensor system 104, the control system 106, and the peripherals 108.
  • Memory 114 may store data such as road maps, route information, and the position, direction, speed, and other data of the autonomous driving device, as well as other information in addition to instructions 115. Such information may be used by the autonomous driving device 100 and the computer system 112 during operation of the autonomous driving device 100 in autonomous, semi-autonomous, and/or manual modes.
  • the vehicle position acquisition method provided in the embodiment of the present application may be a software code stored in the memory 114.
  • the processor 113 may acquire the software code from the memory and execute the acquired software code to implement the vehicle position acquisition method provided in the embodiment of the present application.
  • the driving intention may be transmitted to the control system 106, and the control system 106 may determine the driving strategy of the vehicle based on the driving intention.
  • the user interface 116 is used to provide information to or receive information from a user of the autonomous driving device 100.
  • the user interface 116 may include one or more input/output devices within the set of peripheral devices 108, such as a wireless communication system 146, an onboard computer 148, a microphone 150, and a speaker 152.
  • Computer system 112 may control functions of autonomous driving device 100 based on input received from various subsystems (e.g., travel system 102, sensor system 104, and control system 106) and from user interface 116. For example, computer system 112 may utilize input from control system 106 in order to control steering unit 132 to avoid obstacles detected by sensor system 104 and obstacle avoidance system 144. In some embodiments, computer system 112 may be operable to provide control over many aspects of autonomous driving device 100 and its subsystems.
  • one or more of the above components may be installed or associated separately from the autonomous driving device 100.
  • the memory 114 may be partially or completely separate from the autonomous driving device 100.
  • the above components may be communicatively coupled together in a wired and/or wireless manner.
  • FIG. 1c should not be understood as a limitation on the embodiments of the present application.
  • the present embodiment provides a system architecture 200a.
  • the system architecture includes a database 230a and a client device 240a.
  • the data acquisition device 260a is used to collect data and store it in the database 230a.
  • the training module 202a generates the target model/rule 201a based on the data maintained in the database 230a. The following will describe in more detail how the training module 202a obtains the target model/rule 201a based on the data.
  • the target model/rule 201a is the first model mentioned in the following embodiments of the present application.
  • the calculation module 211a may include a training module 202a, and the target model/rule obtained by the training module 202a may be applied to different systems or devices.
  • the execution device 210a is configured with a transceiver 212a, which may be a wireless transceiver, an optical transceiver, or a wired interface (such as an I/O interface), etc., to interact with external devices for data, and a "user" may input data to the transceiver 212a through a client device 240a.
  • the client device 240a may send a target task to the execution device 210a, requesting the execution device to train a neural network, and send a database for training to the execution device 210a.
  • the execution device 210a can call data, codes, etc. in the data storage system 250a, and can also store data, instructions, etc. in the data storage system 250a.
  • the calculation module 211a uses the target model/rule 201a to process the input data. Specifically, the calculation module 211a is used to: first, obtain the first information and the second information, wherein the first information includes the information of the vehicles around the vehicle, and the second information includes the information of the lanes around the vehicle; then, input the first information and the second information into the first model to obtain the prediction information generated by the first model, wherein the prediction information includes the predicted position information of the vehicles around the vehicle within the first time.
  • the transceiver 212a returns the output of the neural network to the client device 240a.
  • the user can input a text to be converted into a sign language action through the client device 240a, and the neural network outputs the sign language action or the parameters representing the sign language action and feeds it back to the client device 240a.
  • the training module 202a can obtain corresponding target models/rules 201a for different tasks based on different data to provide users with better results.
  • the data input into the execution device 210a can be determined based on the user's input data.
  • the user can operate in the interface provided by the transceiver 212a.
  • the client device 240a can automatically input data into the transceiver 212a and obtain the result. If the automatic data input of the client device 240a requires the user's authorization, the user can set the corresponding authority in the client device 240a.
  • the user can view the result output by the execution device 210a on the client device 240a, and the specific presentation form can be a specific method such as display, sound, action, etc.
  • the client device 240a can also serve as a data collection terminal to store the collected data associated with the target task into the database 230a.
  • the training or updating process mentioned in the present application can be performed by the training module 202a. It is understandable that the training process of the neural network is to learn the way to control the spatial transformation, more specifically, to learn the weight matrix. The purpose of training the neural network is to make the output of the neural network as close to the expected value as possible. Therefore, the weight vector of each layer of the neural network in the neural network can be updated according to the difference between the predicted value and the expected value of the current network (of course, the weight vector can usually be initialized before the first update, that is, the parameters of each layer in the deep neural network are pre-configured).
  • for example, if the predicted value of the network is too high, the values of the weights in the weight matrix are adjusted to lower the prediction; after continuous adjustment, the value output by the neural network approaches or equals the expected value.
  • the difference between the predicted value and the expected value of the neural network can be measured by a loss function or an objective function. Taking the loss function as an example, the higher the output value (loss) of the loss function, the greater the difference.
  • the training of the neural network can be understood as a process of minimizing the loss as much as possible.
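  • The idea that training minimizes the loss can be illustrated with a minimal gradient-descent sketch; the toy linear model and mean-squared-error loss below are assumptions for illustration, not the first model of the present application:

```python
import numpy as np

# Minimal sketch of "training minimizes the loss": a linear model y = x @ w is
# fitted by repeatedly nudging the weight matrix against the gradient of a
# mean-squared-error loss, so the output approaches the expected value.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 3))
true_w = np.array([[1.0], [-2.0], [0.5]])
y = x @ true_w                            # expected values

w = np.zeros((3, 1))                      # weights initialized before the first update
lr = 0.1
losses = []
for _ in range(100):
    pred = x @ w
    loss = np.mean((pred - y) ** 2)       # difference between predicted and expected
    losses.append(loss)
    grad = 2 * x.T @ (pred - y) / len(x)  # gradient of the loss w.r.t. w
    w -= lr * grad                        # adjust weights to reduce the loss
```

After the loop, the final loss is far below the initial loss, i.e., the network's output has moved close to the expected value.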
  • a target model/rule 201a is obtained through training by the training module 202a.
  • the target model/rule 201a may be the first model in the present application.
  • the database 230a can be used to store sample sets for training.
  • the execution device 210a generates a target model/rule 201a for processing samples, and iteratively trains the target model/rule 201a using the sample set in the database to obtain a mature target model/rule 201a, which is specifically represented by a neural network.
  • the neural network obtained by the execution device 210a can be applied to different systems or devices.
  • the execution device 210a can call the data, code, etc. in the data storage system 250a, or store the data, instructions, etc. in the data storage system 250a.
  • the data storage system 250a can be placed in the execution device 210a, or the data storage system 250a can be an external memory relative to the execution device 210a.
  • the calculation module 211a can process the samples obtained by the execution device 210a through the neural network to obtain the prediction result.
  • the specific expression form of the prediction result is related to the function of the neural network.
  • FIG. 2a is only an exemplary schematic diagram of a system architecture provided in an embodiment of the present application, and the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation.
  • in FIG. 2a, the data storage system 250a is an external memory relative to the execution device 210a; in other scenarios, the data storage system 250a may also be placed in the execution device 210a.
  • the target model/rule 201a trained by the training module 202a can be applied to different systems or devices, such as mobile phones, tablet computers, laptops, augmented reality (AR)/virtual reality (VR), vehicle terminals, etc., and can also be servers or cloud devices.
  • FIG. 2b is a flow chart of a method for obtaining the position of a vehicle provided in an embodiment of the present application.
  • the method can be executed by an execution device 210a as shown in FIG. 2a.
  • the method specifically includes: 201b, obtaining first information and second information, the first information including information about vehicles around the vehicle, and the second information including information about lanes around the vehicle; 202b, inputting the first information and the second information into a first model to obtain prediction information generated by the first model, the prediction information including predicted position information of vehicles around the vehicle within a first time.
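  • Steps 201b and 202b can be sketched as follows; the feature dimensions, the single-linear-layer stand-in for the first model, and the ten-step prediction horizon are placeholder assumptions, not the patent's actual network:

```python
import numpy as np

rng = np.random.default_rng(0)

def first_model(first_info, second_info, horizon=10):
    """Toy stand-in for the 'first model': fuses surrounding-vehicle features
    (first information) with surrounding-lane features (second information)
    and maps them to (x, y) positions for `horizon` future time steps."""
    fused = np.concatenate([first_info, second_info], axis=-1)  # feature fusion
    w = rng.standard_normal((fused.shape[-1], horizon * 2))     # one linear layer
    out = np.tanh(fused @ w)                                    # nonlinearity
    return out.reshape(-1, horizon, 2)  # predicted positions within the first time

# Step 201b: obtain the first and second information (random placeholders here).
first_info = rng.standard_normal((8, 4))   # 8 surrounding vehicles, 4 features each
second_info = rng.standard_normal((8, 6))  # matching lane features
# Step 202b: input both into the first model to obtain the prediction information.
pred = first_model(first_info, second_info)
print(pred.shape)  # (8, 10, 2)
```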
  • a neural network can be composed of neural units. Specifically, it can be understood as a neural network with an input layer, a hidden layer, and an output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers. Among them, a neural network with many hidden layers is called a deep neural network (DNN).
  • the work of each layer in a neural network can be described by a mathematical expression. From a physical level, the work of each layer in a neural network can be understood as completing the transformation from input space to output space (i.e., from the row space to the column space of a matrix) through five operations on the input space (a set of input vectors). These five operations include: 1. dimension raising/lowering; 2. enlargement/reduction; 3. rotation; 4. translation; 5. "bending".
  • W is the weight matrix of each layer of the neural network. Each value in the matrix represents the weight value of a neuron in this layer.
  • the matrix W determines the spatial transformation from the input space to the output space described above, that is, the W of each layer of the neural network controls how to transform the space.
  • the purpose of training a neural network is to eventually obtain the weight matrices of all layers of the trained neural network. Therefore, the training process of a neural network is essentially learning how to control spatial transformation, more specifically learning the weight matrix.
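  • The layer-as-spatial-transformation view can be illustrated as follows; the specific W, b, and tanh activation (playing the "bending" role) are illustrative assumptions:

```python
import numpy as np

# One neural-network layer as a spatial transformation: the weight matrix W
# rotates/scales the input space, the bias b translates it, and the
# activation a() "bends" it.
def layer(x, W, b):
    return np.tanh(W @ x + b)   # a(Wx + b)

W = np.array([[0.0, -1.0],      # a 90-degree rotation of the input space
              [1.0,  0.0]])
b = np.array([0.5, -0.5])       # a translation
out = layer(np.array([1.0, 0.0]), W, b)
```

Here the input (1, 0) is rotated to (0, 1), translated to (0.5, 0.5), and then bent element-wise by tanh.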
  • Convolutional neural network is a deep neural network with convolutional structure.
  • Convolutional neural network contains a feature extractor composed of convolution layer and subsampling layer.
  • the feature extractor can be regarded as a filter, and the convolution process can be regarded as convolving the same trainable filter with an input image or convolution feature plane (feature map).
  • Convolution layer refers to the neuron layer in convolutional neural network that performs convolution processing on the input signal.
  • in a convolutional neural network, a neuron may be connected to only some of the neurons in the adjacent layer.
  • a convolution layer usually contains several feature planes, each of which can be composed of neural units arranged in a rectangle.
  • the neural units in the same feature plane share weights, and the shared weights here are convolution kernels.
  • weight sharing can be understood as meaning that the way image information is extracted is independent of position.
  • the implicit principle is that the statistical information of a part of the image is the same as that of other parts. This means that the image information learned in a part can also be used in another part. So for all positions on the image, the same learned image information can be used.
  • multiple convolution kernels can be used to extract different image information. Generally speaking, the more convolution kernels there are, the richer the image information reflected by the convolution operation.
  • the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network.
  • the direct benefit of shared weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
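  The shared-weight idea above can be sketched as a single trainable kernel slid over every position of an image to produce one feature plane. The image, kernel size, and initialization below are illustrative assumptions.

```python
import numpy as np

# One feature plane from one shared kernel: the same 3x3 weights are applied
# at every position, so image information is extracted position-independently.
def conv2d_single(image, kernel):
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1        # valid (no-padding) output height
    ow = image.shape[1] - kw + 1        # valid output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # same kernel (shared weights) reused at every (i, j)
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0          # a kernel; in training its weights would be learned
feature_map = conv2d_single(image, kernel)   # shape (3, 3)
```

  Using several such kernels in parallel yields several feature planes, matching the bullet on multiple convolution kernels extracting different image information.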
  • a deep neural network (DNN) is also known as a multi-layer neural network.
  • the layers of a DNN can be divided into three categories: input layer, hidden layers, and output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the layers in between are all hidden layers.
  • the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
  • the definition of these parameters in a DNN is as follows. Take the coefficient W as an example: assume that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}. The superscript 3 represents the layer number of the coefficient W, while the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
  • in summary, the coefficient from the kth neuron of the (L-1)th layer to the jth neuron of the Lth layer is defined as W^L_{jk}.
  • the input layer does not have a W parameter.
  • W weight matrix
  • more hidden layers allow the network to better describe complex situations in the real world. Theoretically, the more parameters a model has, the higher its complexity and the greater its "capacity", which means it can complete more complex learning tasks.
  • Training a deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (a weight matrix formed by many layers of vectors W).
  • the loss function is used to characterize the gap between the predicted category and the true category
  • the loss function may be, for example, the cross-entropy loss function (cross entropy loss).
  • the error back propagation (BP) algorithm can be used to correct the size of the parameters in the initial neural network model, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, the forward transmission of the input signal to the output will generate error loss, and the parameters in the initial neural network model are updated by back propagating the error loss information, so that the error loss converges.
  • the back propagation algorithm is a back propagation movement dominated by error loss, which aims to obtain the optimal parameters of the neural network model, such as the weight matrix.
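  The back propagation process above can be sketched on a single linear neuron: the forward pass produces an error loss, the gradient of that loss is propagated back to the parameter, and repeated updates make the loss converge. The sample, learning rate, and iteration count are illustrative assumptions.

```python
# Error back propagation on one linear neuron pred = w * x,
# minimizing the squared error loss (pred - target)^2 by gradient descent.
w = 0.0                      # initial parameter of the model
x, target = 2.0, 4.0         # one training sample, so the optimal w is 2.0
lr = 0.1                     # learning rate

for _ in range(100):
    pred = w * x                     # forward transmission of the input signal
    loss = (pred - target) ** 2      # error loss
    grad = 2 * (pred - target) * x   # back-propagated gradient dL/dw
    w -= lr * grad                   # parameter update; loss shrinks each step
```

  After training, w is close to the optimal value 2.0; in a real network the same update is applied to every entry of every weight matrix.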
  • the Transformer structure is a feature extraction network that includes an encoder and a decoder (similar to a convolutional neural network).
  • the encoder performs feature learning in the global receptive field through self-attention, for example learning pixel features.
  • the decoder learns the features of the required modules, such as the features of the output box, through self-attention and cross-attention.
  • the attention mechanism imitates the internal process of biological observation behavior, that is, a mechanism that aligns internal experience and external sensations to increase the observation precision of some areas, and can use limited attention resources to quickly filter out high-value information from a large amount of information.
  • the attention mechanism can quickly extract important features of sparse data, and is therefore widely used in natural language processing tasks, especially machine translation.
  • the self-attention mechanism is an improvement on the attention mechanism, which reduces dependence on external information and is better at capturing the internal correlation of data or features.
  • the essential idea of the attention mechanism can be rewritten as the following formula: Attention(Query, Source) = Σ_i Similarity(Query, Key_i) · Value_i
  • the self-attention mechanism provides an effective modeling method to capture global context information through QKV. Assume that the input is Q (query), and the context is stored in the form of key-value pairs (K, V). The attention mechanism is then a mapping function from a query to a series of (key, value) pairs. Attention essentially assigns a weight coefficient to each element in the sequence, which can also be understood as soft addressing: if each element in the sequence is stored in the form of (K, V), attention completes the addressing by calculating the similarity between Q and K. The similarity between Q and K reflects the importance of the retrieved V value, that is, the weight, and the weighted sum is then used to obtain the final eigenvalue.
  • the calculation of attention is mainly divided into three steps.
  • the first step is to calculate the similarity between the query and each key to obtain the weight.
  • Commonly used similarity functions include dot product, concatenation, perceptron, etc.
  • the second step is generally to use a softmax function to normalize these weights (on the one hand, normalization yields a probability distribution in which the sum of all weight coefficients is 1; on the other hand, the characteristics of the softmax function can be used to highlight the weights of important elements); finally, the weights and the corresponding values are weighted and summed to obtain the final eigenvalue.
  • attention includes self-attention and cross-attention.
  • Self-attention can be understood as a special attention, that is, the input of QKV is consistent.
  • the input of QKV in cross-attention is inconsistent.
  • Attention uses the similarity between features (such as inner product) as weight to integrate the queried features as the update value of the current feature.
  • self-attention is attention computed over the feature map itself.
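  The three attention steps described above (similarity scoring, softmax normalization, weighted sum) can be sketched with dot-product similarity. The shapes (4 queries, 6 key-value pairs, dimension 8) and the scaling by the square root of the dimension are illustrative assumptions.

```python
import numpy as np

# Dot-product attention over QKV:
# step 1 scores each query against each key, step 2 normalizes the scores
# with softmax, step 3 takes the weighted sum of the values.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))    # queries
K = rng.normal(size=(6, 8))    # keys   (context, key part)
V = rng.normal(size=(6, 8))    # values (context, value part)

scores = Q @ K.T / np.sqrt(K.shape[1])            # step 1: scaled dot-product similarity
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)     # step 2: softmax, rows sum to 1
out = weights @ V                                 # step 3: weighted sum of values
```

  Feeding the same tensor as Q, K, and V gives self-attention; taking Q from one source and K, V from another gives cross-attention, matching the bullets above.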
  • a Multi-Layer Perceptron (MLP), also known as a multilayer perceptron, is a feedforward artificial neural network model.
  • an MLP is an artificial neural network (ANN) based on a fully connected (FC) forward structure, which contains from a dozen to hundreds of thousands of artificial neurons (AN, hereinafter referred to as neurons).
  • ANN artificial neural network
  • FC fully connected
  • MLP organizes neurons into a multi-layer structure, and uses a full connection method between layers to form an ANN with multi-weighted connection layers connected layer by layer. Its basic structure is shown in Figure 3.
  • the fully connected layers of MLP that contain calculations are numbered from 1, and the total number of layers is L.
  • the input layer number is set to 0, and the fully connected layers of MLP are divided into two categories: odd layers and even layers.
  • MLP contains an input layer (this layer does not actually contain calculations), one or more hidden layers, and an output layer.
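  The MLP structure above (an input layer with no computation, fully connected hidden layers, and an output layer) can be sketched as follows; the layer sizes and random weights are illustrative assumptions.

```python
import numpy as np

# A minimal MLP forward pass: input layer (no computation) -> one fully
# connected hidden layer with ReLU -> fully connected output layer.
rng = np.random.default_rng(1)
W1 = rng.normal(size=(5, 3))    # hidden layer: 3 inputs -> 5 hidden neurons
b1 = np.zeros(5)
W2 = rng.normal(size=(2, 5))    # output layer: 5 hidden -> 2 outputs
b2 = np.zeros(2)

def mlp_forward(x):
    h = np.maximum(W1 @ x + b1, 0.0)   # hidden layer: full connection + ReLU
    return W2 @ h + b2                 # output layer: full connection

y = mlp_forward(np.array([1.0, -0.5, 2.0]))   # shape (2,)
```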
  • a feature is an input variable, i.e. the x variable in a simple linear regression.
  • a simple machine learning task might use a single feature, while a more complex machine learning task might use millions of features.
  • the label is the y variable in simple linear regression, and the label can include multiple meanings.
  • the label can refer to the classification category of the input data. By labeling each of the different categories of input data, the label is used to indicate to the computing device the specific information represented by the data. Therefore, labeling the data is to tell the computing device what the multiple features of the input variable describe (i.e., y). y can be called a label or a target (i.e., a target value).
  • a sample refers to a specific instance of data.
  • a sample x represents an object.
  • Samples are divided into labeled samples and unlabeled samples. Labeled samples contain both features and labels, while unlabeled samples contain features but not labels.
  • the task of machine learning is often to learn the potential patterns in the input d-dimensional training sample set (which can be simply referred to as the training set).
  • neural networks can also include other functional networks, such as region proposal networks (RPNs) and feature pyramid networks (FPNs), which are used to further process the features extracted by the backbone network, such as identifying the classification of features and performing semantic segmentation on features.
  • RPNs region proposal networks
  • FPNs feature pyramid networks
  • Matrix multiplication is a binary operation that obtains a third matrix from two matrices.
  • the third matrix is the product of the first two, and is also commonly called the matrix product.
  • Matrices can be used to represent linear mappings, and matrix products can be used to represent the composition of linear mappings.
  • the normalization (softmax) function, also known as the normalized exponential function, is a generalization of the logistic function.
  • the softmax function can transform a K-dimensional vector Z containing any real number into another K-dimensional vector ⁇ (Z), so that each element of the transformed vector ⁇ (Z) is between (0, 1) and the sum of all elements is 1.
  • the calculation of the softmax function can be shown in Formula 1: σ(Z)_j = e^{Z_j} / Σ_{k=1}^{K} e^{Z_k} (Formula 1)
  • where σ(Z)_j represents the value of the jth element of the transformed vector, Z_j represents the value of the jth element of the vector Z, Z_k represents the value of the kth element of the vector Z, and Σ represents summation.
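  Formula 1 translates directly into code; the input vector below is an illustrative example.

```python
import math

# Softmax per Formula 1: sigma(Z)_j = exp(Z_j) / sum_k exp(Z_k).
# Each output element lies in (0, 1) and the elements sum to 1.
def softmax(Z):
    exps = [math.exp(z) for z in Z]
    total = sum(exps)
    return [e / total for e in exps]

sigma = softmax([1.0, 2.0, 3.0])
```

  Larger inputs receive larger probabilities, which is the "highlighting important elements" property the text mentions.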
  • the embedding layer may be referred to as an input embedding layer.
  • the current input may be a text input, for example, a paragraph of text or a sentence.
  • the text may be a Chinese text, an English text, or a text in another language.
  • the embedding layer may embed each word in the current input, and obtain a feature vector of each word.
  • the embedding layer includes an input embedding layer and a positional encoding layer. In the input embedding layer, each word in the current input may be subjected to word embedding processing, thereby obtaining a word embedding vector of each word.
  • the position of each word in the current input may be obtained, and then a position vector may be generated for the position of each word.
  • the position of each word may be the absolute position of each word in the current input.
  • the position vector of each word and the corresponding word embedding vector may be combined to obtain a feature vector of each word, that is, to obtain multiple feature vectors corresponding to the current input.
  • Multiple feature vectors may be represented as embedding vectors with preset dimensions. The number of feature vectors in the multiple feature vectors may be set to M, and the preset dimension may be H, so that the multiple feature vectors may be represented as M ⁇ H embedding vectors.
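  The embedding layer described above (word embedding plus a position vector per word, giving M × H embedding vectors) can be sketched as follows. The toy vocabulary, M = 4, H = 8, the random word table, and the sinusoidal position encoding are illustrative assumptions, not details from the application.

```python
import numpy as np

# Input embedding layer + positional encoding layer: each word's embedding
# vector is combined (added) with a vector generated from its absolute
# position, yielding M x H feature vectors for the current input.
rng = np.random.default_rng(2)
vocab = {"the": 0, "car": 1, "turns": 2, "left": 3}   # hypothetical vocabulary
H = 8
word_table = rng.normal(size=(len(vocab), H))          # input embedding layer

def positional_vector(pos, dim=H):
    # one common choice of position vector (sinusoidal), used here as an example
    i = np.arange(dim)
    angles = pos / np.power(10000.0, (i - i % 2) / dim)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

sentence = ["the", "car", "turns", "left"]             # current input, M = 4 words
features = np.stack([word_table[vocab[w]] + positional_vector(p)
                     for p, w in enumerate(sentence)]) # M x H embedding vectors
```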
  • the vehicle location acquisition method can be executed by the vehicle's location acquisition device, or by a component of the vehicle's location acquisition device (such as a processor, a chip, or a chip system, etc.).
  • the vehicle's location acquisition device can be a cloud device, or it can be a vehicle or terminal device (such as a vehicle-mounted terminal, an aircraft terminal, etc.).
  • the method can also be executed by a system consisting of a cloud device and a vehicle.
  • the method can be processed by a CPU in the vehicle's location acquisition device, or it can be processed jointly by a CPU and a GPU, or it can be processed without a GPU, and other processors suitable for neural network calculations can be used, which is not limited by the present application.
  • the method can be applied in intelligent driving scenarios.
  • the reasoning stage describes the process of how the execution device 210a uses the target model/rule 201a to process the collected information data to generate a prediction result.
  • Figure 4 is another flow chart of the vehicle position acquisition method provided in the embodiment of the present application.
  • Figure 4 takes the embodiment of the present application applied to the field of autonomous driving as an example for explanation. The method may include steps 401 to 403.
  • An execution device obtains first information and second information.
  • the first information includes information about vehicles around the vehicle
  • the second information includes information about lanes around the vehicle
  • the execution device obtains information about vehicles and lanes around the vehicle.
  • the execution device may be the vehicle, and the vehicle may directly collect information about vehicles and lanes around the vehicle through a collection device, such as a camera device, a radar device, etc.
  • the execution device may also receive information sent by other external devices, or select information from a database, etc., which are not specifically limited here.
  • when the vehicle is driving, in order to accurately predict whether other vehicles around it will affect its driving safety, whether they will affect its driving decisions, and how to control the vehicle's driving strategy based on the surrounding vehicles, it is necessary to determine the driving intention of at least one associated vehicle located around the vehicle.
  • the target vehicle in the embodiment of the present application is any one of the at least one associated vehicle located around the vehicle.
  • "associated vehicles" can be understood as vehicles within a certain preset distance range of the own vehicle, that is, which vehicles are associated with the own vehicle is determined based on distance; alternatively, "associated vehicles" can be understood as vehicles that will affect the own vehicle's future driving-state decisions, that is, the association is determined based on whether a vehicle will affect the own vehicle's future driving strategy.
  • the processor of the vehicle can control the relevant sensors on the vehicle to obtain vehicle information and lane information of surrounding vehicles based on the software code related to step 401 in the memory 114, and determine which vehicles are associated vehicles based on the acquired information, that is, determine which vehicles need to be predicted for intention.
  • the above process of determining the target vehicle can be determined by other vehicles or a server on the cloud side, which is not limited here.
  • the position of the target vehicle can be the target vehicle's absolute position in the map, or the relative position between the target vehicle and the own vehicle.
  • the absolute position of the target vehicle can be determined based on the absolute position of the own vehicle and the relative position between the target vehicle and the own vehicle.
  • the driving status information of the target vehicle can be obtained, where the driving status information may include the position of the target vehicle.
  • the position of the target vehicle can be sensed by the sensor carried by the vehicle itself, or the position of the target vehicle can be obtained through interaction with other vehicles and servers on the cloud side.
  • the position of the target vehicle can be acquired in real time, or the position of the target vehicle can be acquired once at a certain interval.
  • preprocessing includes basic data processing operations such as data outlier processing, which will not be repeated here.
  • the main steps of acquiring and processing the first information and the second information are as follows:
  • the first information includes the information of the vehicles around the vehicle.
  • the vehicle information includes 8 feature data, namely the vehicle's horizontal coordinate, vertical coordinate, type, length, width, height, current speed, and direction of speed (i.e. the direction in which the vehicle is moving).
  • the specific collection method can be: obtain one frame of vector data every 0.2 s, obtaining ten frames of vector data within the historical 2 s; adding the vector data of the current frame gives a total of 11 frames, where each collected frame includes the 8 features. Assuming that data for only 64 vehicles are taken, the targets are sorted according to their distance from the own vehicle, the nearest targets are kept, and the data of targets that are farther away are deleted.
  • the specific data of the collected vehicles can be found in Table 1 below.
  • the second information includes the information of the lanes around the vehicle.
  • the lane information includes 8 feature data. Take 20 waypoints on each lane around the vehicle, and each waypoint corresponds to 8 feature data, which are the horizontal and vertical coordinates of the waypoint, the type of lane (such as non-motorized vehicle lane and motor vehicle lane), whether the lane can go straight, whether the lane can turn left, whether the lane can turn right, whether the lane can turn around, and the lane number.
  • the specific collection method can be: collect the features of all lanes within 200 meters around the vehicle, take 20 waypoints for each lane, and take 8 features for each waypoint.
  • the waypoints on a lane can be selected by farthest-point sampling.
  • the collected lane data can be specifically referred to in Table 1 below.
  • the characteristic data contained in the information of vehicles around the vehicle and the lane information can be set according to actual needs, and no limitation is made here.
  • in this embodiment, only the positions of vehicles around the own vehicle are predicted, so only the information of vehicles around the own vehicle is collected.
  • in practice, obstacles such as non-motor vehicles and pedestrians are also present around the vehicle.
  • the information of non-motor vehicles and pedestrians around the vehicle can also be collected, and the positions of non-motor vehicles and pedestrians can be predicted by using the position acquisition method provided in the embodiment of the present application.
  • the trajectory of the vehicle in the first time in the future is actually obtained, and the trajectory data is labeled for use in the subsequent training stage.
  • the category labels include:
  • trajectory label: indicates the actual driving trajectory of the target vehicle within the first time. For example, for the actual trajectory over the next 3 seconds, the position of the target vehicle is collected every 0.2 seconds, giving a total of 15 points; each position includes x and y coordinates, so the output is (15, 2) data.
  • intersection label: indicates the exit lane selected by the target vehicle when leaving the intersection.
  • non-intersection label: indicates the non-exit lane where the target vehicle is located at the first time.
  • the execution device inputs the first information and the second information into the first model to obtain prediction information generated by the first model, where the prediction information includes predicted position information of vehicles around the vehicle within the first time.
  • the execution device after the execution device obtains the information of vehicles and lanes around the vehicle, it can input this information into the first model, so as to predict the predicted position information of any vehicle around the vehicle through the first model based on the information of any vehicle and lane around the vehicle.
  • the first model includes an encoder and a decoder based on an attention mechanism.
  • Figure 5 is a schematic diagram of the structure of the first model provided in an embodiment of the present application.
  • the execution device inputs the acquired first information and second information into the first model, and outputs prediction information based on the structure of the encoder and decoder of the attention mechanism.
  • the predicted position information includes predicted trajectory information of vehicles around the vehicle within a first time and third information, where the third information indicates the lanes in which the vehicles around the vehicle are located within the first time.
  • the prediction information obtained by the execution device from the first model may include two aspects: one aspect refers to the trajectory information of the vehicles around the vehicle in the first time in the future, and the second aspect refers to the third information, which indicates the lanes where the vehicles around the vehicle are located in the first time in the future.
  • the third information includes the correlation between the target vehicle around the vehicle and at least one lane around the vehicle in the first time, and the target vehicle is a vehicle around the vehicle.
  • the execution device can simultaneously collect information about several vehicles around the vehicle and output prediction information about several vehicles.
  • when the prediction information of a target vehicle is needed, it can be obtained directly from the prediction information of the several vehicles.
  • the correlation described in the third information specifically refers to the attention score of the target vehicle and the lanes around the vehicle in the first time.
  • FIG. 6 is another structural schematic diagram of the first model provided in an embodiment of the present application.
  • the encoder in the first model includes an embedding module and an attention module
  • the decoder includes a first decoder module and a second decoder module. Each module is described in detail below.
  • the encoder consists of an embedding module and an attention module.
  • the embedding module includes a first embedding module and a second embedding module.
  • the execution device inputs the first information and the second information into an embedding module, and obtains three different weight matrices, namely, matrix Q, matrix K and matrix V, after embedding processing is performed on the input sequence.
  • the embedding module processes the first information through a first embedding module, and the first embedding module includes three submodules, namely a first submodule, a second submodule and a third submodule.
  • the first submodule specifically includes a two-dimensional convolution layer Conv2d1, a two-dimensional batch normalization layer BatchNorm2d, an activation function layer ReLU, and a two-dimensional convolution layer Conv2d2.
  • the second submodule and the third submodule have the same composition, both including a one-dimensional convolution layer Conv1d1, a one-dimensional batch normalization layer BatchNorm1d, an activation function layer ReLU, and a one-dimensional convolution layer Conv1d2.
  • the convolution kernel size kernel_size of the convolution layer Conv2d is (1,1), and the step size stride and zero padding are both default values.
  • the processing process of the first embedded module is specifically as follows:
  • the vehicle data (16, 64, 11, 8), the position data (16, 64, 2) and the time data (16, 11) are obtained from the first information, and the vehicle data, the position data and the time data are respectively input into the first submodule, the second submodule and the third submodule, and the data of (16, 64, 11, 256), (16, 64, 1, 256) and (16, 1, 11, 256) are respectively output. Subsequently, the three data are fused by the embedding method, and the data of (16, 64, 11, 256) is output, that is, the data form of the matrix Q is (16, 64, 11, 256).
  • 16 represents the batch size
  • 64 represents the number of vehicles around the vehicle (i.e. the number of data in Table 1)
  • 11 represents the number of collected frames
  • 2 represents the position of the vehicle (i.e., the horizontal and vertical coordinates shown in Table 1).
  • the information obtained from the 64 vehicles around the vehicle is the first information.
  • taking any vehicle as an example, the vehicle's data are collected 11 times at fixed intervals, and each collected data item includes several feature data (corresponding to the 8 features in Table 1).
  • characteristic data corresponding to the vehicles around the vehicle can be set according to actual needs or experiments. This is only an example and not a limitation.
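  The shape bookkeeping of the first submodule can be sketched as follows: a Conv2d with kernel_size (1, 1) acts pointwise, so projecting the 8 input features to 256 channels turns vehicle data of shape (16, 64, 11, 8) into (16, 64, 11, 256) while leaving the batch, vehicle, and frame axes untouched. Batch normalization and ReLU preserve the shape, so only the feature projection is shown; the random data and single weight matrix are illustrative assumptions.

```python
import numpy as np

# A 1x1 convolution is a per-point linear map over the feature axis:
# (batch, vehicles, frames, 8 features) -> (batch, vehicles, frames, 256).
rng = np.random.default_rng(3)
vehicle_data = rng.normal(size=(16, 64, 11, 8))   # (16, 64, 11, 8) vehicle data
W = rng.normal(size=(8, 256))                     # 1x1-conv weights, 8 -> 256 channels

embedded = vehicle_data @ W                       # shape (16, 64, 11, 256), i.e. matrix Q's form
```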
  • the embedding module processes the second information through a second embedding module, and the second embedding module includes a fourth submodule.
  • FIG8 is a schematic diagram of the structure of the second embedding module provided in an embodiment of the present application.
  • the fourth submodule specifically includes a two-dimensional convolution layer Conv2d1, a two-dimensional batch normalization layer BatchNorm2d, an activation function layer ReLU, and a two-dimensional convolution layer Conv2d2.
  • the convolution kernel size kernel_size of the convolution layer Conv2d is (1,1), and the stride and zero padding are both default values.
  • the processing of the second embedding module is specifically described in combination with the examples in Table 1 above.
  • the data format of the second information is (16, 256, 20, 8); this data is input into the fourth submodule, which outputs (16, 256, 11, 256) data as the data format of matrix K and matrix V.
  • 16 refers to batch size
  • 256 represents the number of lanes
  • 20 represents 20 waypoint data on each lane
  • 8 represents 8 feature data corresponding to each waypoint data.
  • the information taken from the 256 lanes around the car is the second information. Taking any lane as an example, 20 waypoint data on the lane are taken, and each waypoint data includes 8 feature data.
  • lane data around the vehicle can be set according to actual needs or tests, and is only used as an example and is not limited here.
  • Embedding is a mapping method, and the convolution plus ReLU activation used in the embedding module is only one implementation. In actual operation, other general embedding methods can also be used to embed the first information and the second information.
  • the calculation of the attention module is mainly divided into three steps.
  • the first step is to calculate the similarity between the query and each key to obtain the weight.
  • Commonly used similarity functions include dot product, concatenation, perceptron, etc.
  • the second step is to use the normalized softmax function to normalize these weights
  • the third step is to perform weighted summation of the weight and the corresponding key value to obtain the final eigenvalue.
  • the main functions of the normalization function include: on the one hand, normalization yields a probability distribution in which the sum of all weight coefficients is 1; the scores are converted into a matrix with values distributed between 0 and 1, and the result represents the relevance of each lane to the current vehicle. On the other hand, the inherent mechanism of softmax further highlights the weights of important elements.
  • normalization can stabilize the gradient during training.
  • the first information and the second information are input into the first model, and based on the attention mechanism, the fourth information is generated.
  • the fourth information includes the correlation between the target vehicle around the own vehicle and the first lane set within the first time.
  • the target vehicle is a vehicle around the own vehicle, and the first lane set includes all lanes around the own vehicle included in the second information.
  • matrix Q, matrix K and matrix V are used as inputs of the attention module.
  • matrix Q, matrix K and matrix V are linearly mapped respectively to obtain a first linear matrix and a second linear matrix.
  • the first linear matrix is used as matrix Q
  • the second linear matrix is used as matrix K and matrix V
  • matrix Q, matrix K and matrix V are used as inputs of the attention module.
  • the correlation between every two input vectors is calculated through matrix Q and matrix K, that is, the correlation (attention score) between the target vehicle around the own vehicle and the first lane set within the first time, and the fourth information is output.
  • the sixth information is obtained, and the sixth information includes the predicted trajectory information of the target vehicles around the vehicle.
  • the main structure of the first model is an encoder-decoder structure based on the attention mechanism.
  • the prediction can also be made based on a multi-head attention mechanism through redundancy in the first model.
  • other neural networks can also be used to replace the first model.
  • the specific structure and composition of the first model are only illustrated here by way of example and are not limited.
  • the decoder includes a first decoder module and a second decoder module.
  • the first decoder module is mainly used to process the predicted lane information around the vehicle.
  • the main working process of the first decoder module is:
  • the categories of the road scene include intersection scenes and non-intersection scenes
  • a second lane set is selected from the first lane set, where the second lane set includes lanes where vehicles around the ego vehicle are located at the first time;
  • the fifth information is obtained from the fourth information, and the third information is generated based on the fifth information.
  • the fifth information includes the correlation between the target vehicle and the second lane set within the first time, and the third information includes the correlation between the target vehicle around the own vehicle and at least one lane around the own vehicle within the first time.
  • Figure 9 is a structural diagram of the first decoder module provided in an embodiment of the present application. Assuming that the road scene to which the target vehicle belongs is an intersection scene, the vehicle will travel on an intersection lane within the first time in the future, so the lanes belonging to the non-intersection scene can be eliminated. Specifically, the second lane set can be selected from the first lane set by setting the attention scores of the target vehicle relative to all non-intersection lanes to the minimum value. After the second lane set is filtered out, the fifth information is obtained from the fourth information, that is, the attention score of the target vehicle relative to each filtered lane.
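  • A minimal sketch of this scene-based filtering step (the array values, the mask, and the variable names are illustrative assumptions, not taken from this application):

```python
import numpy as np

# Hypothetical attention scores (fourth information) of one target vehicle
# over the first lane set.
scores = np.array([0.9, 0.2, 0.7, 0.4])
# True where a lane matches the target vehicle's road-scene category
# (e.g. intersection lanes when the vehicle is in an intersection scene).
in_scene = np.array([True, False, True, False])

# Lanes outside the scene are set to the minimum score; the remaining
# lanes form the second lane set.
masked = np.where(in_scene, scores, np.finfo(scores.dtype).min)
# Fifth information: the target vehicle's score for each filtered lane.
fifth_information = scores[in_scene]
```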
  • the category of the road scene to which the target vehicle belongs can be directly obtained through a map or through other methods, which are not limited here.
  • a specific process of generating the third information according to the fifth information may include:
  • a normalization operation is performed on the fifth information to obtain normalized fifth information, and then the normalized fifth information is input into a multi-layer perceptron to obtain the third information.
  • the fifth information is normalized so that each attention score in the normalized fifth information falls within (0, 1) and the sum of all elements is 1. Subsequently, the normalized fifth information is input into a multi-layer perceptron, which outputs the correlation between the target vehicle and the lanes around the own vehicle within the first time, that is, the intended lane of the other vehicle in the future and its probability.
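  • A softmax-style normalization satisfies both properties stated above; this is a sketch under that assumption (the application does not name the exact operation, and the score values are illustrative):

```python
import numpy as np

# Hypothetical fifth information: attention scores of the target vehicle
# over the filtered (second) lane set.
fifth = np.array([2.0, 1.0, 0.5])

# Normalization: every element falls within (0, 1) and the elements sum to 1.
e = np.exp(fifth - fifth.max())
normalized = e / e.sum()
```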
  • the second decoder module is mainly used to process the predicted trajectory information of target vehicles around the vehicle.
  • the main working process of the second decoder module is:
  • the sixth information is input into a multi-layer perceptron to obtain predicted trajectory information of vehicles around the vehicle within the first time.
  • the second decoder module includes an MLP
  • the sixth information is input into the second decoder module to output the trajectory of the target vehicle within the first future time. For example, assuming that the driving trajectory of the next 3 seconds is output, and one point is output every 0.2 seconds, the coordinates of a total of 15 points can be output.
  • the second decoder module can use a variety of methods to obtain the predicted trajectory of the vehicle.
  • the above method is only an example and is not limited here.
  • the execution device determines the first lane as the lane where the target vehicle is located in the first time, wherein the first lane is a lane with the highest correlation with the target vehicle in the first time among at least one lane around the vehicle.
  • the output of the first model is the predicted position information of the vehicles around the vehicle within the first time and the third information, wherein the third information includes the correlation between the target vehicle around the vehicle and at least one lane around the vehicle within the first time, that is, the attention score corresponding to the target vehicle and each lane, or it can be understood as the probability of the target vehicle driving in each lane in the future.
  • the execution device can select the lane with the highest attention score from the attention scores corresponding to the multiple lanes as the predicted lane where the target vehicle is located within the first time.
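  • The selection of the predicted lane reduces to taking the highest-scoring entry of the third information; a sketch (the lane names and probabilities are hypothetical):

```python
# Hypothetical third information for one target vehicle: a probability per
# surrounding lane within the first time.
third_information = {"lane_a": 0.15, "lane_b": 0.70, "lane_c": 0.15}

# The first lane is the lane with the highest correlation with the target
# vehicle, taken as the predicted lane within the first time.
first_lane = max(third_information, key=third_information.get)
```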
  • step 403 is an optional step.
  • the third information included in the prediction information generated by the first model indicates the lanes where the vehicles around the vehicle are located in the first time. How to further determine the lanes where the vehicles around the vehicle are located in the first time through the third information can be expanded into multiple implementation methods.
  • the execution device uses the lane with the highest correlation as the lane where the vehicles around the own vehicle are located within the first time. In actual application, the attention score can be used as one of the prediction bases and combined with other features or methods to predict the position, which is not limited here.
  • different numbers of lanes can be selected as environmental features to be considered according to the actual situation of the target vehicle, and the target vehicle trajectory can be combined with the surrounding lane features to form an attention feature by adopting an attention mechanism, so that the network can pay attention to the characteristics of the surrounding lanes when learning, making the vehicle trajectory prediction results more in line with actual driving rules and improving the accuracy of the prediction results.
  • the training stage describes the process of how the training device 220 generates a mature neural network using the data set in the database 230a.
  • FIG. 10 is a flow chart of the training method of the model provided in the embodiment of the present application.
  • the training method of the model provided in the embodiment of the present application may include:
  • a training device obtains first information and second information, where the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle.
  • the training device obtains a data set of the first information and the second information, divides the data set into a training set, a validation set, and a test set, trains the model with the training set, adjusts the parameters with the validation set, and evaluates the performance with the test set.
  • the data division ratio of the training set, validation set, and test set can be set according to actual needs and is not limited here.
  • the specific implementation method of the training device executing step 1001 can refer to the description of the specific implementation method of step 401 in the embodiment corresponding to Figure 4, which will not be repeated here.
  • the training samples used include complete information about vehicles around the vehicle and complete information about lanes around the vehicle, so that the position information output by the first model is also more accurate.
  • the training device inputs the first information and the second information into the first model to obtain prediction information generated by the first model, where the prediction information includes predicted position information of vehicles around the vehicle within the first time.
  • the training device inputs the acquired first information and second information into the first model to obtain prediction information generated by the first model.
  • the prediction information includes the predicted trajectory information of vehicles around the vehicle within the first time and third information
  • the third information includes the correlation between the target vehicle around the vehicle and at least one lane around the vehicle within the first time.
  • the first model includes an encoder and a decoder
  • the decoder includes a first decoder module and a second decoder module, wherein after the training device inputs the first information and the second information into the encoder of the first model, it will output the third information through the first decoder module, and output the predicted trajectory information of the vehicles around the vehicle within the first time through the second decoder module.
  • the specific implementation method of the training device executing step 1002 can refer to the description of the specific implementation method of step 402 in the embodiment corresponding to Figure 4, which will not be repeated here.
  • the training device trains the first model according to a loss function, where the loss function indicates the similarity between the predicted information and correct information, and the correct information includes correct position information of vehicles around the vehicle within the first time.
  • the training device is pre-configured with training data, and the training data includes expected results corresponding to the information of vehicles and lanes around the vehicle. After obtaining the prediction results corresponding to the information of vehicles and lanes around the vehicle, the training device can calculate the function value of the target loss function according to the prediction results and the expected results, and update the parameter value of the model to be trained according to the function value of the target loss function and the back propagation algorithm to complete one training of the model to be trained.
  • the "model to be trained" can also be understood as the "target model to be trained".
  • the meaning of the "expected result corresponding to the vehicle information and lane information around the own vehicle" is similar to that of the "prediction result corresponding to the vehicle information and lane information around the own vehicle"; the difference is that the prediction result is generated by the model to be trained, while the expected result is the correct result corresponding to that information.
  • the prediction result is used to indicate the predicted position of at least one object in the target environment
  • the expected result is used to indicate the expected position (also referred to as the correct position) of at least one object in the target environment.
  • the training device can repeat steps 1001 to 1003 multiple times to achieve iterative training of the model to be trained until the preset conditions are met and the trained model to be trained is obtained, wherein the preset conditions can be the convergence conditions for reaching the target loss function, or the number of iterations of steps 1001 to 1003 reaches a preset number.
  • the training device collects the data set, obtains the required original data set and its corresponding category labels, and divides the training set, validation set, and test set into proportional quantities, which are used for subsequent training, validation, and evaluation of the model.
  • the training device builds the first model based on the attention mechanism.
  • the training device inputs the data of the training set into the first model, trains the first model using the first loss function and the second loss function, updates the recognition model through the back propagation algorithm, and uses the data of the verification set to screen out the optimal first model.
  • the loss function used by the training device includes a first loss function and a second loss function.
  • the prediction information output by the first model includes the predicted trajectory information of the vehicles around the vehicle within the first time and the correlation between the target vehicle around the vehicle and at least one lane around the vehicle within the first time
  • the loss value between the predicted trajectory information of the vehicles around the vehicle within the first time output by the first model and the correct trajectory information is calculated by the first loss function
  • the loss value between the correlation between at least one lane around the vehicle within the first time and the correct information is calculated by the second loss function.
  • the formula of the first loss function is specifically:
  • l n represents the loss value between the predicted coordinates and the real coordinates of the target vehicle corresponding to the nth sample
  • x n represents the vector data of the predicted coordinates of the target vehicle corresponding to the nth sample
  • y n represents the vector data of the real coordinates of the target vehicle corresponding to the nth sample
  • beta represents the error threshold.
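  • The variables listed above (predicted vector, real vector, and an error threshold beta) match the standard Smooth L1 loss; the following is a per-coordinate sketch under that assumption, since the formula body is omitted in the text above:

```python
def smooth_l1(x_n, y_n, beta=1.0):
    """Smooth L1 loss for one coordinate of the nth sample: x_n is the
    predicted coordinate, y_n the real coordinate, beta the error threshold.
    Quadratic below the threshold, linear above it."""
    diff = abs(x_n - y_n)
    if diff < beta:
        return 0.5 * diff ** 2 / beta
    return diff - 0.5 * beta
```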
  • the coordinates of the target vehicle in the next 3 seconds are collected every 0.2 seconds, and the position coordinates of 15 points can be obtained, and each sampling point corresponds to one sample.
  • l n is the loss value loss corresponding to the nth sample
  • x n is the predicted lane where the nth sample is located in the first time
  • y n represents the actual lane where the nth sample is located in the first time
  • sample represents the vehicles around the vehicle
  • w represents the weight.
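  • The variables of the second loss function (a predicted lane, an actual lane, and a weight w) are consistent with a weighted negative log-likelihood over lanes; the following is a sketch under that assumption only, since the application omits the formula body:

```python
import math

def weighted_lane_loss(x_n, y_n, w):
    """Sketch of a weighted lane-classification loss.
    x_n: predicted probability per lane for the nth sample;
    y_n: index of the actual lane; w: per-lane weight.
    Returns l_n, the weighted negative log-likelihood of the actual lane."""
    return -w[y_n] * math.log(x_n[y_n])

l_n = weighted_lane_loss([0.7, 0.2, 0.1], 0, [1.0, 1.0, 1.0])
```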
  • because the third information output by the first model includes the correlation between the target vehicle around the own vehicle and at least one lane around the own vehicle within the first time, it is not convenient to obtain the true value of the attention score during the actual model training process. Therefore, based on the third information output by the first model, the lane with the highest correlation with the target vehicle within the first time among the at least one lane around the own vehicle can be used as the lane where the target vehicle is located within the first time, and the predicted lane information of the target lane can be compared with the actual lane information to train the model.
  • the training device trains the first model according to the loss function
  • the training device trains the first model using the training set, verifies it on the validation set, and saves the network model parameters that perform best on the validation set.
  • the specific process of the training device training the first model according to the loss function is:
  • (1) the first loss function is used to train the first model, and after the training is completed, the model with the smallest first loss value is saved.
  • (2) the model obtained in (1) is trained using the second loss function, and after the training is completed, the model with the smallest second loss value is saved.
  • the training device can update the parameters of the first model through the error back propagation algorithm.
  • during the training of the first model, the training device can correct the size of the parameters in the initial first model through the error back propagation algorithm, so that the error loss becomes smaller and smaller.
  • specifically, the input signal is transmitted forward until the output, which produces an error loss, and the parameters in the initial first model are updated by back-propagating the error loss information, so that the error loss converges.
  • the back propagation algorithm is a back propagation movement dominated by error loss, which aims to obtain the optimal parameters of the neural network model, such as the weight matrix.
  • the training device uses the data of the test set to test the prediction performance of the first model and obtain the final model recognition accuracy.
  • the model recognition accuracy reaches the set threshold
  • the data to be predicted is input into the first model for recognition; otherwise, it returns to the third step until the model recognition accuracy reaches the set threshold.
  • the prediction information output by the first model includes two parts, namely the predicted trajectory information of the target vehicle and the attention score of the target vehicle relative to the lane, the accuracy evaluation is performed on these two parts respectively.
  • the formula for evaluating the accuracy of the predicted trajectory information is specifically:
  • FDE represents the final displacement error, that is, the distance between the final predicted point and the final real point of a trajectory
  • MR represents the miss rate
  • N represents the batch size, which corresponds to 256 in Table 1
  • n represents the number of points in each trajectory
  • x and y are the horizontal and vertical coordinates of the point, respectively.
  • dist_threshold refers to the tolerance distance, which can be set to 1.5 meters; valid_num represents the number of valid data collected.
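  • Under the standard readings of the variables above, the final displacement error and miss rate can be sketched as follows (the trajectory data are illustrative, and valid_num is assumed to equal the number of trajectory pairs):

```python
import math

def fde(pred, real):
    """Final displacement error: Euclidean distance between the final
    predicted point and the final real point. pred/real are lists of
    (x, y) trajectory points."""
    (px, py), (rx, ry) = pred[-1], real[-1]
    return math.hypot(px - rx, py - ry)

def miss_rate(preds, reals, dist_threshold=1.5):
    """Miss rate: fraction of trajectories whose final error exceeds the
    tolerance distance dist_threshold."""
    misses = sum(fde(p, r) > dist_threshold for p, r in zip(preds, reals))
    return misses / len(preds)

# Illustrative trajectories of two target vehicles.
preds = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (5.0, 0.0)]]
reals = [[(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0), (1.0, 0.0)]]
```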
  • the accuracy of the intersection lane intention is evaluated by the following formula:
  • Acc exit represents the accuracy of the exit lane.
  • the lane where the target vehicle leaves the intersection scene is called the exit lane.
  • exit_lane_right_num represents the number of samples in which the real lane where the target vehicle will be located within the first time in the future is consistent with the predicted lane.
  • the real lane where the target vehicle will be located in the first time in the future can be obtained from the intersection label.
  • valid_exit_lane_num represents the number of valid data collected.
  • the vehicle has two behaviors during driving: changing lanes or not changing lanes.
  • Acc cutin represents the accuracy of the target vehicle changing lanes in the first time
  • cutin_right_num represents the number of samples in which the target vehicle actually changes lanes within the first time in the future and the first model also predicts the lane change
  • valid_cutin_num represents the number of valid data collected.
  • Acc keep indicates the accuracy of the target vehicle not changing lanes within the first time, that is, it reflects the rate of lane-change false alarms.
  • keep_right_num indicates the number of real lanes where the target vehicle is located in the first time in the future that are consistent with the predicted lanes. The real lane where the target vehicle is located in the first time in the future can be obtained from the non-intersection label.
  • valid_keep_num indicates the number of valid data collected.
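  • Each of the accuracies above is the ratio of correctly predicted samples to valid samples; a sketch with illustrative counts (the numbers are not taken from this application):

```python
def accuracy(right_num, valid_num):
    """Ratio of correct predictions to valid samples, as used by each
    accuracy metric above."""
    return right_num / valid_num

# Illustrative counts:
acc_exit = accuracy(90, 100)    # Acc_exit  = exit_lane_right_num / valid_exit_lane_num
acc_cutin = accuracy(40, 50)    # Acc_cutin = cutin_right_num / valid_cutin_num
acc_keep = accuracy(180, 200)   # Acc_keep  = keep_right_num / valid_keep_num
```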
  • FIG. 11 is a structural schematic diagram of a vehicle position acquisition device provided in the embodiment of the present application.
  • the vehicle position acquisition device 1100 may include:
  • the acquisition module 1101 is used to acquire first information and second information, where the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle.
  • the position prediction module 1102 is used to input the first information and the second information into the first model to obtain prediction information generated by the first model, where the prediction information includes the predicted position information of vehicles around the vehicle within the first time.
  • the specific description of the acquisition module 1101 and the position prediction module 1102 can refer to the description of step 401 to step 402 in the above embodiment, and will not be repeated here.
  • the prediction information includes predicted trajectory information of vehicles around the vehicle within a first time and third information, where the third information indicates the lanes in which the vehicles around the vehicle are located within the first time.
  • the third information includes a correlation between a target vehicle around the vehicle and at least one lane around the vehicle within a first time, the target vehicle is a vehicle around the vehicle, and the vehicle position acquisition device 1100 further includes:
  • the lane determination module is used to determine the first lane as the lane where the target vehicle is located within the first time, wherein the first lane is a lane with the highest correlation with the target vehicle within the first time among at least one lane around the vehicle.
  • the first model is constructed based on the attention mechanism, and the position prediction module 1102 is specifically used to:
  • the categories of the road scene include intersection scenes and non-intersection scenes
  • a second lane set is selected from the first lane set, where the second lane set includes lanes where vehicles around the ego vehicle are located at the first time;
  • the predicted trajectory information of the vehicles around the vehicle within the first time is generated.
  • the location prediction module 1102 is specifically used for:
  • the normalized fifth information is input into the multi-layer perceptron to obtain the third information.
  • the location prediction module 1102 is specifically configured to:
  • a normalization operation is performed on the matrix product of the first linear matrix and the second linear matrix to obtain fourth information.
  • the location prediction module 1102 is specifically used for:
  • the sixth information is input into the multi-layer perceptron to obtain the predicted trajectory information of vehicles around the vehicle in the first time.
  • the present application embodiment further provides a model training device.
  • FIG. 12 is a schematic diagram of a structure of the model training device provided in the present application embodiment.
  • the model training device 1200 may include:
  • the acquisition module 1201 is used to acquire first information and second information, where the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle.
  • the position prediction module 1202 is used to input the first information and the second information into the first model to obtain prediction information generated by the first model, where the prediction information includes the predicted position information of vehicles around the vehicle within the first time.
  • the model training module 1203 is used to train the first model according to the loss function, where the loss function indicates the similarity between the predicted information and the correct information, and the correct information includes the correct position information of the vehicles around the vehicle within the first time.
  • the specific description of the acquisition module 1201, the position prediction module 1202 and the model training module 1203 can refer to the description of steps 1001 to 1003 in the above embodiment, and will not be repeated here.
  • the prediction information includes predicted trajectory information of vehicles around the vehicle within the first time and third information, where the third information indicates the lanes in which the vehicles around the vehicle are located within the first time.
  • the third information includes the relationship between the target vehicle around the vehicle and at least one lane around the vehicle within the first time.
  • the target vehicle is a vehicle around the vehicle
  • the model training device 1200 also includes:
  • the lane determination module is used to determine the first lane as the lane where the target vehicle is located within the first time, wherein the first lane is a lane with the highest correlation with the target vehicle within the first time among at least one lane around the vehicle.
  • because the third information output by the first model includes the correlation between the target vehicle around the own vehicle and at least one lane around the own vehicle within the first time, it is not convenient to obtain the correct value of the attention score during the actual model training process. Therefore, based on the third information output by the first model, the lane with the highest correlation with the target vehicle within the first time among the at least one lane around the own vehicle can be used as the lane where the target vehicle is located within the first time, and the predicted lane information of the target lane is compared with the actual lane information to train the model.
  • the lane in which the target vehicle is located in the first time can be determined by the lane determination module first, and then the first model can be trained through the model training module 1203; or, in one implementation, during the actual operation, if the correlation between the target vehicle around the vehicle and at least one lane around the vehicle can be correctly measured within the first time, the first model can be trained directly according to the error between the correlations, and there is no need to execute the lane determination module.
  • whether the training device needs to use the lane determination module can be set according to actual needs and is not limited here.
  • the first model is constructed based on the attention mechanism, and the position prediction module 1202 is specifically used to:
  • the categories of the road scene include intersection scenes and non-intersection scenes
  • a second lane set is selected from the first lane set, where the second lane set includes lanes where vehicles around the ego vehicle are located at the first time;
  • the predicted trajectory information of the vehicles around the vehicle within the first time is generated.
  • the location prediction module 1202 is specifically used for:
  • the normalized fifth information is input into the multi-layer perceptron to obtain the third information.
  • the location prediction module 1202 is specifically used for:
  • a normalization operation is performed on the matrix product of the first linear matrix and the second linear matrix to obtain fourth information.
  • the location prediction module 1202 is specifically used for:
  • the sixth information is input into the multi-layer perceptron to obtain the predicted trajectory information of vehicles around the vehicle in the first time.
  • FIG. 13 is a structural schematic diagram of an execution device provided in an embodiment of the present application.
  • the execution device 1300 can be specifically manifested as a vehicle, a mobile robot, a monitoring data processing device or other equipment, etc., which is not limited here.
  • the execution device 1300 includes: a receiver 1301, a transmitter 1302, a processor 1303 and a memory 1304 (wherein the number of processors 1303 in the execution device 1300 can be one or more, and one processor is taken as an example in Figure 13), wherein the processor 1303 may include an application processor 13031 and a communication processor 13032.
  • the receiver 1301, the transmitter 1302, the processor 1303 and the memory 1304 may be connected via a bus or other means.
  • the memory 1304 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1303. A portion of the memory 1304 may also include a non-volatile random access memory (NVRAM).
  • NVRAM non-volatile random access memory
  • the memory 1304 stores processor-executable instructions and operation instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, wherein:
  • the operation instructions may include various operation instructions for implementing various operations.
  • the processor 1303 controls the operation of the execution device.
  • the various components of the execution device are coupled together through a bus system, wherein the bus system includes not only a data bus but also a power bus, a control bus, and a status signal bus, etc.
  • for ease of description, the various buses are all referred to as the bus system in the figure.
  • the method disclosed in the above embodiment of the present application can be applied to the processor 1303, or implemented by the processor 1303.
  • the processor 1303 can be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method can be completed by the hardware integrated logic circuit in the processor 1303 or the instruction in the form of software.
  • the above processor 1303 can be a general processor, a digital signal processor (DSP), a microprocessor or a microcontroller, and can further include an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, or discrete hardware components.
  • the processor 1303 can implement or execute the various methods, steps and logic block diagrams disclosed in the embodiment of the present application.
  • the general processor can be a microprocessor or the processor can also be any conventional processor, etc.
  • the steps of the method disclosed in the embodiment of the present application can be directly embodied as a hardware decoding processor to be executed, or a combination of hardware and software modules in the decoding processor can be executed.
  • the software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, or an electrically erasable programmable memory, a register, etc.
  • the storage medium is located in the memory 1304, and the processor 1303 reads the information in the memory 1304 and completes the steps of the above method in combination with its hardware.
  • the receiver 1301 can be used to receive input digital or character information and generate signal input related to the relevant settings and function control of the execution device.
  • the transmitter 1302 can be used to output digital or character information through the first interface; the transmitter 1302 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1302 can also include a display device such as a display screen.
  • the application processor 13031 in the processor 1303 is used to execute the vehicle position acquisition method executed by the execution device in the embodiments corresponding to Figures 4 to 9.
  • the specific manner in which the application processor 13031 executes the aforementioned steps is based on the same concept as the various method embodiments corresponding to Figures 4 to 9 in the present application, and the technical effects brought about are the same as the various method embodiments corresponding to Figures 4 to 9 in the present application.
  • the embodiment of the present application also provides a training device, please refer to Figure 14, which is a structural diagram of a training device provided by the embodiment of the present application.
  • the training device 1400 is implemented by one or more servers, and the training device 1400 may have relatively large differences due to different configurations or performances, and may include one or more central processing units (CPU) 1422 (for example, one or more processors) and a memory 1432, and one or more storage media 1430 (for example, one or more mass storage devices) storing application programs 1442 or data 1444.
  • the memory 1432 and the storage medium 1430 can be short-term storage or permanent storage.
  • the program stored in the storage medium 1430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations in the training device. Furthermore, the central processor 1422 can be configured to communicate with the storage medium 1430 to execute a series of instruction operations in the storage medium 1430 on the training device 1400.
  • the training device 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, one or more input and output interfaces 1458, and/or, one or more operating systems 1441, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
  • the central processor 1422 is used to execute the vehicle position acquisition method executed by the training device in the embodiment corresponding to Figure 10. It should be noted that the specific manner in which the central processor 1422 executes the aforementioned steps is based on the same concept as the various method embodiments corresponding to Figure 10 in the present application, and the technical effects brought about are the same as the various method embodiments corresponding to Figure 10 in the present application. For specific contents, please refer to the description in the method embodiments shown in the previous embodiment of the present application, and no further description will be given here.
  • Also provided in an embodiment of the present application is a computer program product, which, when executed on a computer, enables the computer to execute the steps executed by the execution device in the method described in the embodiments shown in Figures 4 to 9 above, or enables the computer to execute the steps executed by the training device in the method described in the embodiment shown in Figure 10 above.
  • a computer-readable storage medium is also provided in an embodiment of the present application, which stores a program for signal processing; when the program is run on a computer, the computer executes the steps executed by the execution device in the method described in the embodiments shown in Figures 4 to 9 above, or executes the steps executed by the training device in the method described in the embodiment shown in Figure 10 above.
  • the vehicle position acquisition device, model training device, execution device and training device provided in the embodiments of the present application can be specifically a chip, and the chip includes: a processing unit and a communication unit, the processing unit can be, for example, a processor, and the communication unit can be, for example, an input/output interface, a pin or a circuit, etc.
  • the processing unit can execute the computer execution instructions stored in the storage unit, so that the chip executes the vehicle position acquisition method described in the embodiments shown in Figures 4 to 9 above, or so that the chip executes the model training method described in the embodiment shown in Figure 10 above.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit can also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, a random access memory (RAM), etc.
  • FIG. 15 is a schematic diagram of a structure of a chip provided in an embodiment of the present application, wherein the chip may be a neural network processor NPU 150, which is mounted on the host CPU (Host CPU) as a coprocessor and is assigned tasks by the Host CPU.
  • the core part of the NPU is the operation circuit 1503, which is controlled by the controller 1504 to extract matrix data from the memory and perform multiplication operations.
  • the operation circuit 1503 includes multiple processing units (Process Engine, PE) inside.
  • the operation circuit 1503 is a two-dimensional systolic array.
  • the operation circuit 1503 can also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition.
  • the operation circuit 1503 is a general-purpose matrix processor.
  • the operation circuit takes the corresponding data of matrix B from the weight memory 1502 and caches it on each PE in the operation circuit.
  • the operation circuit takes the matrix A data from the input memory 1501 and performs matrix operation with matrix B, and the partial result or final result of the matrix is stored in the accumulator 1508.
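The matrix-multiplication data flow described above can be sketched in a few lines. This is an illustrative model only, not the patent's implementation (the function and variable names are invented): matrix B is held stationary on the PEs while matrix A is streamed through, with partial results accumulated in accumulator 1508.

```python
import numpy as np

def npu_matmul(A, B):
    """Illustrative model of the operation circuit: B is cached on the PEs,
    A is streamed in, and partial products accumulate in the accumulator."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    acc = np.zeros((m, n))                 # accumulator 1508
    for t in range(k):                     # one streaming step per inner index
        acc += np.outer(A[:, t], B[t, :])  # partial result stored back
    return acc

A = np.arange(6.0).reshape(2, 3)           # from input memory 1501
B = np.ones((3, 2))                        # from weight memory 1502
assert np.allclose(npu_matmul(A, B), A @ B)
```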
  • Unified memory 1506 is used to store input data and output data. Weight data is directly transferred to weight memory 1502 through Direct Memory Access Controller (DMAC) 1505. Input data is also transferred to unified memory 1506 through DMAC.
  • the bus interface unit (Bus Interface Unit, BIU) 1510 is used for the interaction between the AXI bus and the DMAC and the instruction fetch buffer (Instruction Fetch Buffer, IFB) 1509; it is also used for the instruction fetch memory 1509 to obtain instructions from the external memory, and for the storage unit access controller (DMAC) 1505 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1506 or to transfer weight data to the weight memory 1502 or to transfer input data to the input memory 1501.
  • the vector calculation unit 1507 includes multiple operation processing units, which further process the output of the operation circuit when necessary, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc. It is mainly used for non-convolutional/fully connected layer network calculations in neural networks, such as Batch Normalization, pixel-level summation, upsampling of feature planes, etc.
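As a rough illustration (the data and specific operations below are chosen arbitrarily and are not taken from the patent), the kind of element-wise post-processing the vector calculation unit performs on the operation circuit's output might look like:

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])   # output of the matmul stage
bn = (x - x.mean()) / x.std()            # batch-normalization-style step
pixel_sum = x + np.full_like(x, 0.5)     # pixel-level summation with another plane
act = np.maximum(x, 0.0)                 # simple nonlinearity (ReLU)
```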
  • the vector calculation unit 1507 can store the processed output vector to the unified memory 1506.
  • the vector calculation unit 1507 can apply a linear function and/or a nonlinear function to the output of the operation circuit 1503, for example, performing linear interpolation on the feature plane extracted by a convolutional layer, or accumulating a vector of values to generate an activation value.
  • the vector calculation unit 1507 generates a normalized value, a pixel-level summed value, or both.
  • the processed output vector can be used as an activation input to the operation circuit 1503, for example, for use in a subsequent layer in a neural network.
  • An instruction fetch buffer 1509 connected to the controller 1504 is used to store instructions used by the controller 1504;
  • Unified memory 1506, input memory 1501, weight memory 1502 and instruction fetch memory 1509 are all on-chip memories; the external memory is private to the NPU hardware architecture.
  • each layer in the target model shown in Figures 4 to 10 can be performed by the operation circuit 1503 or the vector calculation unit 1507.
  • the processor mentioned in any of the above places may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program of the above-mentioned first aspect method.
  • the device embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected to implement the present invention according to actual needs.
  • the connection relationship between modules indicates that there is a communication connection between them, which can be specifically implemented as one or more communication buses or signal lines.
  • the technical solution of the present application is essentially or the part that contributes to the prior art can be embodied in the form of a software product, which is stored in a readable storage medium, such as a computer floppy disk, a U disk, a mobile hard disk, a ROM, a RAM, a disk or an optical disk, etc., including a number of instructions to enable a computer device (which can be a personal computer, a training device, or a network device, etc.) to execute the methods described in each embodiment of the present application.
  • a computer device which can be a personal computer, a training device, or a network device, etc.
  • all or part of the embodiments may be implemented by software, hardware, firmware or any combination thereof.
  • all or part of the embodiments may be implemented in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from a website site, a computer, a training device, or a data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) mode to another website site, computer, training device, or data center.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a training device or a data center, that integrates one or more available media.
  • the available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Traffic Control Systems (AREA)

Abstract

A vehicle position acquiring method, a model training method, and a related device. The method comprises: acquiring first information and second information, the first information comprising information on a vehicle in the vicinity of an ego vehicle, and the second information comprising information on a lane in the vicinity of the ego vehicle; and inputting the first information and the second information into a first model to obtain predicted information generated by the first model, the predicted information comprising predicted position information of the vehicle in the vicinity of the ego vehicle within a first time. By taking into account information on a lane in the vicinity of an ego vehicle and associating predicted position information of a vehicle in the vicinity of the ego vehicle with a lane, the accuracy of the prediction result is improved.

Description

Vehicle position acquisition method, model training method and related equipment
This application claims priority to the Chinese patent application filed with the China Patent Office on October 31, 2022, with application number 202211350093.1 and invention name "Vehicle position acquisition method, model training method and related equipment", the entire contents of which are incorporated by reference in this application.
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a vehicle position acquisition method, a model training method, and related equipment.
Background
When an autonomous vehicle is driving on the road, it needs to consider the driving trajectories of surrounding vehicles. When the driving intentions of surrounding vehicles change, the autonomous vehicle needs to respond accordingly to avoid collisions with the surrounding vehicles. Therefore, accurately predicting the driving intentions of other vehicles is very important for autonomous vehicles.
At present, methods such as Kalman filtering are mainly used to predict the position of a vehicle. However, such methods rely only on the historical trajectory of the vehicle and do not consider the lane information in the environment.
Summary of the Invention
The present application provides a vehicle position acquisition method, a model training method and related equipment, which can predict the positions of vehicles around the ego vehicle.
In a first aspect, the present application provides a method for acquiring the position of a vehicle, which can be used in the field of artificial intelligence. The method includes:
First, acquiring first information and second information, where the first information includes information about vehicles around the ego vehicle, and the second information includes information about lanes around the ego vehicle; then, inputting the first information and the second information into a first model to obtain prediction information generated by the first model, where the prediction information includes predicted position information of the vehicles around the ego vehicle within a first time.
In real-world scenarios, the ego vehicle and the vehicles around it form an interdependent whole, and their respective behaviors affect each other's decisions. Previous studies often rely solely on a vehicle's historical trajectory to predict its future trajectory, so the prediction results are clearly inaccurate. In the present application, by incorporating the lane information around the ego vehicle, the predicted position information of the vehicles around the ego vehicle is associated with lanes, which further improves the accuracy of the prediction results, provides a basis for the decision-making and planning of the autonomous vehicle, and also improves the riding experience of the autonomous vehicle.
In a possible implementation manner of the first aspect, the prediction information includes predicted trajectory information of the vehicles around the ego vehicle within the first time and third information, where the third information indicates the lanes in which the vehicles around the ego vehicle are located within the first time.
In this possible implementation, on the basis of considering the lane information around the ego vehicle, the vehicle's future driving intention is bound to a lane by outputting the information of the lanes in which the vehicles around the ego vehicle are located, which effectively utilizes the relationship between the vehicles and the lanes around the ego vehicle and improves the prediction accuracy of the vehicle position.
In a possible implementation manner of the first aspect, the third information includes the correlation between a target vehicle around the ego vehicle and at least one lane around the ego vehicle within the first time, where the target vehicle is a vehicle around the ego vehicle, and the method further includes:
determining a first lane as the lane in which the target vehicle is located within the first time, where the first lane is, among the at least one lane around the ego vehicle, the lane with the highest correlation with the target vehicle within the first time.
In this possible implementation, the vehicle's future driving position is bound to a lane. On the basis of acquiring the lane information around the ego vehicle, the correlation between the target vehicle and the lanes around the ego vehicle is output, giving the probability of the target vehicle traveling in each lane in the future, and the lane with the highest correlation is determined as the lane in which the target vehicle is located within the first time, thereby improving the accuracy of the prediction results.
In a possible implementation manner of the first aspect, the first model is constructed based on an attention mechanism, and inputting the first information and the second information into the first model to obtain the prediction information generated by the first model includes: inputting the first information and the second information into the first model, and generating fourth information based on the attention mechanism, where the fourth information includes the correlation between a target vehicle around the ego vehicle and a first lane set within the first time, the target vehicle is a vehicle around the ego vehicle, and the first lane set includes all lanes around the ego vehicle included in the second information;
obtaining the category of the road scene to which the target vehicle belongs, where the categories of road scenes include intersection scenes and non-intersection scenes;
selecting a second lane set from the first lane set according to the category of the road scene to which the target vehicle belongs, where the second lane set includes the lanes in which the vehicles around the ego vehicle are located within the first time;
obtaining fifth information from the fourth information, and generating the third information according to the fifth information, where the fifth information includes the correlation between the target vehicle and the second lane set within the first time;
generating, according to the second information and the fourth information, the predicted trajectory information of the vehicles around the ego vehicle within the first time.
In this possible implementation, by inputting the information of the vehicles and lanes around the ego vehicle into the first model, the correlation between the target vehicle and the first lane set within the first time is obtained based on the attention mechanism. On the one hand, binding the vehicle's future driving intention to a lane makes the predicted intention more stable; on the other hand, by identifying the road scene to which the target vehicle belongs and selecting the lanes to be retained according to that scene, the prediction accuracy is further improved.
In a possible implementation manner of the first aspect, obtaining the fifth information from the fourth information and generating the third information according to the fifth information includes:
obtaining the fifth information from the fourth information, and performing a normalization operation on the fifth information to obtain normalized fifth information;
inputting the normalized fifth information into a multi-layer perceptron to obtain the third information.
In this possible implementation, the fourth information includes the correlation between the target vehicle around the ego vehicle and the first lane set within the first time, that is, the attention score of the target vehicle relative to each lane in the first lane set. According to the road scene to which the target vehicle belongs, the second lane set can be determined, and the corresponding fifth information can be screened out from the fourth information, so that lane prediction can be carried out in a targeted manner according to the specific road scene, further improving the accuracy of the prediction. In addition, each element contained in the fifth information is normalized so that all elements sum to 1, and the result is then input into the multi-layer perceptron, which outputs the correlation between the target vehicle and the lanes around the ego vehicle within the first time.
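The filtering and normalization steps above can be sketched as follows; the scores, lane indices, and scene choice are assumptions for illustration only:

```python
import numpy as np

# Fourth information: attention scores of the target vehicle over the first
# lane set (all lanes around the ego vehicle).
fourth_info = np.array([0.05, 0.40, 0.25, 0.20, 0.10])

# Second lane set: lane indices kept for the assumed (non-intersection) road scene.
second_lane_set = [1, 2, 3]

# Fifth information: scores restricted to the second lane set, then normalized
# so that all elements sum to 1 before being fed to the multi-layer perceptron.
fifth_info = fourth_info[second_lane_set]
fifth_info_norm = fifth_info / fifth_info.sum()
```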
In a possible implementation manner of the first aspect, generating the fourth information according to the first information and the second information includes:
performing vectorization processing and linear mapping on the first information and the second information respectively to obtain a first linear matrix and a second linear matrix;
performing a normalization operation on the matrix product of the first linear matrix and the second linear matrix to obtain the fourth information.
In this possible implementation, the first information and the second information can be fused based on the attention mechanism to obtain the attention score of the target vehicle relative to each lane.
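Under assumed feature dimensions and random stand-in weights (nothing below is prescribed by the patent), the attention-score computation reads:

```python
import numpy as np

rng = np.random.default_rng(0)
vehicle_feat = rng.normal(size=(1, 8))   # vectorized first information (target vehicle)
lane_feats = rng.normal(size=(5, 8))     # vectorized second information (5 lanes)
Wq = rng.normal(size=(8, 16))            # linear mapping for the vehicle branch
Wk = rng.normal(size=(8, 16))            # linear mapping for the lane branch

first_linear = vehicle_feat @ Wq         # first linear matrix
second_linear = lane_feats @ Wk          # second linear matrix

logits = first_linear @ second_linear.T  # matrix product
logits -= logits.max()                   # for numerical stability
fourth_info = np.exp(logits) / np.exp(logits).sum()  # normalization (softmax)
```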
In a possible implementation manner of the first aspect, generating the predicted trajectory information of the vehicles around the ego vehicle within the first time according to the second information and the fourth information includes:
performing a matrix multiplication operation on the second linear matrix and the fourth information to obtain sixth information;
inputting the sixth information into a multi-layer perceptron to obtain the predicted trajectory information of the vehicles around the ego vehicle within the first time.
In this possible implementation, the second linear matrix and the fourth information can be fused based on the attention mechanism, and the obtained sixth information is input into the multi-layer perceptron, which outputs the predicted trajectory information of the vehicles around the ego vehicle within the first time.
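A sketch of this step under the same kind of assumed dimensions and illustrative weights (none taken from the patent): the attention scores weight the second linear matrix, and a small MLP maps the fused feature to future waypoints.

```python
import numpy as np

rng = np.random.default_rng(1)
fourth_info = np.array([[0.1, 0.6, 0.3]])  # attention scores over 3 lanes
second_linear = rng.normal(size=(3, 16))   # lane features after linear mapping

sixth_info = fourth_info @ second_linear   # matrix multiplication, shape (1, 16)

# Two-layer perceptron mapping the fused feature to 10 future (x, y) waypoints.
W1, b1 = rng.normal(size=(16, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 20)), np.zeros(20)
hidden = np.maximum(sixth_info @ W1 + b1, 0.0)
trajectory = (hidden @ W2 + b2).reshape(10, 2)
```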
In a second aspect, the present application provides a model training method that can be used in the field of artificial intelligence. The method includes:
First, acquiring first information and second information, where the first information includes information about vehicles around the ego vehicle, and the second information includes information about lanes around the ego vehicle; then, inputting the first information and the second information into a first model to obtain prediction information generated by the first model, where the prediction information includes predicted position information of the vehicles around the ego vehicle within a first time; finally, training the first model according to a loss function, where the loss function indicates the similarity between the prediction information and correct information, and the correct information includes the correct position information of the vehicles around the ego vehicle within the first time.
In the present application, when training the first model, the training samples used include complete information about the vehicles around the ego vehicle and complete information about the lanes around the ego vehicle, so that the position information output by the first model is more accurate. It can be understood that the first model can be used to perform the steps in the aforementioned first aspect or the optional implementations of the first aspect.
In a possible implementation manner of the second aspect, the prediction information includes predicted trajectory information of the vehicles around the ego vehicle within the first time and third information, where the third information indicates the lanes in which the vehicles around the ego vehicle are located within the first time.
In a possible implementation manner of the second aspect, the third information includes the correlation between a target vehicle around the ego vehicle and at least one lane around the ego vehicle within the first time, where the target vehicle is a vehicle around the ego vehicle, and the method further includes:
determining a first lane as the lane in which the target vehicle is located within the first time, where the first lane is, among the at least one lane around the ego vehicle, the lane with the highest correlation with the target vehicle within the first time.
In a possible implementation manner of the second aspect, the first model is constructed based on an attention mechanism, and inputting the first information and the second information into the first model to obtain the prediction information generated by the first model includes:
inputting the first information and the second information into the first model, and generating fourth information based on the attention mechanism, where the fourth information includes the correlation between a target vehicle around the ego vehicle and a first lane set within the first time, the target vehicle is a vehicle around the ego vehicle, and the first lane set includes all lanes around the ego vehicle included in the second information;
obtaining the category of the road scene to which the target vehicle belongs, where the categories of road scenes include intersection scenes and non-intersection scenes;
selecting a second lane set from the first lane set according to the category of the road scene to which the target vehicle belongs, where the second lane set includes the lanes in which the vehicles around the ego vehicle are located within the first time;
obtaining fifth information from the fourth information, and generating the third information according to the fifth information, where the fifth information includes the correlation between the target vehicle and the second lane set within the first time;
generating, according to the second information and the fourth information, the predicted trajectory information of the vehicles around the ego vehicle within the first time.
In a possible implementation manner of the second aspect, obtaining the fifth information from the fourth information and generating the third information according to the fifth information includes:
obtaining the fifth information from the fourth information, and performing a normalization operation on the fifth information to obtain normalized fifth information;
inputting the normalized fifth information into a multi-layer perceptron to obtain the third information.
In a possible implementation manner of the second aspect, generating the fourth information according to the first information and the second information includes:
performing vectorization processing and linear mapping on the first information and the second information respectively to obtain a first linear matrix and a second linear matrix;
performing a normalization operation on the matrix product of the first linear matrix and the second linear matrix to obtain the fourth information.
In a possible implementation manner of the second aspect, generating the predicted trajectory information of the vehicles around the ego vehicle within the first time according to the second information and the fourth information includes:
performing a matrix multiplication operation on the second linear matrix and the fourth information to obtain sixth information;
inputting the sixth information into a multi-layer perceptron to obtain the predicted trajectory information of the vehicles around the ego vehicle within the first time.
For the specific meanings of the terms in the second aspect of the embodiments of the present application and the various possible implementations of the second aspect, reference may be made to the descriptions of the various possible implementations of the first aspect, which will not be repeated here.
第三方面,本申请提供了一种车辆的位置获取装置,可用于人工智能领域中。装置包括获取模块和位置预测模块。其中,获取模块,用于获取第一信息和第二信息,第一信息包括自车周围的车辆的信息,第二信息包括自车周围的车道的信息;位置预测模块,用于将第一信息和第二信息输入第一模型中,得到第一模型生成的预测信息,预测信息包括自车周围的车辆在第一时间内的预测位置信息。In a third aspect, the present application provides a vehicle position acquisition device that can be used in the field of artificial intelligence. The device includes an acquisition module and a position prediction module. The acquisition module is used to acquire first information and second information, the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle; the position prediction module is used to input the first information and the second information into a first model to obtain prediction information generated by the first model, and the prediction information includes predicted position information of vehicles around the vehicle within the first time.
在第三方面的一种可能的实现方式中,预测信息包括自车周围的车辆在第一时间内的预测轨迹信息和第三信息,第三信息指示自车周围的车辆在第一时间内所在的车道。In a possible implementation manner of the third aspect, the prediction information includes predicted trajectory information of the vehicles around the ego vehicle within a first time and third information, where the third information indicates the lanes in which the vehicles around the ego vehicle are located within the first time.
在第三方面的一种可能的实现方式中,第三信息包括自车周围的目标车辆与自车周围的至少一个车道在第一时间内的关联度,目标车辆为自车周围的一个车辆,装置还包括:In a possible implementation manner of the third aspect, the third information includes a correlation between a target vehicle around the ego vehicle and at least one lane around the ego vehicle within a first time, the target vehicle is a vehicle around the ego vehicle, and the device further includes:
车道确定模块,用于将第一车道确定为目标车辆在第一时间内所在的车道,其中,第一车道为自车周围的至少一个车道中与目标车辆在第一时间内的关联度最高的一个车道。a lane determination module, used to determine a first lane as the lane where the target vehicle is located within the first time, where the first lane is the lane, among the at least one lane around the ego vehicle, with the highest correlation with the target vehicle within the first time.
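A minimal sketch of this selection rule, assuming the association degrees for one target vehicle arrive as a mapping from lane identifiers to scores (the data layout is not specified by the text):

```python
def pick_first_lane(associations):
    # associations: {lane_id: association degree} for one target vehicle
    # over the first time period; the "first lane" is the lane with the
    # highest association degree.
    return max(associations, key=associations.get)

scores = {"lane_1": 0.12, "lane_3": 0.71, "lane_4": 0.17}
print(pick_first_lane(scores))  # lane_3
```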
在第三方面的一种可能的实现方式中,第一模型基于注意力机制构建,位置预测模块具体用于:In a possible implementation manner of the third aspect, the first model is constructed based on the attention mechanism, and the position prediction module is specifically used for:
将第一信息和第二信息输入第一模型中,基于注意力机制,生成第四信息,第四信息包括自车周围的目标车辆与第一车道集合在第一时间内的关联度,目标车辆为自车周围的一个车辆,第一车道集合包括第二信息中包括的自车周围的所有车道;Input the first information and the second information into the first model, and generate fourth information based on the attention mechanism, wherein the fourth information includes the correlation between the target vehicle around the ego vehicle and the first lane set within the first time, the target vehicle is a vehicle around the ego vehicle, and the first lane set includes all lanes around the ego vehicle included in the second information;
获取目标车辆所属的道路场景的类别,道路场景的类别包括路口场景和非路口场景;Obtaining the category of the road scene to which the target vehicle belongs, the categories of the road scene include intersection scenes and non-intersection scenes;
根据目标车辆所属的道路场景的类别,从第一车道集合中选取第二车道集合,第二车道集合包括自车周围的车辆在第一时间内所在的车道;According to the category of the road scene to which the target vehicle belongs, a second lane set is selected from the first lane set, where the second lane set includes the lanes where the vehicles around the ego vehicle are located within the first time;
从第四信息中获取第五信息,并根据第五信息生成第三信息,第五信息包括目标车辆与第二车道集合在第一时间内的关联度;Acquire fifth information from the fourth information, and generate third information according to the fifth information, wherein the fifth information includes a correlation between the target vehicle and the second lane set within the first time;
根据第二信息和第四信息,生成自车周围的车辆在第一时间内的预测轨迹信息。Based on the second information and the fourth information, the predicted trajectory information of the vehicles around the vehicle within the first time is generated.
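The candidate-lane narrowing in the steps above can be sketched as follows. The concrete per-category rules (successor lanes at intersections, neighbouring lanes elsewhere) and the lane metadata flags are illustrative assumptions; the text only states that the second lane set depends on the scene category:

```python
import numpy as np

def select_second_lane_set(fourth_info, lane_meta, scene):
    # fourth_info: [vehicles, lanes] association degrees with the full
    # first lane set; lane_meta marks, per lane, whether it qualifies
    # under each scene category (hypothetical flags, not from the text).
    if scene == "intersection":
        cols = [i for i, m in enumerate(lane_meta) if m["successor"]]
    else:
        cols = [i for i, m in enumerate(lane_meta) if m["neighbour"]]
    # Fifth information: the association degrees restricted to candidates.
    return cols, fourth_info[:, cols]

fourth_info = np.array([[0.1, 0.4, 0.3, 0.2]])
lane_meta = [{"successor": True,  "neighbour": True},
             {"successor": True,  "neighbour": False},
             {"successor": False, "neighbour": True},
             {"successor": False, "neighbour": False}]
cols, fifth_info = select_second_lane_set(fourth_info, lane_meta, "intersection")
print(cols)  # [0, 1]
```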
在第三方面的一种可能的实现方式中,位置预测模块具体用于:In a possible implementation manner of the third aspect, the position prediction module is specifically used to:
从第四信息中获取第五信息,并对第五信息进行归一化操作,得到归一化后的第五信息;Acquire fifth information from the fourth information, and perform a normalization operation on the fifth information to obtain normalized fifth information;
将归一化后的第五信息输入多层感知机中,得到第三信息。 The normalized fifth information is input into the multi-layer perceptron to obtain the third information.
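A minimal sketch of these two steps, using a softmax as the normalization and, purely for brevity, an identity linear layer standing in for the multi-layer perceptron:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Fifth information: raw association degrees between one target vehicle
# and the candidate lanes (hypothetical values).
fifth_info = np.array([2.0, 0.5, 1.0])
normalized = softmax(fifth_info)          # normalization step
# Stand-in for the MLP of the text: an identity linear layer, so the
# third information stays a probability-like score per candidate lane.
w = np.eye(3)
third_info = normalized @ w
print(third_info.argmax())  # candidate lane 0 has the highest association
```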
在第三方面的一种可能的实现方式中,位置预测模块具体用于:In a possible implementation manner of the third aspect, the position prediction module is specifically used to:
分别对第一信息和第二信息进行向量化处理和线性映射,得到第一线性矩阵和第二线性矩阵;Performing vectorization processing and linear mapping on the first information and the second information respectively to obtain a first linear matrix and a second linear matrix;
对第一线性矩阵和第二线性矩阵的矩阵乘积执行归一化操作,得到第四信息。A normalization operation is performed on the matrix product of the first linear matrix and the second linear matrix to obtain fourth information.
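These two steps amount to the score computation of scaled dot-product attention; a minimal sketch follows, in which the embedding widths and the scaling factor are assumptions not stated in the text:

```python
import numpy as np

def softmax(x, axis=-1):
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

def vehicle_lane_scores(first_info, second_info, w_q, w_k):
    # Vectorized vehicle features are linearly mapped into the first
    # linear matrix (queries) and vectorized lane features into the
    # second linear matrix (keys); the normalized matrix product is the
    # fourth information: per-vehicle association degrees over all lanes.
    q = first_info @ w_q                       # first linear matrix
    k = second_info @ w_k                      # second linear matrix
    scores = q @ k.T / np.sqrt(k.shape[1])     # matrix product (scaled)
    return softmax(scores, axis=1)             # rows sum to 1

rng = np.random.default_rng(1)
vehicles, lanes, f, d = 3, 5, 10, 8
fourth_info = vehicle_lane_scores(rng.random((vehicles, f)),
                                  rng.random((lanes, f)),
                                  rng.standard_normal((f, d)),
                                  rng.standard_normal((f, d)))
print(fourth_info.shape)  # (3, 5)
```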
在第三方面的一种可能的实现方式中,位置预测模块具体用于:In a possible implementation manner of the third aspect, the position prediction module is specifically used to:
对第二线性矩阵与第四信息执行矩阵乘运算,得到第六信息;Performing a matrix multiplication operation on the second linear matrix and the fourth information to obtain sixth information;
将第六信息输入多层感知机中,得到自车周围的车辆在第一时间内的预测轨迹信息。The sixth information is input into the multi-layer perceptron to obtain the predicted trajectory information of vehicles around the vehicle in the first time.
本申请第三方面中,车辆的位置获取装置包括的各个模块还可以用于实现第一方面各种可能实现方式中的步骤,对于本申请实施例第三方面以及第三方面的各种可能实现方式中某些步骤的具体实现方式,以及每种可能实现方式所带来的有益效果,均可以参考第一方面中各种可能的实现方式中的描述,此处不再一一赘述。In the third aspect of the present application, the modules included in the vehicle position acquisition device can also be used to implement the steps in the various possible implementations of the first aspect. For the specific implementations of certain steps in the third aspect and its various possible implementations, as well as the beneficial effects brought about by each possible implementation, reference may be made to the descriptions of the various possible implementations of the first aspect; details are not repeated here.
第四方面,本申请提供了一种模型的训练装置,可用于人工智能领域中。装置包括获取模块、位置预测模块和模型训练模块。其中,获取模块,用于获取第一信息和第二信息,第一信息包括自车周围的车辆的信息,第二信息包括自车周围的车道的信息;位置预测模块,用于将第一信息和第二信息输入第一模型中,得到第一模型生成的预测信息,预测信息包括自车周围的车辆在第一时间内的预测位置信息;模型训练模块,用于根据损失函数对第一模型进行训练,损失函数指示预测信息和正确信息之间的相似度,正确信息包括自车周围的车辆在第一时间内的正确的位置信息。In a fourth aspect, the present application provides a model training device that can be used in the field of artificial intelligence. The device includes an acquisition module, a position prediction module, and a model training module. The acquisition module is used to acquire first information and second information, the first information includes information about vehicles around the ego vehicle, and the second information includes information about lanes around the ego vehicle; the position prediction module is used to input the first information and the second information into the first model to obtain the prediction information generated by the first model, and the prediction information includes the predicted position information of the vehicles around the ego vehicle within the first time; the model training module is used to train the first model according to a loss function, the loss function indicates the similarity between the prediction information and the correct information, and the correct information includes the correct position information of the vehicles around the ego vehicle within the first time.
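A possible concrete form of such a loss is sketched below. The text only requires that the loss indicate the similarity between the prediction and the ground truth; the specific choice here (L2 distance on trajectories plus cross-entropy on the lane label) is an assumption:

```python
import numpy as np

def loss_fn(pred_traj, true_traj, pred_lane_prob, true_lane):
    # Similarity between prediction and ground truth: L2 distance on the
    # predicted trajectory plus cross-entropy on the predicted lane.
    l2 = np.mean((pred_traj - true_traj) ** 2)
    ce = -np.log(pred_lane_prob[true_lane] + 1e-9)
    return l2 + ce

pred_traj = np.array([[0.0, 0.0], [1.0, 0.1]])
true_traj = np.array([[0.0, 0.0], [1.0, 0.0]])
pred_lane_prob = np.array([0.1, 0.8, 0.1])   # hypothetical lane posterior
loss = loss_fn(pred_traj, true_traj, pred_lane_prob, true_lane=1)
print(round(loss, 3))  # 0.226
```

Minimizing this loss over training samples drives the first model's predicted trajectory and lane association toward the recorded ground truth.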
在第四方面的一种可能的实现方式中,预测信息包括自车周围的车辆在第一时间内的预测轨迹信息和第三信息,第三信息指示自车周围的车辆在第一时间内所在的车道。In a possible implementation manner of the fourth aspect, the prediction information includes predicted trajectory information of the vehicles around the ego vehicle within a first time and third information, where the third information indicates the lanes in which the vehicles around the ego vehicle are located within the first time.
在第四方面的一种可能的实现方式中,第三信息包括自车周围的目标车辆与自车周围的至少一个车道在第一时间内的关联度,目标车辆为自车周围的一个车辆,装置还包括:In a possible implementation manner of the fourth aspect, the third information includes a correlation between a target vehicle around the ego vehicle and at least one lane around the ego vehicle within a first time, the target vehicle is a vehicle around the ego vehicle, and the device further includes:
车道确定模块,用于将第一车道确定为目标车辆在第一时间内所在的车道,其中,第一车道为自车周围的至少一个车道中与目标车辆在第一时间内的关联度最高的一个车道。a lane determination module, used to determine a first lane as the lane where the target vehicle is located within the first time, where the first lane is the lane, among the at least one lane around the ego vehicle, with the highest correlation with the target vehicle within the first time.
在第四方面的一种可能的实现方式中,第一模型基于注意力机制构建,位置预测模块具体用于:In a possible implementation manner of the fourth aspect, the first model is constructed based on the attention mechanism, and the position prediction module is specifically used for:
将第一信息和第二信息输入第一模型中,基于注意力机制,生成第四信息,第四信息包括自车周围的目标车辆与第一车道集合在第一时间内的关联度,目标车辆为自车周围的一个车辆,第一车道集合包括第二信息中包括的自车周围的所有车道;Input the first information and the second information into the first model, and generate fourth information based on the attention mechanism, wherein the fourth information includes the correlation between the target vehicle around the ego vehicle and the first lane set within the first time, the target vehicle is a vehicle around the ego vehicle, and the first lane set includes all lanes around the ego vehicle included in the second information;
获取目标车辆所属的道路场景的类别,道路场景的类别包括路口场景和非路口场景;Obtaining the category of the road scene to which the target vehicle belongs, the categories of the road scene include intersection scenes and non-intersection scenes;
根据目标车辆所属的道路场景的类别,从第一车道集合中选取第二车道集合,第二车道集合包括自车周围的车辆在第一时间内所在的车道;According to the category of the road scene to which the target vehicle belongs, a second lane set is selected from the first lane set, where the second lane set includes the lanes where the vehicles around the ego vehicle are located within the first time;
从第四信息中获取第五信息,并根据第五信息生成第三信息,第五信息包括目标车辆与第二车道集合在第一时间内的关联度;Acquire fifth information from the fourth information, and generate third information according to the fifth information, wherein the fifth information includes a correlation between the target vehicle and the second lane set within the first time;
根据第二信息和第四信息,生成自车周围的车辆在第一时间内的预测轨迹信息。Based on the second information and the fourth information, the predicted trajectory information of the vehicles around the vehicle within the first time is generated.
在第四方面的一种可能的实现方式中,位置预测模块具体用于:In a possible implementation manner of the fourth aspect, the position prediction module is specifically used to:
从第四信息中获取第五信息,并对第五信息进行归一化操作,得到归一化后的第五信息;Acquire fifth information from the fourth information, and perform a normalization operation on the fifth information to obtain normalized fifth information;
将归一化后的第五信息输入多层感知机中,得到第三信息。The normalized fifth information is input into the multi-layer perceptron to obtain the third information.
在第四方面的一种可能的实现方式中,位置预测模块具体用于:In a possible implementation manner of the fourth aspect, the position prediction module is specifically used to:
分别对第一信息和第二信息进行向量化处理和线性映射,得到第一线性矩阵和第二线性矩阵;Performing vectorization processing and linear mapping on the first information and the second information respectively to obtain a first linear matrix and a second linear matrix;
对第一线性矩阵和第二线性矩阵的矩阵乘积执行归一化操作,得到第四信息。A normalization operation is performed on the matrix product of the first linear matrix and the second linear matrix to obtain fourth information.
在第四方面的一种可能的实现方式中,位置预测模块具体用于:In a possible implementation manner of the fourth aspect, the position prediction module is specifically used to:
对第二线性矩阵与第四信息执行矩阵乘运算,得到第六信息;Performing a matrix multiplication operation on the second linear matrix and the fourth information to obtain sixth information;
将第六信息输入多层感知机中,得到自车周围的车辆在第一时间内的预测轨迹信息。The sixth information is input into the multi-layer perceptron to obtain the predicted trajectory information of vehicles around the vehicle in the first time.
本申请第四方面中,模型的训练装置包括的各个模块还可以用于实现第二方面各种可能实现方式中的步骤,对于本申请实施例第四方面以及第四方面的各种可能实现方式中某些步骤的具体实现方式,以及每种可能实现方式所带来的有益效果,均可以参考第二方面中各种可能的实现方式中的描述,此处不再一一赘述。 In the fourth aspect of the present application, the modules included in the model training device can also be used to implement the steps in the various possible implementations of the second aspect. For the specific implementations of certain steps in the fourth aspect and its various possible implementations, as well as the beneficial effects brought about by each possible implementation, reference may be made to the descriptions of the various possible implementations of the second aspect; details are not repeated here.
第五方面,本申请实施例提供了一种执行设备,可以包括处理器,处理器和存储器耦合,存储器存储有程序指令,当存储器存储的程序指令被处理器执行时实现上述第一方面所述的车辆的位置获取方法。对于处理器执行第一方面的各个可能实现方式中执行设备执行的步骤,具体均可以参阅上述第一方面,此处不再赘述。In a fifth aspect, an embodiment of the present application provides an execution device, which may include a processor, the processor and a memory are coupled, the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the vehicle position acquisition method described in the first aspect is implemented. For the steps executed by the execution device in each possible implementation method of the processor executing the first aspect, please refer to the first aspect above for details, and no further description is given here.
第六方面,本申请实施例提供了一种自动驾驶车辆,可以包括处理器,处理器和存储器耦合,存储器存储有程序指令,当存储器存储的程序指令被处理器执行时实现上述第一方面所述的车辆的位置获取方法。对于处理器执行第一方面的各个可能实现方式中执行设备执行的步骤,具体均可以参阅上述第一方面,此处不再赘述。In a sixth aspect, an embodiment of the present application provides an autonomous driving vehicle, which may include a processor, the processor and a memory are coupled, the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the vehicle position acquisition method described in the first aspect is implemented. For the steps executed by the execution device in each possible implementation method of the processor executing the first aspect, the details can be referred to the first aspect above, and will not be repeated here.
第七方面,本申请实施例提供了一种训练设备,可以包括处理器,处理器和存储器耦合,存储器存储有程序指令,当存储器存储的程序指令被处理器执行时实现上述第二方面所述的模型的训练方法。对于处理器执行第二方面的各个可能实现方式中训练设备执行的步骤,具体均可以参阅第二方面,此处不再赘述。In a seventh aspect, an embodiment of the present application provides a training device, which may include a processor, the processor and a memory are coupled, the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the training method of the model described in the second aspect is implemented. For the steps performed by the training device in each possible implementation method of the processor performing the second aspect, please refer to the second aspect for details, and no further description is given here.
第八方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,当其在计算机上运行时,使得计算机执行上述第一方面或第一方面的任一种可能实现方式所述的方法,或者,使得计算机执行上述第二方面或第二方面的任一种可能实现方式所述的方法。In an eighth aspect, an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, which, when running on a computer, enables the computer to execute the method described in the first aspect or any possible implementation of the first aspect, or enables the computer to execute the method described in the second aspect or any possible implementation of the second aspect.
第九方面,本申请实施例提供了一种电路系统,所述电路系统包括处理电路,所述处理电路配置为执行上述第一方面或第一方面的任一种可能实现方式所述的方法,或者,所述处理电路配置为执行上述第二方面或第二方面的任一种可能实现方式所述的方法。In the ninth aspect, an embodiment of the present application provides a circuit system, which includes a processing circuit, and the processing circuit is configured to execute the method described in the first aspect or any possible implementation of the first aspect, or the processing circuit is configured to execute the method described in the second aspect or any possible implementation of the second aspect.
第十方面,本申请实施例提供了一种计算机程序产品,当其在计算机上运行时,使得计算机执行上述第一方面或第一方面的任一种可能实现方式所述的方法,或者,使得计算机执行上述第二方面或第二方面的任一种可能实现方式所述的方法。In the tenth aspect, an embodiment of the present application provides a computer program product, which, when running on a computer, enables the computer to execute the method described in the first aspect or any possible implementation of the first aspect, or enables the computer to execute the method described in the second aspect or any possible implementation of the second aspect.
第十一方面,本申请提供了一种芯片系统,包括处理器和存储器,存储器用于存储计算机程序,处理器用于调用并运行存储器中存储的计算机程序,以执行如上述第一方面或第一方面的任一种可能实现方式所述的方法,或者,使得计算机执行上述第二方面或第二方面的任一种可能实现方式的方法。该芯片系统,可以由芯片构成,也可以包括芯片和其他分立器件。In the eleventh aspect, the present application provides a chip system, including a processor and a memory, the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method as described in the first aspect or any possible implementation of the first aspect, or to enable the computer to execute the method as described in the second aspect or any possible implementation of the second aspect. The chip system can be composed of a chip, or it can include a chip and other discrete devices.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
图1a为人工智能主体框架的一种结构示意图;FIG1a is a schematic diagram of a structure of an artificial intelligence main framework;
图1b为路况的一种结构示意图;FIG1b is a structural schematic diagram of a road condition;
图1c为本申请实施例提供的具有自动驾驶功能的自动驾驶装置的一种结构示意图;FIG1c is a schematic diagram of a structure of an automatic driving device with an automatic driving function provided in an embodiment of the present application;
图2a为本申请提供的一种系统架构示意图;FIG2a is a schematic diagram of a system architecture provided by the present application;
图2b为本申请提供的车辆的位置获取方法的一种流程示意图;FIG2b is a schematic diagram of a flow chart of a method for obtaining a vehicle position provided in the present application;
图3为本申请实施例提供的多层感知机的一种结构示意图;FIG3 is a schematic diagram of a structure of a multi-layer perceptron provided in an embodiment of the present application;
图4为本申请提供的车辆的位置获取方法的另一种流程示意图;FIG4 is another schematic diagram of a flow chart of a method for obtaining a vehicle position provided by the present application;
图5为本申请实施例提供的第一模型的一种结构示意图;FIG5 is a schematic diagram of a structure of a first model provided in an embodiment of the present application;
图6为本申请实施例提供的第一模型的另一种结构示意图;FIG6 is another schematic diagram of the structure of the first model provided in an embodiment of the present application;
图7为本申请实施例提供的第一嵌入模块的一种结构示意图;FIG7 is a schematic diagram of a structure of a first embedded module provided in an embodiment of the present application;
图8为本申请实施例提供的第二嵌入模块的一种结构示意图;FIG8 is a schematic diagram of a structure of a second embedded module provided in an embodiment of the present application;
图9为本申请实施例提供的第一解码器模块的一种结构示意图;FIG9 is a schematic diagram of a structure of a first decoder module provided in an embodiment of the present application;
图10为本申请实施例提供的模型的训练方法的一种流程示意图;FIG10 is a flow chart of a method for training a model provided in an embodiment of the present application;
图11为本申请实施例提供的车辆的位置获取装置的一种结构示意图;FIG11 is a schematic diagram of a structure of a vehicle position acquisition device provided in an embodiment of the present application;
图12为本申请实施例提供的模型的训练装置的一种结构示意图;FIG12 is a schematic diagram of a structure of a training device for a model provided in an embodiment of the present application;
图13为本申请实施例提供的执行设备的一种结构示意图;FIG13 is a schematic diagram of a structure of an execution device provided in an embodiment of the present application;
图14是本申请实施例提供的训练设备的一种结构示意图;FIG14 is a schematic diagram of a structure of a training device provided in an embodiment of the present application;
图15为本申请实施例提供的芯片的一种结构示意图。FIG. 15 is a schematic diagram of the structure of a chip provided in an embodiment of the present application.
具体实施方式DETAILED DESCRIPTION OF EMBODIMENTS
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅是本申请的部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。本领域技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。The following will be combined with the drawings in the embodiments of the present application to clearly and completely describe the technical solutions in the embodiments of the present application. Obviously, the described embodiments are only some embodiments of the present application, not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those skilled in the art without making creative work are within the scope of protection of this application. It is known to those skilled in the art that with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的实施例能够以除了在这里图示或描述的内容以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或模块的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或模块,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或模块。The terms "first", "second", etc. in the specification and claims of the present application and the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that the data used in this way can be interchangeable where appropriate, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "including" and "having" and any of their variations are intended to cover non-exclusive inclusions, for example, a process, method, system, product or device that includes a series of steps or modules is not necessarily limited to those steps or modules that are clearly listed, but may include other steps or modules that are not clearly listed or inherent to these processes, methods, products or devices.
本申请中出现的术语“和/或”,可以是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本申请中字符“/”,一般表示前后关联对象是一种“或”的关系。The term "and/or" in this application can be a description of the association relationship of associated objects, indicating that three relationships can exist. For example, A and/or B can represent: A exists alone, A and B exist at the same time, and B exists alone. In addition, the character "/" in this application generally indicates that the associated objects before and after are in an "or" relationship.
还应该注意的是,在一些替代实施中,所注明的功能/动作可以不按附图的顺序出现。例如,取决于涉及的功能/动作,事实上可以实质上同时发生或可以有时以相反的顺序执行连续示出的两个附图。It should also be noted that, in some alternative implementations, the functions/acts noted may occur out of the order shown in the figures. For example, two figures shown in succession may, in fact, be executed substantially concurrently, or may sometimes be executed in the reverse order, depending on the functions/acts involved.
本申请实施例,除非另有说明,“至少一个”的含义是指一个或多个,“多个”的含义是指两个或两个以上。可以理解,在本申请中,“当…时”、“若”以及“如果”均指在某种客观情况下装置会做出相应的处理,并非是限定时间,且也不要求装置实现时一定要有判断的动作,也不意味着存在其它限定。另外,专用的词“示例性”意为“用作例子、实施例或说明性”。作为“示例性”所说明的任何实施例不必解释为优于或好于其它实施例。In the embodiments of this application, unless otherwise specified, "at least one" means one or more, and "a plurality of" means two or more. It should be understood that in this application, "when", "if", and "in a case that" all mean that an apparatus performs corresponding processing in an objective situation; they neither limit the time nor require that the apparatus necessarily performs a judging action during implementation, and they do not imply any other limitation. In addition, the word "exemplary" means "serving as an example, embodiment, or illustration". Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
本申请提供的车辆的位置获取方法可以应用于人工智能(artificial intelligence,AI)场景中。AI是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个分支,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式做出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。人工智能领域的研究包括机器人,自然语言处理,计算机视觉,决策与推理,人机交互,推荐与搜索,AI基础理论等。The vehicle position acquisition method provided in this application can be applied to artificial intelligence (AI) scenarios. AI is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can respond in a similar way to human intelligence. Artificial intelligence is to study the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, etc.
下面结合附图,对本申请的实施例进行描述。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。The embodiments of the present application are described below in conjunction with the accompanying drawings. It is known to those skilled in the art that with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.
首先对人工智能系统总体工作流程进行描述,请参阅图1a,图1a示出的为人工智能主体框架的一种结构示意图,下面从“智能信息链”(水平轴)和“IT价值链”(垂直轴)两个维度对上述人工智能主题框架进行阐述。其中,“智能信息链”反映从数据的获取到处理的一系列过程。举例来说,可以是智能信息感知、智能信息表示与形成、智能推理、智能决策、智能执行与输出的一般过程。在这个过程中,数据经历了“数据—信息—知识—智慧”的凝练过程。“IT价值链”从人工智能的底层基础设施、信息(提供和处理技术实现)到系统的产业生态过程,反映人工智能为信息技术产业带来的价值。First, the overall workflow of the artificial intelligence system is described. Please refer to Figure 1a, which shows a structural diagram of the main framework of artificial intelligence. The framework is explained below from the two dimensions of the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to data processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a condensation process of "data - information - knowledge - wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (technology implementations for providing and processing information) to the industrial ecology of the system.
(1)基础设施:(1) Infrastructure:
基础设施为人工智能系统提供计算能力支持,实现与外部世界的沟通,并通过基础平台实现支撑。通过传感器与外部沟通;计算能力由智能芯片(如中央处理器(central processing unit,CPU)、神经网络处理器(neural-network processing unit,NPU)、图形处理器(graphics processing unit,GPU)、专用集成电路(application specific integrated circuit,ASIC)或现场可编程逻辑门阵列(field programmable gate array,FPGA)等硬件加速芯片)提供;基础平台包括分布式计算框架及网络等相关的平台保障和支持,可以包括云存储和计算、互联互通网络等。举例来说,传感器和外部沟通获取数据,这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。The infrastructure provides computing capability support for the artificial intelligence system, implements communication with the outside world, and is supported by the basic platform. Communication with the outside is performed through sensors; the computing capability is provided by smart chips (hardware acceleration chips such as central processing units (CPU), neural-network processing units (NPU), graphics processing units (GPU), application-specific integrated circuits (ASIC), or field programmable gate arrays (FPGA)); the basic platform includes related platform assurance and support such as a distributed computing framework and a network, and may include cloud storage and computing, an interconnection network, and the like. For example, the sensors communicate with the outside to obtain data, and the data is provided to the smart chips in the distributed computing system provided by the basic platform for computation.
(2)数据(2) Data
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、语音、视频、文本,还涉及到传统设备的物联网数据,包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。The data at the layer above the infrastructure indicates the data sources in the field of artificial intelligence. The data involves graphics, images, speech, video, and text, and also involves Internet-of-Things data of conventional devices, including service data of existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
(3)数据处理(3) Data processing
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。Among them, machine learning and deep learning can symbolize and formalize data for intelligent information modeling, extraction, preprocessing, and training.
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formalized information to perform machine thinking and solve problems based on reasoning control strategies. Typical functions are search and matching.
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.
(4)通用能力(4) General capabilities
对数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用系统,例如,翻译,文本的分析,计算机视觉的处理(如图像识别、目标检测等),语音识别等等。After the data has undergone the data processing mentioned above, some general capabilities can be further formed based on the results of the data processing, such as an algorithm or a general system, for example, translation, text analysis, computer vision processing (such as image recognition, target detection, etc.), speech recognition, etc.
(5)智能产品及行业应用(5) Smart products and industry applications
智能产品及行业应用指人工智能系统在各领域的产品和应用,是对人工智能整体解决方案的封装,将智能信息决策产品化、实现落地应用,其应用领域主要包括:智能制造、智能交通、智能家居、智能医疗、智能安防、自动驾驶,智能终端等。Smart products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of the overall artificial intelligence solution, which productizes intelligent information decision-making and realizes practical applications. Its application areas mainly include: smart manufacturing, smart transportation, smart home, smart medical care, smart security, autonomous driving, smart terminals, etc.
本申请可以应用于自动驾驶领域,具体可以实现自动驾驶领域中的他车的行车意图预测以及行车轨迹预测。The present application can be applied to the field of autonomous driving, and specifically can realize the prediction of driving intention and driving trajectory of other vehicles in the field of autonomous driving.
行车意图可以指车辆在未来要进行的行驶策略,具体可以根据车辆的路况信息以及行车状态等信息对车辆的行车意图进行估计。车辆的行车轨迹预测是指预测车辆在未来一定时间内,每个时间点所在的位置。Driving intention refers to the driving strategy that a vehicle will take in the future. Specifically, the driving intention of a vehicle can be estimated based on the vehicle's road condition information and driving status. Vehicle trajectory prediction refers to predicting the location of the vehicle at each time point in the future.
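Under this definition, a predicted trajectory is simply a sequence of positions indexed by future time points, for example (hypothetical numbers, 0.5 s sampling assumed):

```python
# One predicted trajectory: the vehicle's expected (x, y) position at
# each future time point within the prediction horizon. The motion model
# here (constant 8 m/s along x, fixed y) is purely illustrative.
predicted_trajectory = [(0.5 * k, 10.0 + 8.0 * 0.5 * k, 2.0) for k in range(1, 5)]
for t, x, y in predicted_trajectory:
    print(f"t={t:.1f}s -> ({x:.1f}, {y:.1f})")
```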
在自动驾驶领域,通过实时、准确、可靠地对周围车辆的行车意图进行估计,并预测车辆未来的行车轨迹,可以帮助自车预知前方的交通状况,建立自车周围的交通态势,有助于对周围他车目标重要性判断,筛选交互的关键目标,便于自车提前进行路径规划,安全通过复杂场景。应理解,本申请实施例中也可以将上述周围车辆称之为位于自车周围的关联车。In the field of autonomous driving, by estimating the driving intentions of surrounding vehicles in real time, accurately and reliably, and predicting the future driving trajectory of the vehicle, it can help the vehicle to predict the traffic conditions ahead, establish the traffic situation around the vehicle, help judge the importance of the surrounding vehicle targets, screen the key targets for interaction, and facilitate the vehicle to plan the path in advance and pass through complex scenes safely. It should be understood that in the embodiments of the present application, the above-mentioned surrounding vehicles can also be referred to as associated vehicles located around the vehicle.
在现有技术中,行车意图被定义为直行、左转、右转等方向性意图,例如在路口场景下车辆的行车意图可以包括直行、左转、右转等。然而上述行车意图的定义方式在复杂的场景表示能力有限,方向性意图在一些复杂路口或者其他复杂的车道场景中并不能覆盖全部的行车意图。例如参照图1b,图1b为路况的一种结构示意图,其中,车道1和车道2为左转车道,车道3和车道4为直行车道,车道5则为S型车道。In the prior art, driving intention is defined as directional intentions such as going straight, turning left, and turning right. For example, in an intersection scenario, the driving intention of a vehicle may include going straight, turning left, and turning right. However, the above definition of driving intention has limited representation capabilities in complex scenarios, and directional intentions cannot cover all driving intentions in some complex intersections or other complex lane scenarios. For example, referring to Figure 1b, Figure 1b is a structural schematic diagram of a road condition, in which lanes 1 and 2 are left-turn lanes, lanes 3 and 4 are straight lanes, and lane 5 is an S-shaped lane.
本申请实施例提供的一种车辆的位置获取方法,可以应用于自动驾驶的预测系统,预测系统可以基于路况信息、车辆的历史行驶路线等信息进行他车的行车意图以及预测轨迹的预测。A vehicle position acquisition method provided in an embodiment of the present application can be applied to an automatic driving prediction system. The prediction system can predict the driving intention and predicted trajectory of other vehicles based on road condition information, the vehicle's historical driving route and other information.
In the embodiments of the present application, the prediction system may include hardware circuits (such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller), or a combination of such hardware circuits. For example, the prediction system may be a hardware system capable of executing instructions, such as a CPU or a DSP; a hardware system that does not execute instructions, such as an ASIC or an FPGA; or a combination of a hardware system that does not execute instructions and one that does.
Specifically, when the prediction system is a hardware system capable of executing instructions, the vehicle position acquiring method provided in the embodiments of the present application may be software code stored in a memory; the prediction system can retrieve the software code from the memory and execute it to implement the method.
It should be understood that the prediction system may be a combination of a hardware system that does not execute instructions and a hardware system that does; some steps of the vehicle position acquiring method provided in the embodiments of the present application may also be implemented by the hardware system in the prediction system that does not execute instructions, which is not limited here.
In the embodiments of the present application, the prediction system may be deployed on a vehicle or on a cloud-side server. Taking deployment on a vehicle as an example, the following describes, with reference to the software and hardware modules on the vehicle, the process by which the prediction system predicts the driving intentions and trajectories of other vehicles.
The vehicles in the embodiments of the present application, such as the target vehicle and the associated vehicles around the target vehicle, may be internal-combustion vehicles that use an engine as the power source, hybrid vehicles that use an engine and an electric motor as power sources, electric vehicles that use an electric motor as the power source, and the like.
In the embodiments of the present application, the vehicle may include an automatic driving device 100 with an automatic driving function.
Referring to FIG. 1c, FIG. 1c is a functional block diagram of the automatic driving device 100 with an automatic driving function provided in an embodiment of the present application. In one embodiment, the automatic driving device 100 may include various subsystems, such as a travel system 102, a sensor system 104, a control system 106, one or more peripheral devices 108, a power supply 110, a computer system 112, and a user interface 116. Optionally, the automatic driving device 100 may include more or fewer subsystems, and each subsystem may include multiple elements. In addition, the subsystems and elements of the automatic driving device 100 may be interconnected by wire or wirelessly.
The travel system 102 may include components that provide powered motion for the automatic driving device 100. In one embodiment, the travel system 102 may include an engine 118, an energy source 119, a transmission 120, and wheels/tires 121. The engine 118 may be an internal combustion engine, an electric motor, an air compression engine, or a combination of engine types, such as a hybrid engine consisting of a gasoline engine and an electric motor, or a hybrid engine consisting of an internal combustion engine and an air compression engine. The engine 118 converts the energy source 119 into mechanical energy.
Examples of the energy source 119 include gasoline, diesel, other petroleum-based fuels, propane, other compressed-gas-based fuels, ethanol, solar panels, batteries, and other sources of electric power. The energy source 119 may also provide energy for other systems of the automatic driving device 100.
The transmission 120 can transmit mechanical power from the engine 118 to the wheels 121. The transmission 120 may include a gearbox, a differential, and a drive shaft. In one embodiment, the transmission 120 may also include other components, such as a clutch. The drive shaft may include one or more axles that can be coupled to one or more of the wheels 121.
The sensor system 104 may include several sensors that sense information about the environment around the automatic driving device 100. For example, the sensor system 104 may include a positioning system 122 (which may be a global positioning system (GPS), a BeiDou system, or another positioning system), an inertial measurement unit (IMU) 124, a radar 126, a laser rangefinder 128, and a camera 130. The sensor system 104 may also include sensors that monitor the internal systems of the automatic driving device 100 (for example, an in-vehicle air quality monitor, a fuel gauge, an oil temperature gauge, and so on). Sensor data from one or more of these sensors can be used to detect objects and their corresponding characteristics (position, shape, direction, speed, and so on). Such detection and recognition are key functions for the safe operation of the autonomous automatic driving device 100.
The positioning system 122 can be used to estimate the geographic location of the automatic driving device 100. The IMU 124 is used to sense changes in the position and orientation of the automatic driving device 100 based on inertial acceleration. In one embodiment, the IMU 124 may be a combination of an accelerometer and a gyroscope.
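As a rough illustration of how accelerometer and gyroscope readings of the kind produced by the IMU 124 can be integrated to sense position and orientation changes, the sketch below dead-reckons a planar pose from longitudinal acceleration and yaw-rate samples. The function name, state layout, and sampling scheme are hypothetical and not the actual implementation of the device 100; it is a minimal Euler-integration sketch, ignoring bias and noise handling that a real IMU pipeline requires.

```python
import math

def dead_reckon(x, y, heading, v, samples, dt):
    """Propagate a simple 2D pose estimate from IMU samples.

    Each sample is (a, w): longitudinal acceleration in m/s^2 and
    yaw rate in rad/s, taken at a fixed interval dt. Illustrative only.
    """
    for a, w in samples:
        v += a * dt                       # integrate acceleration -> speed
        heading += w * dt                 # integrate yaw rate -> heading
        x += v * math.cos(heading) * dt   # integrate speed -> position
        y += v * math.sin(heading) * dt
    return x, y, heading, v

# Example: constant 1 m/s^2 acceleration, no turning, for 1 s at 100 Hz
pose = dead_reckon(0.0, 0.0, 0.0, 0.0, [(1.0, 0.0)] * 100, 0.01)
```

Because pure integration accumulates error, a production system would fuse such estimates with the positioning system 122 rather than rely on them alone.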
The radar 126 may use radio signals to sense objects in the environment around the automatic driving device 100. In some embodiments, in addition to sensing objects, the radar 126 may also be used to sense the speed and/or heading of the objects.
The radar 126 may include an electromagnetic wave transmitting unit and a receiving unit. In terms of the radio-wave emission principle, the radar 126 may be implemented as a pulse radar or a continuous wave radar. When implemented as a continuous wave radar, the radar 126 may adopt a frequency modulated continuous wave (FMCW) scheme or a frequency shift keying (FSK) scheme, depending on the signal waveform.
Using electromagnetic waves as the medium, the radar 126 can detect an object based on a time-of-flight (TOF) method or a phase-shift method, and can detect the position of the detected object, the distance to it, and its relative speed. To detect objects located in front of, behind, or to the side of the vehicle, the radar 126 may be arranged at an appropriate position on the exterior of the vehicle. Using laser light as the medium, the lidar 126 can likewise detect an object based on the TOF method or the phase-shift method, and can detect the position of the detected object, the distance to it, and its relative speed.
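The arithmetic behind TOF ranging can be sketched as follows: the emitted wave travels to the object and back at the speed of light, so the range is half the round-trip time multiplied by the propagation speed, and a coarse relative speed can be taken from two successive ranges. The helper names are hypothetical; this is only the basic geometry, not the signal processing of an actual radar 126.

```python
C = 299_792_458.0  # propagation speed of electromagnetic waves in vacuum, m/s

def tof_distance(round_trip_s):
    """Range to the object from the round-trip travel time of the wave."""
    return C * round_trip_s / 2.0

def relative_speed(d_prev, d_curr, dt):
    """Radial relative speed from two successive range measurements
    taken dt seconds apart (negative means the object is closing)."""
    return (d_curr - d_prev) / dt

# An echo returning after 400 ns corresponds to a range of roughly 60 m
d = tof_distance(400e-9)
```

A real FMCW radar instead derives range and speed from beat frequencies and Doppler shift, but the time-distance relation above is the quantity those techniques ultimately recover.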
Optionally, to detect objects located in front of, behind, or to the side of the vehicle, the lidar 126 may be arranged at an appropriate position on the exterior of the vehicle.
The laser rangefinder 128 may use laser light to sense objects in the environment in which the automatic driving device 100 is located. In some embodiments, the laser rangefinder 128 may include one or more laser sources, a laser scanner, and one or more detectors, among other system components.
The camera 130 may be used to capture multiple images of the environment around the automatic driving device 100. The camera 130 may be a still camera or a video camera.
Optionally, to capture images of the exterior of the vehicle, the camera 130 may be located at an appropriate position on the exterior of the vehicle. For example, to capture images in front of the vehicle, the camera 130 may be arranged inside the vehicle close to the front windshield, or around the front bumper or radiator grille. To capture images behind the vehicle, the camera 130 may be arranged inside the vehicle close to the rear window glass, or around the rear bumper, trunk, or tailgate. To capture images to the side of the vehicle, the camera 130 may be arranged inside the vehicle close to at least one of the side windows, or around the side mirrors, fenders, or doors.
In the embodiments of the present application, the road condition information of the target vehicle, its historical driving route, the historical driving routes of the associated vehicles located around the target vehicle, and so on can be acquired based on one or more sensors in the sensor system 104.
The control system 106 controls the operation of the automatic driving device 100 and its components. The control system 106 may include various elements, including a steering system 132, a throttle 134, a brake unit 136, a sensor fusion algorithm 138, a computer vision system 140, a route control system 142, and an obstacle avoidance system 144.
The steering system 132 is operable to adjust the heading of the automatic driving device 100. For example, in one embodiment it may be a steering wheel system.
The throttle 134 is used to control the operating speed of the engine 118 and thereby the speed of the automatic driving device 100.
The brake unit 136 is used to control the automatic driving device 100 to decelerate. The brake unit 136 may use friction to slow the wheels 121. In other embodiments, the brake unit 136 may convert the kinetic energy of the wheels 121 into electric current. The brake unit 136 may also take other forms to slow the rotation of the wheels 121 and thereby control the speed of the automatic driving device 100.
The computer vision system 140 is operable to process and analyze the images captured by the camera 130 in order to recognize objects and/or features in the environment around the automatic driving device 100. The objects and/or features may include traffic signals, road boundaries, and obstacles. The computer vision system 140 may use object recognition algorithms, structure from motion (SFM) algorithms, video tracking, and other computer vision techniques. In some embodiments, the computer vision system 140 can be used to map the environment, track objects, estimate the speed of objects, and so on.
The route control system 142 is used to determine the driving route of the automatic driving device 100. In some embodiments, the route control system 142 may combine data from the sensor fusion algorithm 138, the positioning system 122, and one or more predetermined maps to determine the driving route for the automatic driving device 100.
The obstacle avoidance system 144 is used to identify, evaluate, and avoid or otherwise negotiate potential obstacles in the environment of the automatic driving device 100.
Of course, in one example, the control system 106 may additionally or alternatively include components other than those shown and described, or some of the components shown above may be omitted.
The automatic driving device 100 interacts with external sensors, other automatic driving devices, other computer systems, or users through the peripheral devices 108. The peripheral devices 108 may include a wireless communication system 146, an on-board computer 148, a microphone 150, and/or a speaker 152.
In some embodiments, the peripheral devices 108 provide a means for a user of the automatic driving device 100 to interact with the user interface 116. For example, the on-board computer 148 can provide information to the user of the automatic driving device 100, and the user interface 116 can also operate the on-board computer 148 to receive input from the user. The on-board computer 148 can be operated through a touchscreen. In other cases, the peripheral devices 108 may provide a means for the automatic driving device 100 to communicate with other devices located in the vehicle. For example, the microphone 150 can receive audio (for example, voice commands or other audio input) from the user of the automatic driving device 100. Similarly, the speaker 152 can output audio to the user of the automatic driving device 100.
The wireless communication system 146 can communicate wirelessly with one or more devices, either directly or via a communication network. For example, the wireless communication system 146 may use 3G cellular communication, such as code division multiple access (CDMA), EVDO, or global system for mobile communications (GSM)/general packet radio service (GPRS); 4G cellular communication, such as long term evolution (LTE); or 5G cellular communication. The wireless communication system 146 may use WiFi to communicate with a wireless local area network (WLAN). In some embodiments, the wireless communication system 146 may communicate directly with a device using an infrared link, Bluetooth, or ZigBee. Other wireless protocols, such as various vehicle communication systems, may also be used; for example, the wireless communication system 146 may include one or more dedicated short range communications (DSRC) devices, which may include public and/or private data communications between automatic driving devices and/or roadside stations.
In one implementation, information such as the road condition information and historical driving trajectories in the embodiments of the present application may be received by the vehicle from other vehicles or from a cloud-side server through the wireless communication system 146.
When the prediction system is located on a cloud-side server, the vehicle can receive, through the wireless communication system 146, the driving intention information for the target vehicle and other information transmitted by the server.
The power supply 110 can provide power to the various components of the automatic driving device 100. In one embodiment, the power supply 110 may be a rechargeable lithium-ion or lead-acid battery. One or more battery packs of such batteries may be configured as a power supply to provide power to the various components of the automatic driving device 100. In some embodiments, the power supply 110 and the energy source 119 may be implemented together, as in some all-electric vehicles.
Some or all of the functions of the automatic driving device 100 are controlled by the computer system 112. The computer system 112 may include at least one processor 113 that executes instructions 115 stored in a non-transitory computer-readable medium such as a memory 114. The computer system 112 may also be multiple computing devices that control individual components or subsystems of the automatic driving device 100 in a distributed manner.
The processor 113 may be any conventional processor, such as a commercially available central processing unit (CPU). Alternatively, the processor may be a dedicated device such as an application-specific integrated circuit (ASIC) or another hardware-based processor. Although FIG. 1c functionally illustrates the processor, the memory, and the other elements of the computer 110 in the same block, those of ordinary skill in the art should understand that the processor, computer, or memory may actually include multiple processors, computers, or memories that may or may not be housed in the same physical enclosure. For example, the memory may be a hard disk drive or another storage medium located in an enclosure different from that of the computer 110. Therefore, a reference to a processor or computer is to be understood as including a reference to a collection of processors, computers, or memories that may or may not operate in parallel. Rather than a single processor performing all of the steps described here, some components, such as the steering component and the deceleration component, may each have their own processor that performs only the computations related to that component's specific function.
In various aspects described herein, the processor may be located remotely from the automatic driving device and communicate with it wirelessly. In other aspects, some of the processes described herein are executed on a processor arranged within the automatic driving device while others are executed by a remote processor, including taking the steps necessary to perform a single maneuver.
In some embodiments, the memory 114 may contain instructions 115 (for example, program logic) that can be executed by the processor 113 to perform various functions of the automatic driving device 100, including those described above. The memory 114 may also contain additional instructions, including instructions to send data to, receive data from, interact with, and/or control one or more of the travel system 102, the sensor system 104, the control system 106, and the peripheral devices 108.
In addition to the instructions 115, the memory 114 may also store data, such as road maps, route information, the position, direction, and speed of the automatic driving device, other such automatic driving device data, and other information. Such information may be used by the automatic driving device 100 and the computer system 112 while the automatic driving device 100 is operating in autonomous, semi-autonomous, and/or manual modes.
The vehicle position acquiring method provided in the embodiments of the present application may be software code stored in the memory 114; the processor 113 can retrieve the software code from the memory and execute it to implement the method. After the driving intention of the target vehicle is obtained, the driving intention can be transmitted to the control system 106, and the control system 106 can determine the driving strategy of the ego vehicle based on the driving intention.
The user interface 116 is used to provide information to, or receive information from, a user of the automatic driving device 100. Optionally, the user interface 116 may include one or more input/output devices within the set of peripheral devices 108, such as the wireless communication system 146, the on-board computer 148, the microphone 150, and the speaker 152.
The computer system 112 may control the functions of the automatic driving device 100 based on input received from the various subsystems (for example, the travel system 102, the sensor system 104, and the control system 106) and from the user interface 116. For example, the computer system 112 may use input from the control system 106 to control the steering system 132 to avoid obstacles detected by the sensor system 104 and the obstacle avoidance system 144. In some embodiments, the computer system 112 is operable to provide control over many aspects of the automatic driving device 100 and its subsystems.
Optionally, one or more of the above components may be installed separately from, or associated with, the automatic driving device 100. For example, the memory 114 may exist partially or completely separate from the automatic driving device 100. The above components may be communicatively coupled together in a wired and/or wireless manner.
Optionally, the above components are only an example; in practical applications, components in the above modules may be added or removed according to actual needs, and FIG. 1c should not be understood as limiting the embodiments of the present application.
Referring to FIG. 2a, an embodiment of the present application provides a system architecture 200a. The system architecture includes a database 230a and a client device 240a. A data acquisition device 260a is used to collect data and store it in the database 230a, and a training module 202a generates a target model/rule 201a based on the data maintained in the database 230a. How the training module 202a obtains the target model/rule 201a based on the data is described in more detail below; the target model/rule 201a is the first model mentioned in the following embodiments of the present application.
A computing module 211a may include the training module 202a, and the target model/rule obtained by the training module 202a may be applied in different systems or devices. In FIG. 2a, an execution device 210a is configured with a transceiver 212a, which may be a wireless transceiver, an optical transceiver, a wired interface (such as an I/O interface), or the like, for exchanging data with external devices. A "user" may input data to the transceiver 212a through the client device 240a; for example, the client device 240a may send a target task to the execution device 210a, requesting the execution device to train a neural network, and send the database used for training to the execution device 210a.
The execution device 210a can call data, code, and the like in a data storage system 250a, and can also store data, instructions, and the like in the data storage system 250a.
The computing module 211a uses the target model/rule 201a to process the input data. Specifically, the computing module 211a is used to: first, obtain first information and second information, where the first information includes information about vehicles around the ego vehicle and the second information includes information about lanes around the ego vehicle; and then input the first information and the second information into the first model to obtain prediction information generated by the first model, where the prediction information includes predicted position information of the vehicles around the ego vehicle within a first time.
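The two-step flow just described can be sketched as follows. The `FirstInformation` and `SecondInformation` containers and the `first_model` function are hypothetical names introduced only for illustration; the real first model is a trained network, so a simple constant-velocity extrapolation stands in for it here, and a real model would also condition on the lane information.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class FirstInformation:
    """Information about one vehicle around the ego vehicle."""
    track: List[Tuple[float, float]]  # observed (x, y) positions, oldest first

@dataclass
class SecondInformation:
    """Information about one lane around the ego vehicle."""
    centerline: List[Tuple[float, float]]

def first_model(veh: FirstInformation, lane: SecondInformation,
                horizon: int) -> List[Tuple[float, float]]:
    """Stand-in for the trained first model: extrapolate the last observed
    displacement over `horizon` future steps to produce predicted positions."""
    (x0, y0), (x1, y1) = veh.track[-2], veh.track[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]

# Step 1: obtain the first information and second information
veh = FirstInformation(track=[(0.0, 0.0), (1.0, 0.5)])
lane = SecondInformation(centerline=[(0.0, 0.0), (10.0, 5.0)])
# Step 2: feed both into the model to get predicted positions over the first time
pred = first_model(veh, lane, horizon=3)
```

The returned list plays the role of the prediction information: one predicted position per time point within the first time.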
Finally, the transceiver 212a returns the output of the neural network to the client device 240a. For example, the user can input, through the client device 240a, a piece of text to be converted into sign-language actions; the neural network outputs the sign-language actions, or parameters representing them, which are fed back to the client device 240a.
Going deeper, the training module 202a can obtain corresponding target models/rules 201a for different tasks based on different data, so as to provide users with better results.
In the situation shown in FIG. 2a, the data input into the execution device 210a can be determined based on the user's input data; for example, the user can operate in an interface provided by the transceiver 212a. In another case, the client device 240a can automatically input data to the transceiver 212a and obtain the result; if automatic data input by the client device 240a requires the user's authorization, the user can set the corresponding permission in the client device 240a. The user can view the result output by the execution device 210a on the client device 240a, and the specific presentation form may be display, sound, action, or another specific manner. The client device 240a can also serve as a data collection terminal and store the collected data associated with the target task in the database 230a.
The training or updating processes mentioned in the present application may be performed by the training module 202a. It can be understood that the training process of a neural network is the process of learning how to control the spatial transformation, and more specifically, of learning the weight matrices. The purpose of training a neural network is to make its output as close as possible to the expected value; therefore, the predicted value of the current network can be compared with the expected value, and the weight vector of each layer of the neural network can be updated according to the difference between the two (of course, the weight vectors are usually initialized before the first update, that is, parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the values of the weights in the weight matrix are adjusted to lower the predicted value, and the adjustment continues until the value output by the neural network is close or equal to the expected value. Specifically, the difference between the predicted value and the expected value of the neural network can be measured by a loss function or an objective function. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so the training of the neural network can be understood as the process of reducing the loss as much as possible.
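The predict-compare-adjust loop described above can be sketched with a deliberately tiny example: a one-weight model fitted by gradient descent on a mean-squared-error loss. The function, data, and hyperparameters are illustrative only and are not the training procedure of the first model itself.

```python
def train(xs, ys, w=0.0, lr=0.01, steps=200):
    """Fit y = w * x by gradient descent on the mean squared error,
    mirroring the loop described above: predict, measure the loss,
    then adjust the weight in the direction that reduces the loss."""
    n = len(xs)
    for _ in range(steps):
        # gradient of mean((w*x - y)^2) with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad  # move the weight against the gradient
    return w

# Data generated from y = 2x, so training should drive w toward 2
w = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Each iteration shrinks the loss; in a deep network the same idea is applied layer by layer via backpropagation, with one weight matrix per layer instead of a single scalar.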
如图2a所示,根据训练模块202a训练得到目标模型/规则201a,该目标模型/规则201a在本申请实施例中可以是本申请中的第一模型。As shown in FIG. 2 a , a target model/rule 201 a is obtained through training according to the training module 202 a . In the embodiment of the present application, the target model/rule 201 a may be the first model in the present application.
其中，在训练阶段，数据库230a可以用于存储用于训练的样本集。执行设备210a生成用于处理样本的目标模型/规则201a，并利用数据库中的样本集合对目标模型/规则201a进行迭代训练，得到成熟的目标模型/规则201a，该目标模型/规则201a具体表现为神经网络。执行设备210a得到的神经网络可以应用于不同的系统或设备中。In the training phase, the database 230a can be used to store sample sets for training. The execution device 210a generates a target model/rule 201a for processing samples, and iteratively trains the target model/rule 201a using the sample sets in the database to obtain a mature target model/rule 201a, which specifically takes the form of a neural network. The neural network obtained by the execution device 210a can be applied in different systems or devices.
在推理阶段，执行设备210a可以调用数据存储系统250a中的数据、代码等，也可以将数据、指令等存入数据存储系统250a中。数据存储系统250a可以置于执行设备210a中，也可以是相对执行设备210a的外部存储器。计算模块211a可以通过神经网络对执行设备210a获取到的样本进行处理，得到预测结果，预测结果的具体表现形式与神经网络的功能相关。In the inference phase, the execution device 210a can call the data, code, etc. in the data storage system 250a, or store data, instructions, etc. in the data storage system 250a. The data storage system 250a can be placed inside the execution device 210a, or it can be a memory external to the execution device 210a. The calculation module 211a can process the samples obtained by the execution device 210a through the neural network to obtain a prediction result; the specific form of the prediction result depends on the function of the neural network.
需要说明的是，附图2a仅是本申请实施例提供的一种系统架构的示例性的示意图，图中所示设备、器件、模块等之间的位置关系不构成任何限制。例如，在附图2a中，数据存储系统250a相对执行设备210a是外部存储器，在其它场景中，也可以将数据存储系统250a置于执行设备210a中。It should be noted that FIG. 2a is only an exemplary schematic diagram of a system architecture provided in an embodiment of the present application, and the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation. For example, in FIG. 2a, the data storage system 250a is an external memory relative to the execution device 210a; in other scenarios, the data storage system 250a may also be placed inside the execution device 210a.
根据训练模块202a训练得到的目标模型/规则201a可以应用于不同的系统或设备中,如应用于手机,平板电脑,笔记本电脑,增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR),车载终端等,还可以是服务器或者云端设备等。The target model/rule 201a trained by the training module 202a can be applied to different systems or devices, such as mobile phones, tablet computers, laptops, augmented reality (AR)/virtual reality (VR), vehicle terminals, etc., and can also be servers or cloud devices.
具体地,一种可能的实现方式中,请参阅图2b,图2b为本申请实施例提供的车辆的位置获取方法的一种流程示意图。该方法可以由如图2a中所示的执行设备210a执行。该方法具体包括:201b,获取第一信息和第二信息,第一信息包括自车周围的车辆的信息,第二信息包括自车周围的车道的信息;202b,将第一信息和第二信息输入第一模型中,得到第一模型生成的预测信息,预测信息包括自车周围的车辆在第一时间内的预测位置信息。Specifically, in a possible implementation, please refer to FIG. 2b, which is a flow chart of a method for obtaining the position of a vehicle provided in an embodiment of the present application. The method can be executed by an execution device 210a as shown in FIG. 2a. The method specifically includes: 201b, obtaining first information and second information, the first information including information about vehicles around the vehicle, and the second information including information about lanes around the vehicle; 202b, inputting the first information and the second information into a first model to obtain prediction information generated by the first model, the prediction information including predicted position information of vehicles around the vehicle within a first time.
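The two steps 201b/202b above (gather information about surrounding vehicles and lanes, then feed both into a first model to obtain predicted positions over a first time period) can be sketched as follows. This is a hypothetical illustration only: the data layouts, function names, and the constant-velocity stand-in for the trained "first model" are assumptions, not the neural network described in the application.

```python
# Hypothetical sketch of steps 201b/202b. The constant-velocity extrapolation
# below is an illustrative placeholder for the trained first model.

def get_first_info():
    # First information: per-vehicle state (position and velocity) around the ego vehicle.
    return [{"id": 1, "x": 0.0, "y": 0.0, "vx": 10.0, "vy": 0.0}]

def get_second_info():
    # Second information: lane descriptions around the ego vehicle.
    return [{"lane_id": "L0", "centerline": [(0.0, 0.0), (100.0, 0.0)]}]

def first_model(first_info, second_info, horizon_s):
    # Placeholder "first model": predicts each surrounding vehicle's position
    # after horizon_s seconds by constant-velocity extrapolation.
    predictions = []
    for v in first_info:
        predictions.append({
            "id": v["id"],
            "x": v["x"] + v["vx"] * horizon_s,
            "y": v["y"] + v["vy"] * horizon_s,
        })
    return predictions

pred = first_model(get_first_info(), get_second_info(), horizon_s=3.0)
print(pred)
```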
以上介绍了本申请实施例的应用架构，接下来针对本申请实施例提供的车辆的位置获取方法进行详细描述。The above has introduced the application architecture of the embodiments of the present application. Next, the vehicle position acquisition method provided in the embodiments of the present application is described in detail.
首先,为了更好地理解本申请实施例的方案,下面先对本申请实施例可能涉及的相关术语和概念进行介绍。First of all, in order to better understand the solutions of the embodiments of the present application, the relevant terms and concepts that may be involved in the embodiments of the present application are introduced below.
(1)神经网络(1) Neural Network
神经网络可以是由神经单元组成的，具体可以理解为具有输入层、隐含层、输出层的神经网络，一般来说第一层是输入层，最后一层是输出层，中间的层数都是隐含层。其中，具有很多层隐含层的神经网络则称为深度神经网络（deep neural network，DNN）。神经网络中的每一层的工作可以用数学表达式y=a(W·x+b)来描述，从物理层面，神经网络中的每一层的工作可以理解为通过五种对输入空间（输入向量的集合）的操作，完成输入空间到输出空间的变换（即矩阵的行空间到列空间），这五种操作包括：1、升维/降维；2、放大/缩小；3、旋转；4、平移；5、“弯曲”。其中1、2、3的操作由“W·x”完成，4的操作由“+b”完成，5的操作则由“a()”来实现。这里之所以用“空间”二字来表述是因为被分类的对象并不是单个事物，而是一类事物，空间是指这类事物所有个体的集合。其中，W是神经网络各层的权重矩阵，该矩阵中的每一个值表示该层的一个神经元的权重值。该矩阵W决定着上文所述的输入空间到输出空间的空间变换，即神经网络每一层的W控制着如何变换空间。训练神经网络的目的，也就是最终得到训练好的神经网络的所有层的权重矩阵。因此，神经网络的训练过程本质上就是学习控制空间变换的方式，更具体地就是学习权重矩阵。A neural network can be composed of neural units. Specifically, it can be understood as a neural network with an input layer, hidden layers, and an output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers. A neural network with many hidden layers is called a deep neural network (DNN). The work of each layer in a neural network can be described by the mathematical expression y = a(W·x + b). From a physical perspective, the work of each layer can be understood as completing the transformation from the input space (the set of input vectors) to the output space (i.e., from the row space to the column space of a matrix) through five operations: 1. dimension raising/reduction; 2. scaling up/down; 3. rotation; 4. translation; 5. "bending". Operations 1, 2, and 3 are performed by "W·x", operation 4 by "+b", and operation 5 by "a()". The word "space" is used here because the object being classified is not a single thing but a class of things, and the space refers to the collection of all individuals of that class. Here, W is the weight matrix of each layer of the neural network, and each value in the matrix represents the weight of one neuron in that layer. The matrix W determines the spatial transformation from the input space to the output space described above; that is, the W of each layer of the neural network controls how the space is transformed. The purpose of training a neural network is ultimately to obtain the weight matrices of all layers of the trained network. Therefore, the training process of a neural network is essentially learning how to control spatial transformations, more specifically, learning the weight matrices.
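The per-layer expression y = a(W·x + b) above can be sketched directly in pure Python. This is a minimal illustration: W scales/rotates the input space, "+b" translates it, and the nonlinearity a() "bends" it, matching the five operations described above. The weights and activation here are arbitrary example values.

```python
# One neural-network layer computing a(W·x + b) in pure Python.

def dense_layer(W, b, x, a):
    # y_j = a(sum_k W[j][k] * x[k] + b[j])
    return [a(sum(w_jk * x_k for w_jk, x_k in zip(row, x)) + b_j)
            for row, b_j in zip(W, b)]

relu = lambda v: max(0.0, v)  # example activation a()

W = [[1.0, -1.0],
     [0.5,  0.5]]
b = [0.0, 1.0]
x = [2.0, 1.0]

y = dense_layer(W, b, x, relu)
print(y)  # [1.0, 2.5]
```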
(2)卷积神经网络(2) Convolutional Neural Network
卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器。该特征抽取器可以看作是滤波器,卷积过程可以看作是使同一个可训练的滤波器与一个输入的图像或者卷积特征平面(feature map)做卷积。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取图像信息的方式与位置无关。这其中隐含的原理是:图像的某一部分的统计信息与其他部分是一样的。即意味着在某一部分学习的图像信息也能用在另一部分上。所以对于图像上的所有位置,都能使用同样的学习得到的图像信息。在同一卷积层中,可以使用多个卷积核来提取不同的图像信息,一般地,卷积核数量越多,卷积操作反映的图像信息越丰富。Convolutional neural network (CNN) is a deep neural network with convolutional structure. Convolutional neural network contains a feature extractor composed of convolution layer and subsampling layer. The feature extractor can be regarded as a filter, and the convolution process can be regarded as convolving the same trainable filter with an input image or convolution feature plane (feature map). Convolution layer refers to the neuron layer in convolutional neural network that performs convolution processing on the input signal. In the convolution layer of convolutional neural network, a neuron can only be connected to some neurons in the adjacent layer. A convolution layer usually contains several feature planes, each of which can be composed of some rectangular arranged neural units. The neural units in the same feature plane share weights, and the shared weights here are convolution kernels. Shared weights can be understood as the way to extract image information is independent of position. The implicit principle is that the statistical information of a part of the image is the same as that of other parts. This means that the image information learned in a part can also be used in another part. So for all positions on the image, the same learned image information can be used. In the same convolution layer, multiple convolution kernels can be used to extract different image information. Generally speaking, the more convolution kernels there are, the richer the image information reflected by the convolution operation.
卷积核可以以随机大小的矩阵的形式初始化,在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。The convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network. In addition, the direct benefit of shared weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
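The weight sharing described above can be illustrated with a toy 2D convolution: the same small kernel (the shared weights) is slid over every position of the input, so the extracted pattern is position-independent. The image and kernel values are arbitrary examples.

```python
# A toy 2D convolution in pure Python illustrating weight sharing: one 2x2
# kernel is applied at every valid position of the input "image".

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(kernel[di][dj] * image[i + di][j + dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]  # sums each pixel with its lower-right neighbour

feature_map = conv2d(image, kernel)
print(feature_map)  # [[6, 8], [12, 14]]
```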
(3)深度神经网络(3) Deep Neural Networks
深度神经网络（Deep Neural Network，DNN），也称多层神经网络，可以理解为具有很多层隐含层的神经网络，这里的“很多”并没有特别的度量标准。按不同层的位置划分，DNN内部的神经网络可以分为三类：输入层，隐含层，输出层。一般来说第一层是输入层，最后一层是输出层，中间的层数都是隐含层。层与层之间是全连接的，也就是说，第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。虽然DNN看起来很复杂，但是就每一层的工作来说，其实并不复杂，简单来说就是如下线性关系表达式：y=α(W·x+b)。其中，x是输入向量，y是输出向量，b是偏移向量，W是权重矩阵（也称系数），α()是激活函数。每一层仅仅是对输入向量x经过如此简单的操作得到输出向量y。由于DNN层数多，则系数W和偏移向量b的数量也就很多了。这些参数在DNN中的定义如下：以系数W为例，假设在一个三层的DNN中，第二层的第4个神经元到第三层的第2个神经元的线性系数定义为W^3_24，上标3代表系数W所在的层数，而下标对应的是输出的第三层索引2和输入的第二层索引4。A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with many hidden layers; there is no particular metric for "many" here. Divided by the position of the layers, the layers inside a DNN fall into three categories: input layer, hidden layers, and output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected; that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer. Although a DNN looks complicated, the work of each layer is not: it is simply the linear relationship y = α(W·x + b), where x is the input vector, y is the output vector, b is the bias vector, W is the weight matrix (also called the coefficients), and α() is the activation function. Each layer simply applies this operation to the input vector x to obtain the output vector y. Since a DNN has many layers, there are correspondingly many coefficient matrices W and bias vectors b. These parameters are defined in the DNN as follows, taking the coefficient W as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_24, where the superscript 3 indicates the layer of the coefficient W, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer.
总结就是：第L-1层的第k个神经元到第L层的第j个神经元的系数定义为W^L_jk。In summary, the coefficient from the kth neuron of layer L-1 to the jth neuron of layer L is defined as W^L_jk.
需要注意的是，输入层是没有W参数的。在深度神经网络中，更多的隐含层让网络更能够刻画现实世界中的复杂情形。理论上而言，参数越多的模型复杂度越高，“容量”也就越大，也就意味着它能完成更复杂的学习任务。训练深度神经网络的过程也就是学习权重矩阵的过程，其最终目的是得到训练好的深度神经网络的所有层的权重矩阵（由很多层的向量W形成的权重矩阵）。It should be noted that the input layer has no W parameters. In a deep neural network, more hidden layers allow the network to better characterize complex real-world situations. Theoretically, a model with more parameters has higher complexity and greater "capacity", which means it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, with the ultimate goal of obtaining the weight matrices of all layers of the trained deep neural network (weight matrices formed from the vectors W of many layers).
(4)损失函数(loss function)(4) Loss function
在训练神经网络的过程中,因为希望神经网络的输出尽可能的接近真正想要预测的值,可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重矩阵(当然,在第一次更新之前通常会有初始化的过程,即为神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重矩阵让它预测低一些,不断的调整,直到神经网络能够预测出真正想要的目标值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数或目标函数,它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么神经网络的训练就变成了尽可能缩小这个loss的过程。例如,在分类任务中,损失函数用于表征预测类别与真实类别之间的差距,交叉熵损失函数(cross entropy loss)则是分类任务中常用的损失函数。In the process of training a neural network, because we want the output of the neural network to be as close as possible to the value we really want to predict, we can compare the predicted value of the current network with the target value we really want, and then update the weight matrix of each layer of the neural network according to the difference between the two (of course, there is usually an initialization process before the first update, that is, pre-configuring parameters for each layer in the neural network). For example, if the predicted value of the network is high, adjust the weight matrix to make it predict lower, and keep adjusting until the neural network can predict the target value we really want. Therefore, it is necessary to pre-define "how to compare the difference between the predicted value and the target value", which is the loss function or objective function, which is an important equation for measuring the difference between the predicted value and the target value. Among them, taking the loss function as an example, the higher the output value (loss) of the loss function, the greater the difference, so the training of the neural network becomes a process of minimizing this loss as much as possible. For example, in classification tasks, the loss function is used to characterize the gap between the predicted category and the true category, and the cross entropy loss function (cross entropy loss) is a commonly used loss function in classification tasks.
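The cross-entropy loss mentioned above can be sketched in a few lines: the further the predicted class distribution is from the true class, the larger the loss. The probability values below are arbitrary examples.

```python
# Cross-entropy loss for classification: -log of the probability the model
# assigns to the true class.

import math

def cross_entropy(pred_probs, true_index):
    return -math.log(pred_probs[true_index])

good = cross_entropy([0.1, 0.8, 0.1], true_index=1)  # confident and correct
bad  = cross_entropy([0.6, 0.2, 0.2], true_index=1)  # mostly wrong

print(round(good, 4), round(bad, 4))
```

A larger gap between prediction and target yields a larger loss, which is exactly the quantity the training process tries to shrink.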
在神经网络的训练过程中,可以采用误差反向传播(back propagation,BP)算法修正初始的神经网络模型中参数的大小,使得神经网络模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的神经网络模型中的参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的神经网络模型的参数,例如权重矩阵。In the training process of the neural network, the error back propagation (BP) algorithm can be used to correct the size of the parameters in the initial neural network model, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, the forward transmission of the input signal to the output will generate error loss, and the parameters in the initial neural network model are updated by back propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by error loss, which aims to obtain the optimal parameters of the neural network model, such as the weight matrix.
(5)Transformer结构(5) Transformer structure
Transformer结构是一种包含编码器与解码器的特征提取网络（类似于卷积神经网络）。The Transformer structure is a feature extraction network that includes an encoder and a decoder (similar to a convolutional neural network).
编码器:通过自注意力的方式在全局感受野下进行特征学习,例如像素点的特征。Encoder: Performs feature learning in the global receptive field through self-attention, such as pixel features.
解码器:通过自注意力与交叉注意力来学习所需模块的特征,例如输出框的特征。Decoder: Learn the features of the required modules, such as the features of the output box, through self-attention and cross-attention.
(6)注意力机制(attention mechanism)(6) Attention mechanism
注意力机制模仿了生物观察行为的内部过程，即一种将内部经验和外部感觉对齐从而增加部分区域的观察精细度的机制，能够利用有限的注意力资源从大量信息中快速筛选出高价值信息。注意力机制可以快速提取稀疏数据的重要特征，因而被广泛用于自然语言处理任务，特别是机器翻译。而自注意力机制（self-attention mechanism）是注意力机制的改进，其减少了对外部信息的依赖，更擅长捕捉数据或特征的内部相关性。注意力机制的本质思想可以改写为如下公式：Attention(Query, Source)=∑_i Similarity(Query, Key_i)·Value_i。The attention mechanism imitates the internal process of biological observation behavior, that is, a mechanism that aligns internal experience with external sensation to increase the observation precision of certain regions, and can use limited attention resources to quickly filter out high-value information from a large amount of information. The attention mechanism can quickly extract important features from sparse data, and is therefore widely used in natural language processing tasks, especially machine translation. The self-attention mechanism is an improvement on the attention mechanism; it reduces dependence on external information and is better at capturing the internal correlations of data or features. The essential idea of the attention mechanism can be written as the following formula: Attention(Query, Source) = Σ_i Similarity(Query, Key_i) · Value_i.
自注意力机制通过QKV提供了一种有效的捕捉全局上下文信息的建模方式。假定输入为Q(query),以键值对(K,V)形式存储上下文。那么,注意力机制其实是query到一系列键值对(key,value)上的映射函数。attention函数的本质可以被描述为一个查询(query)到一系列(键key-值value)对的映射。attention本质上是为序列中每个元素都分配一个权重系数,这也可以理解为软寻址。如果序列中每一个元素都以(K,V)形式存储,那么attention则通过计算Q和K的相似度来完成寻址。Q和K计算出来的相似度反映了取出来的V值的重要程度,即权重,然后加权求和就得到最后的特征值。The self-attention mechanism provides an effective modeling method to capture global context information through QKV. Assume that the input is Q (query), and the context is stored in the form of a key-value pair (K, V). Then, the attention mechanism is actually a mapping function from query to a series of key-value pairs (key, value). The essence of the attention function can be described as a mapping from a query to a series of (key-value) pairs. Attention essentially assigns a weight coefficient to each element in the sequence, which can also be understood as soft addressing. If each element in the sequence is stored in the form of (K, V), then attention completes the addressing by calculating the similarity between Q and K. The similarity calculated between Q and K reflects the importance of the retrieved V value, that is, the weight, and then the weighted sum is used to obtain the final eigenvalue.
注意力的计算主要分为三步：第一步是将query和每个key进行相似度计算得到权重，常用的相似度函数有点积、拼接、感知机等；第二步一般是使用一个softmax函数对这些权重进行归一化（一方面可以进行归一化，得到所有权重系数之和为1的概率分布；另一方面可以用softmax函数的特性突出重要元素的权重）；最后将权重和相应的键值value进行加权求和得到最后的特征值。The calculation of attention mainly consists of three steps. The first step is to compute the similarity between the query and each key to obtain the weights; commonly used similarity functions include the dot product, concatenation, perceptron, etc. The second step is generally to normalize these weights with a softmax function (on the one hand, normalization yields a probability distribution in which all weight coefficients sum to 1; on the other hand, the characteristics of the softmax function can be used to highlight the weights of important elements). Finally, the weights and the corresponding values are combined in a weighted sum to obtain the final feature value.
另外，注意力包括自注意力与交叉注意力。自注意力可以理解为是特殊的注意力，即QKV的输入一致，而交叉注意力中的QKV的输入不一致。注意力是利用特征之间的相似程度（例如内积）作为权重来集成被查询特征，作为当前特征的更新值。自注意力是基于特征图本身的关注而提取的注意力。In addition, attention includes self-attention and cross-attention. Self-attention can be understood as a special form of attention in which the Q, K, and V inputs are the same, whereas in cross-attention the Q, K, and V inputs differ. Attention uses the degree of similarity between features (for example, an inner product) as weights to integrate the queried features as the updated value of the current feature. Self-attention is attention extracted based on the feature map itself.
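The three steps above (similarity scores between Q and each K, softmax normalization, weighted sum over V) can be sketched in pure Python. The vectors below are arbitrary illustrative values, and dot product is used as the similarity function.

```python
# Dot-product attention: similarity between Q and each K gives raw weights,
# softmax normalizes them to sum to 1, and the weighted sum of V is the output.

import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, keys, values):
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]  # Q·K
    weights = softmax(scores)  # weights sum to 1
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]      # the first key matches the query best
values = [[10.0, 0.0], [0.0, 10.0]]

out = attention(q, keys, values)
print([round(v, 3) for v in out])    # weighted toward the first value
```

For self-attention, Q, K, and V would all be derived from the same input; for cross-attention they come from different inputs, as described above.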
(7)多层感知机(Multi-Layer Perceptron,MLP)(7) Multi-Layer Perceptron (MLP)
多层感知机，也可以称为多层感知器，是一种前馈人工神经网络模型。MLP是一种基于全连接（fully-connected，FC）前向结构的人工神经网络（artificial neural network，ANN），其包含从十几个到成百上千不等的人工神经元（artificial neuron，AN，下文简称神经元）。MLP将神经元组织为多层的结构，层间采用全连接方法，形成逐层连接的多权连接层的ANN，其基本结构如图3所示。将MLP各个包含计算的全连接层从1开始进行编号，总层数为L，输入层编号设置为0，将MLP的全连接层分为奇数层和偶数层两大类。一般来讲，MLP包含一个输入层（该层实际不包含运算）、一个或者多个隐层以及一个输出层。A multi-layer perceptron (MLP) is a feedforward artificial neural network model. An MLP is an artificial neural network (ANN) based on a fully-connected (FC) forward structure, containing anywhere from a dozen or so to hundreds or thousands of artificial neurons (hereinafter referred to as neurons). An MLP organizes neurons into a multi-layer structure with full connections between layers, forming an ANN of layer-by-layer multi-weight connection layers; its basic structure is shown in FIG. 3. The fully connected layers of the MLP that contain computation are numbered starting from 1, with L layers in total; the input layer is numbered 0, and the fully connected layers of the MLP are divided into odd layers and even layers. Generally speaking, an MLP contains one input layer (which actually performs no computation), one or more hidden layers, and one output layer.
(8)特征、标签和样本(8) Features, labels, and samples
特征是指输入变量,即简单线性回归中的x变量,简单的机器学习任务可能会使用单个特征,而比较复杂的机器学习任务可能会使用数百万个特征。A feature is an input variable, i.e. the x variable in a simple linear regression. A simple machine learning task might use a single feature, while a more complex machine learning task might use millions of features.
标签是简单线性回归中的y变量,标签可以包括多种含义。在本申请的一些实施例中,标签可以是指输入的数据的分类类别。通过给输入的不同类别的数据各打上一个标签,该标签就用于向计算设备指示该数据代表的具体信息。因此,给数据打标签,就是告诉计算设备,输入变量的多个特征描述的是什么(即y),y可以称之为label,也可以称之为target(即目标值)。The label is the y variable in simple linear regression, and the label can include multiple meanings. In some embodiments of the present application, the label can refer to the classification category of the input data. By labeling each of the different categories of input data, the label is used to indicate to the computing device the specific information represented by the data. Therefore, labeling the data is to tell the computing device what the multiple features of the input variable describe (i.e., y). y can be called a label or a target (i.e., a target value).
样本是指数据的特定实例,一个样本x代表的是一个对象,样本x通常用一个特征向量x=(x1,x2,…,xd)∈Rd表示,其中,d代表样本x的维度(即特征个数),样本分为有标签样本和无标签样本,有标签样本同时包含特征和标签,无标签样本包含特征但不包含标签,机器学习的任务往往就是学习输入的d维训练样本集(可简称为训练集)中潜在的模式。A sample refers to a specific instance of data. A sample x represents an object. Sample x is usually represented by a feature vector x=(x1,x2,…,xd)∈Rd, where d represents the dimension of sample x (i.e. the number of features). Samples are divided into labeled samples and unlabeled samples. Labeled samples contain both features and labels, while unlabeled samples contain features but not labels. The task of machine learning is often to learn the potential patterns in the input d-dimensional training sample set (which can be simply referred to as the training set).
(9)反向传播算法(9) Back propagation algorithm
在神经网络的训练过程中,可以采用误差反向传播(back propagation,BP)算法修正初始的神经网络模型中参数的大小,使得神经网络模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的神经网络模型中的参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的神经网络模型的参数,例如权重矩阵。In the training process of the neural network, the error back propagation (BP) algorithm can be used to correct the size of the parameters in the initial neural network model, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, the forward transmission of the input signal to the output will generate error loss, and the parameters in the initial neural network model are updated by back propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by error loss, which aims to obtain the optimal parameters of the neural network model, such as the weight matrix.
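The idea above (forward pass produces an error, the error's gradient flows back to the parameters, and the parameters are stepped so the error shrinks) can be sketched with a single scalar weight standing in for the full weight matrices. The learning rate and target values are arbitrary illustrative choices.

```python
# Minimal gradient-descent sketch: forward pass, loss, gradient back to the
# weight, update step. One scalar weight stands in for the weight matrices.

def train(x, target, w=0.0, lr=0.1, steps=100):
    for _ in range(steps):
        pred = w * x                    # forward pass
        loss = (pred - target) ** 2     # squared-error loss
        grad = 2 * (pred - target) * x  # backward pass: d(loss)/dw
        w -= lr * grad                  # update toward smaller loss
    return w

w = train(x=2.0, target=6.0)
print(round(w, 4))  # converges toward 3.0, since 3.0 * 2.0 == 6.0
```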
(10)主干网络(Backbone)(10) Backbone
在检测器、分割器或分类器等中用来对输入信息做特征提取的网络结构。通常,在神经网络中,除了主干网络之外,还可以包括其他的功能性网络,如区域生成网络(region proposal network,RPN)、特征金字塔网络(feature Pyramid network,FPN)等网络,用于对主干网络提取到的特征进行进一步处理,如识别特征的分类、对特征进行语义分割等。A network structure used to extract features from input information in detectors, segmenters, or classifiers. Usually, in addition to the backbone network, neural networks can also include other functional networks, such as region proposal networks (RPNs) and feature pyramid networks (FPNs), which are used to further process the features extracted by the backbone network, such as identifying the classification of features and performing semantic segmentation on features.
(11)矩阵乘操作(MatMul)(11) Matrix multiplication operation (MatMul)
矩阵乘法是一种根据两个矩阵得到第三个矩阵的二元运算,第三个矩阵即前两者的乘积,通常也称为矩阵积。矩阵可以用来表示线性映射,矩阵积则可以用来表示线性映射的复合。Matrix multiplication is a binary operation that obtains a third matrix from two matrices. The third matrix is the product of the first two, and is also commonly called the matrix product. Matrices can be used to represent linear mappings, and matrix products can be used to represent the composition of linear mappings.
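The statement above (matrices represent linear maps, and the matrix product represents their composition) can be checked directly with a small pure-Python example using arbitrary matrices.

```python
# Matrix product C = A·B, and its reading as composition of linear maps:
# applying B then A to a vector equals applying C = A·B once.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def apply(M, x):  # matrix-vector product
    return [sum(M[i][k] * x[k] for k in range(len(x))) for i in range(len(M))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
x = [5, 6]

C = matmul(A, B)
print(C)                                      # [[2, 1], [4, 3]]
print(apply(C, x) == apply(A, apply(B, x)))   # True: composition of maps
```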
(12)归一化函数(12) Normalization function
归一化(softmax)函数又称归一化指数函数，是逻辑函数的一种推广。softmax函数能将一个含任意实数的K维向量Z变换为另一个K维向量σ(Z)，使得变换后的向量σ(Z)中的每一个元素的范围都在(0,1)之间，并且所有元素的和为1。softmax函数的计算方式可以如公式一所示：
σ(Z)_j = e^(Z_j) / ∑_(k=1)^K e^(Z_k)，j=1,…,K　（公式一）
The normalized (softmax) function, also known as the normalized exponential function, is a generalization of the logistic function. The softmax function transforms a K-dimensional vector Z of arbitrary real numbers into another K-dimensional vector σ(Z) such that each element of σ(Z) lies in (0, 1) and all elements sum to 1. The softmax function can be computed as shown in Formula 1:
σ(Z)_j = e^(Z_j) / Σ_(k=1)^K e^(Z_k), j = 1, …, K (Formula 1)
其中，σ(Z)_j表示经过softmax函数变换后的向量中第j个元素的值，Z_j表示向量Z中第j个元素的值，Z_k表示向量Z中第k个元素的值，∑表示求和。Here, σ(Z)_j is the value of the jth element of the transformed vector, Z_j is the value of the jth element of vector Z, Z_k is the value of the kth element of vector Z, and Σ denotes summation.
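Formula 1 can be implemented directly. The max-subtraction below is a standard numerical-stability step (it leaves the result unchanged) rather than part of the formula itself; the input vector is an arbitrary example.

```python
# Softmax per Formula 1: maps a K-dimensional real vector to a probability
# vector whose entries lie in (0, 1) and sum to 1.

import math

def softmax(Z):
    m = max(Z)  # subtracting the max avoids overflow without changing the result
    exps = [math.exp(z - m) for z in Z]
    total = sum(exps)
    return [e / total for e in exps]

Z = [1.0, 2.0, 3.0]
sig = softmax(Z)
print([round(s, 4) for s in sig])
```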
(13)嵌入层(embedding层)(13) Embedding layer
嵌入层可以称为输入嵌入(input embedding)层。当前输入可以为文本输入,例如可以为一段文本,也可以为一个句子。文本可以为中文文本,也可以为英文文本,还可以为其他语言文本。嵌入层在获取当前输入后,可以对该当前输入中各个词进行嵌入处理,可得到各个词的特征向量。在一些实施例中,所述嵌入层包括输入嵌入层和位置编码(positional encoding)层。在输入嵌入层,可以对当前输入中的各个词进行词嵌入处理,从而得到各个词的词嵌入向量。在位置编码层,可以获取各个词在该当前输入中的位置,进而对各个词的位置生成位置向量。在一些示例中,各个词的位置可以为各个词在该当前输入中的绝对位置。当得到当前输入中各个词的词嵌入向量和位置向量时,可以将各个词的位置向量和对应的词嵌入向量进行组合,得到各个词特征向量,即得到该当前输入对应的多个特征向量。多个特征向量可以表示为具有预设维度的嵌入向量。可以设定该多个特征向量中的特征向量个数为M,预设维度为H维,则该多个特征向量可以表示为M×H的嵌入向量。The embedding layer may be referred to as an input embedding layer. The current input may be a text input, for example, a paragraph of text or a sentence. The text may be a Chinese text, an English text, or a text in another language. After obtaining the current input, the embedding layer may embed each word in the current input, and obtain a feature vector of each word. In some embodiments, the embedding layer includes an input embedding layer and a positional encoding layer. In the input embedding layer, each word in the current input may be subjected to word embedding processing, thereby obtaining a word embedding vector of each word. In the positional encoding layer, the position of each word in the current input may be obtained, and then a position vector may be generated for the position of each word. In some examples, the position of each word may be the absolute position of each word in the current input. When the word embedding vector and the position vector of each word in the current input are obtained, the position vector of each word and the corresponding word embedding vector may be combined to obtain a feature vector of each word, that is, to obtain multiple feature vectors corresponding to the current input. Multiple feature vectors may be represented as embedding vectors with preset dimensions. The number of feature vectors in the multiple feature vectors may be set to M, and the preset dimension may be H, so that the multiple feature vectors may be represented as M×H embedding vectors.
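The embedding process described above (word embedding vector plus position vector, combined into an M×H matrix of feature vectors) can be sketched as follows. The lookup table, the trivial positional encoding, and the combination-by-addition are illustrative stand-ins, not learned values or the encoding used by any particular model.

```python
# Toy embedding layer: each token gets a word embedding, each position gets a
# position vector, and the two are combined (here by addition) into the
# token's feature vector. All values are illustrative stand-ins.

word_table = {
    "the": [0.1, 0.2, 0.3, 0.4],
    "cat": [0.5, 0.5, 0.5, 0.5],
}
H = 4  # preset embedding dimension

def position_vector(pos, dim=H):
    # Simplistic positional encoding: a constant offset per absolute position.
    return [0.01 * pos] * dim

def embed(tokens):
    # M tokens -> an M x H matrix of feature vectors.
    return [[w + p for w, p in zip(word_table[tok], position_vector(i))]
            for i, tok in enumerate(tokens)]

vectors = embed(["the", "cat"])
print(vectors)
```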
下面对本申请实施例提供的车辆的位置获取方法进行描述。该方法可以由车辆的位置获取设备执行，也可以由车辆的位置获取设备的部件（例如处理器、芯片、或芯片系统等）执行。该车辆的位置获取设备可以是云端设备，也可以是车辆或终端设备（例如车载终端、飞机终端等等）等。当然，该方法也可以是由云端设备和车辆构成的系统执行。可选地，该方法可以由车辆的位置获取设备中的CPU处理，也可以由CPU和GPU共同处理，也可以不用GPU，而使用其他适合用于神经网络计算的处理器，本申请不做限制。该方法可以应用于智能驾驶场景。The following describes the vehicle position acquisition method provided in the embodiments of the present application. The method can be executed by a vehicle position acquisition device, or by a component of such a device (such as a processor, a chip, or a chip system). The vehicle position acquisition device can be a cloud device, a vehicle, or a terminal device (such as a vehicle-mounted terminal, an aircraft terminal, etc.). Of course, the method can also be executed by a system consisting of a cloud device and a vehicle. Optionally, the method can be processed by the CPU of the vehicle position acquisition device, jointly by a CPU and a GPU, or without a GPU by using another processor suitable for neural network computation; this is not limited in the present application. The method can be applied to intelligent driving scenarios.
结合上述描述,下面开始对本申请实施例提供的车辆位置方法的推理阶段和训练阶段的具体实现流程进行描述。In combination with the above description, the specific implementation process of the reasoning phase and the training phase of the vehicle location method provided in the embodiment of the present application will be described below.
一、推理阶段1. Reasoning Stage
本申请实施例中,推理阶段描述的是执行设备210a如何利用目标模型/规则201a,对采集到的信息数据进行处理以生成预测结果的过程,具体地请参阅图4,图4为本申请实施例提供的车辆的位置获取方法的另一种流程示意图,图4中以本申请实施例应用于自动驾驶领域为例进行说明,该方法可以包括步骤401至步骤403。In the embodiment of the present application, the reasoning stage describes the process of how the execution device 210a uses the target model/rule 201a to process the collected information data to generate a prediction result. Please refer to Figure 4 for details. Figure 4 is another flow chart of the vehicle position acquisition method provided in the embodiment of the present application. Figure 4 takes the embodiment of the present application applied to the field of autonomous driving as an example for explanation. The method may include steps 401 to 403.
401,执行设备获取第一信息和第二信息。401. An execution device obtains first information and second information.
本申请实施例中,第一信息包括自车周围的车辆的信息,第二信息包括所述自车周围的车道的信息,执行设备获取自车周围的车辆和车道的信息。具体地,在一些应用场景中,执行设备可以为自车,则自车可以通过采集设备直接采集自车周围的车辆和车道信息,例如摄像设备、雷达设备等。执行设备还可以采用接收外接的其他设备发送的信息的方式,或从数据库中选取信息的方式等,具体此处不做限定。In the embodiment of the present application, the first information includes information about vehicles around the vehicle, the second information includes information about lanes around the vehicle, and the execution device obtains information about vehicles and lanes around the vehicle. Specifically, in some application scenarios, the execution device may be the vehicle, and the vehicle may directly collect information about vehicles and lanes around the vehicle through a collection device, such as a camera device, a radar device, etc. The execution device may also receive information sent by other external devices, or select information from a database, etc., which are not specifically limited here.
在一种实现中,自车在行驶过程中,为了可以准确预知周围其他车辆是否会影响到自车的行驶安全,是否会对自车的行驶决策造成影响,以及如何基于周围车辆进行自车行驶策略的控制,需要确定位于自车周围的至少一个关联车的行车意图。本申请实施例中的目标车辆为位于自车周围的至少一个关联车中的任意一个。In one implementation, when the vehicle is driving, in order to accurately predict whether other vehicles around it will affect the driving safety of the vehicle, whether it will affect the driving decision of the vehicle, and how to control the driving strategy of the vehicle based on the surrounding vehicles, it is necessary to determine the driving intention of at least one associated vehicle located around the vehicle. The target vehicle in the embodiment of the present application is any one of the at least one associated vehicle located around the vehicle.
应理解,上述“关联车”可以理解为在距离上与自车在一定预设范围内的车辆,也就是基于距离来确定哪些车与自车存在关联关系,进而将这些存在关联关系的车作为自车的关联车;此外“关联车”也可以理解为在未来会影响到自车行驶状态决策的车辆,也就是基于未来是否会对自车的驾驶策略造成影响来确定哪些车与自车存在关联关系,进而将这些存在关联关系的车作为自车的关联车。It should be understood that the above-mentioned "associated vehicles" can be understood as vehicles that are within a certain preset range of distance from the own vehicle, that is, based on the distance, it is determined which vehicles are associated with the own vehicle, and then these associated vehicles are regarded as associated vehicles of the own vehicle; in addition, "associated vehicles" can also be understood as vehicles that will affect the driving state decision of the own vehicle in the future, that is, based on whether it will affect the driving strategy of the own vehicle in the future, it is determined which vehicles are associated with the own vehicle, and then these associated vehicles are regarded as associated vehicles of the own vehicle.
在一种实现方式中,自车的处理器可以基于存储器114中与步骤401相关的软件代码来控制自车上相关的传感器获取到周围的车的车辆信息和车道信息,并基于获取到的信息来确定哪些车为关联车,也就是确定哪些车需要进行意图预测。In one implementation, the processor of the vehicle can control the relevant sensors on the vehicle to obtain vehicle information and lane information of surrounding vehicles based on the software code related to step 401 in the memory 114, and determine which vehicles are associated vehicles based on the acquired information, that is, determine which vehicles need to be predicted for intention.
或者,上述确定目标车辆的过程可以由其他车辆或者云侧的服务器来确定,这里并不限定。Alternatively, the above process of determining the target vehicle can be determined by other vehicles or a server on the cloud side, which is not limited here.
为了能够清楚地预测出目标车辆在未来的行车意图，需要获取到目标车辆的车辆信息、车道信息等，这些信息可以作为进行目标车辆行车意图预测的依据。其中，目标车辆的位置可以为目标车辆在地图中的绝对位置，也可以是和自车之间的相对位置，可以基于自车所处的绝对位置以及目标车辆和自车之间的相对位置来确定目标车辆的绝对位置。In order to clearly predict the target vehicle's future driving intention, it is necessary to obtain the target vehicle's vehicle information, lane information, etc., which can serve as the basis for predicting the target vehicle's driving intention. The position of the target vehicle may be its absolute position in the map, or its position relative to the ego vehicle; the absolute position of the target vehicle can be determined based on the absolute position of the ego vehicle and the relative position between the target vehicle and the ego vehicle.
以关联车为目标车辆为例,本申请实施例中,可以获取到目标车辆的行车状态信息,其中行车状态信息可以包括目标车辆的位置,具体地,可以通过自车携带的传感器来感知目标车辆的位置,或者是通过和其他车辆、云侧的服务器的交互来获取到目标车辆的位置。Taking the associated vehicle as the target vehicle as an example, in the embodiment of the present application, the driving status information of the target vehicle can be obtained, where the driving status information may include the position of the target vehicle. Specifically, the position of the target vehicle can be sensed by the sensor carried by the vehicle itself, or the position of the target vehicle can be obtained through interaction with other vehicles and servers on the cloud side.
以关联车为目标车辆为例,在一种实现中,可以实时获取到目标车辆的位置,或者是每间隔一段时间获取一次目标车辆的位置。Taking the associated vehicle as the target vehicle as an example, in one implementation, the position of the target vehicle can be acquired in real time, or the position of the target vehicle can be acquired once at a certain interval.
在一种可能的实现中,在获取第一信息和第二信息后,需要对第一信息和第二信息进行预处理以及打标签处理,经过预处理和打标签处理后的数据才能用做第一模型的输入数据。其中,预处理包括数据的异常值处理等基本数据处理操作,在此不再赘述。In a possible implementation, after obtaining the first information and the second information, the first information and the second information need to be preprocessed and labeled, and the data after preprocessing and labeling can be used as input data for the first model. Among them, preprocessing includes basic data processing operations such as data outlier processing, which will not be repeated here.
示例性地,获取和处理第一信息和第二信息的主要步骤如下:Exemplarily, the main steps of acquiring and processing the first information and the second information are as follows:
(1)采集第一信息和第二信息。(1) Collecting first information and second information.
第一信息包括自车周围的车辆的信息。车辆的信息包括8个特征数据，分别为车辆的横坐标、纵坐标、类型、长、宽、高、当前速度、速度的方向（即车辆前进的方向）。具体的采集方式可以为：每间隔0.2s获取一帧向量数据，在历史2s内获取共十帧向量数据，再加上当前帧的向量数据，得到共11帧的向量数据，其中，每帧采集到的向量数据均包括8个特征。假设只取64个数据，则按照自车周围的车辆与自车之间的距离远近依次选取，将距离较远的目标的数据删除。采集到的车辆的数据具体可参见下表1。The first information includes the information of the vehicles around the ego vehicle. The vehicle information includes 8 feature data, namely the vehicle's horizontal coordinate, vertical coordinate, type, length, width, height, current speed, and direction of speed (i.e. the direction in which the vehicle is moving). The specific collection method can be: obtain one frame of vector data every 0.2 s, obtain ten frames of vector data within the past 2 s, and add the vector data of the current frame, yielding 11 frames of vector data in total, where each collected frame includes 8 features. Assuming that data of only 64 vehicles are taken, the vehicles are selected by their distance to the ego vehicle, and the data of more distant targets are discarded. See Table 1 below for the collected vehicle data.
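The distance-based truncation described above (keep only the nearest targets, drop the rest) can be sketched in a few lines of Python; the function name and dictionary fields are illustrative assumptions, not identifiers from the patent:

```python
import math

def select_nearest_targets(vehicles, max_targets=64):
    """Keep at most `max_targets` vehicles, nearest to the ego vehicle first.

    `vehicles` is a list of feature dicts whose 'x'/'y' entries are the
    target's coordinates relative to the ego vehicle; more distant
    targets are dropped, mirroring the 64-vehicle cap described above.
    """
    by_distance = sorted(vehicles, key=lambda v: math.hypot(v["x"], v["y"]))
    return by_distance[:max_targets]

# Toy example with a cap of 2 instead of 64.
vehicles = [
    {"id": "a", "x": 30.0, "y": 4.0},
    {"id": "b", "x": 5.0, "y": -2.0},
    {"id": "c", "x": 80.0, "y": 10.0},
]
kept = select_nearest_targets(vehicles, max_targets=2)
print([v["id"] for v in kept])  # → ['b', 'a']
```

In a real pipeline the cap would be 64 and each entry would additionally carry the 8 features listed in Table 1.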
第二信息包括自车周围的车道的信息。车道的信息包括8个特征数据。在自车周围的每条车道上取20个路点，每个路点对应8个特征数据，分别为路点的横坐标、纵坐标，车道的类型（例如非机动车道和机动车道），车道能否直行，车道能否左转，车道能否右转，车道能否掉头，车道的编号。具体的采集方式可以为：采集自车周围200米内的所有车道的特征，每条车道取20个路点，每个路点取8个特征。假设只取256条车道的数据，则按照自车周围的车道与自车之间的距离进行排序，将距离较远的目标的数据删除。一种实现方式中，可以采用最远点采样的方式在车道上选取路点。采集到的车道的数据具体可参见下表1。
The second information includes the information of the lanes around the ego vehicle. The lane information includes 8 feature data. 20 waypoints are taken on each lane around the ego vehicle, and each waypoint corresponds to 8 feature data: the waypoint's horizontal and vertical coordinates, the lane type (e.g. non-motorized lane or motor lane), whether the lane allows going straight, turning left, turning right, or making a U-turn, and the lane number. The specific collection method can be: collect the features of all lanes within 200 meters around the ego vehicle, taking 20 waypoints per lane and 8 features per waypoint. Assuming that data of only 256 lanes are taken, the lanes are sorted by their distance to the ego vehicle, and the data of more distant targets are discarded. In one implementation, farthest point sampling can be used to select the waypoints on a lane. See Table 1 below for the collected lane data.
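The farthest point sampling mentioned above can be sketched as follows; this is a generic illustration with assumed names, selecting waypoints that are maximally spread along a lane polyline:

```python
import math

def farthest_point_sampling(points, k):
    """Select k waypoints from a lane polyline, spread out as far as possible.

    Classic farthest-point sampling: start from the first point, then
    repeatedly add the candidate whose distance to the already-selected
    set is largest.
    """
    selected = [points[0]]
    while len(selected) < k:
        best = max(
            (p for p in points if p not in selected),
            key=lambda p: min(math.dist(p, s) for s in selected),
        )
        selected.append(best)
    return selected

# Straight 10-point polyline standing in for a lane's centerline.
lane_points = [(float(i), 0.0) for i in range(10)]
sampled = farthest_point_sampling(lane_points, 3)
print(sampled)
```

The first pick is the start point, the second is the far end, and further picks fall near the middle, so a long lane is covered evenly with few waypoints.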
表1Table 1
可以理解的是，在实际操作过程中，自车周围的车辆的信息和车道的信息中包含的特征数据，可以根据实际需求进行设定，在此不做限定。另外，在本实施例中，仅是对自车周围的车辆的位置进行预测，因此仅采集了自车周围的车辆的信息。在实际生活中，自车周围还包括非机动车以及行人等障碍物，在此基础上，也可以采集自车周围的非机动车和行人的信息，采用本申请实施例提供的位置获取方法，对非机动车和行人的位置进行预测。It is understandable that, in actual operation, the feature data contained in the information of vehicles and lanes around the ego vehicle can be set according to actual needs, and is not limited here. In addition, in this embodiment, only the positions of vehicles around the ego vehicle are predicted, so only the information of surrounding vehicles is collected. In real life, obstacles such as non-motor vehicles and pedestrians also appear around the ego vehicle; on this basis, information of surrounding non-motor vehicles and pedestrians can also be collected, and their positions can be predicted using the position acquiring method provided in the embodiments of the present application.
(2)打标签处理(2) Labeling
在本申请实施例中,通过实际获取车辆在未来的第一时间内行驶的轨迹,并对这些轨迹数据打上标签,以供后续训练阶段的使用。类别标签包括:In the embodiment of the present application, the trajectory of the vehicle in the first time in the future is actually obtained, and the trajectory data is labeled for use in the subsequent training stage. The category labels include:
轨迹标签：表示目标车辆在第一时间内的真实行驶轨迹。例如，在未来3s内行驶的真实轨迹，每隔0.2s采集目标车辆的位置，一共15个点，每个位置包括x和y坐标，输出(15,2)的数据。Trajectory label: indicates the target vehicle's actual driving trajectory within the first time. For example, for the actual trajectory over the next 3 s, the target vehicle's position is sampled every 0.2 s, giving 15 points in total; each position includes x and y coordinates, so data of shape (15, 2) is output.
路口标签:表示目标车辆离开路口时选择的出口车道。Intersection label: Indicates the exit lane selected by the target vehicle when leaving the intersection.
非路口标签:表示目标车辆在第一时间内所在的非出口车道。 Non-intersection label: Indicates the non-exit lane where the target vehicle is located at the first time.
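The trajectory label above (15 future positions sampled every 0.2 s over 3 s) can be illustrated with a minimal sketch; function and variable names are assumptions:

```python
def make_trajectory_label(future_positions, horizon_s=3.0, step_s=0.2):
    """Build the (15, 2) trajectory label described above.

    `future_positions` maps a future timestamp (in seconds) to an
    (x, y) position; positions are sampled every 0.2 s over the next
    3 s, yielding 15 (x, y) points.
    """
    n_points = round(horizon_s / step_s)  # 15
    label = []
    for i in range(1, n_points + 1):
        t = round(i * step_s, 1)
        label.append(future_positions[t])
    return label

# Toy future track: constant speed 10 m/s along x.
future = {round(0.2 * i, 1): (10.0 * 0.2 * i, 0.0) for i in range(1, 16)}
label = make_trajectory_label(future)
print(len(label), len(label[0]))  # → 15 2
```

The intersection and non-intersection labels are categorical and simply record the identifier of the ground-truth lane.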
402,执行设备将第一信息和第二信息输入第一模型中,得到第一模型生成的预测信息,预测信息包括自车周围的车辆在第一时间内的预测位置信息。402, the execution device inputs the first information and the second information into the first model to obtain prediction information generated by the first model, where the prediction information includes predicted position information of vehicles around the vehicle within the first time.
本申请实施例中,执行设备在获取到自车周围的车辆和车道的信息后,可以将这些信息输入第一模型中,以根据自车周围的任一车辆的信息和车道的信息,通过第一模型预测得到自车周围的任一车辆的预测位置信息。In an embodiment of the present application, after the execution device obtains the information of vehicles and lanes around the vehicle, it can input this information into the first model, so as to predict the predicted position information of any vehicle around the vehicle through the first model based on the information of any vehicle and lane around the vehicle.
一种实现方式中，第一模型包括基于注意力机制的编码器（encoder）和解码器（decoder）。请参阅图5，图5为本申请实施例提供的第一模型的一种结构示意图。在图5中，执行设备将获取到的第一信息和第二信息输入第一模型中，基于注意力机制的编码器和解码器的结构，输出预测信息。In one implementation, the first model includes an encoder and a decoder based on an attention mechanism. Please refer to Figure 5, which is a schematic structural diagram of the first model provided in an embodiment of the present application. In Figure 5, the execution device inputs the acquired first information and second information into the first model, and the encoder-decoder structure based on the attention mechanism outputs the prediction information.
下面对本申请实施例的具体实现过程进行说明:The specific implementation process of the embodiment of the present application is described below:
首先,对预测信息的具体内容进行说明。First, the specific content of the prediction information is described.
一种实现方式中,预测位置信息包括自车周围的车辆在第一时间内的预测轨迹信息和第三信息,第三信息指示自车周围的车辆在第一时间内所在的车道。In one implementation, the predicted position information includes predicted trajectory information of vehicles around the vehicle within a first time and third information, where the third information indicates the lanes in which the vehicles around the vehicle are located within the first time.
该种实现方式中,执行设备从第一模型中得到的预测信息可以包括两方面,一方面是指自车周围的车辆在未来的第一时间内的轨迹信息,第二方面是指第三信息,该信息指示自车周围的车辆在未来的第一时间内所在的车道。一种实现方式中,该第三信息包括自车周围的目标车辆与自车周围的至少一个车道在第一时间内的关联度,目标车辆为自车周围的一个车辆。In this implementation, the prediction information obtained by the execution device from the first model may include two aspects: one aspect refers to the trajectory information of the vehicles around the vehicle in the first time in the future, and the second aspect refers to the third information, which indicates the lanes where the vehicles around the vehicle are located in the first time in the future. In one implementation, the third information includes the correlation between the target vehicle around the vehicle and at least one lane around the vehicle in the first time, and the target vehicle is a vehicle around the vehicle.
可以理解的是,执行设备能够同时采集自车周围的若干车辆的信息,并输出若干车辆的预测信息,在需要得到某一目标车辆的预测信息时,直接从若干车辆的预测信息中获取即可。而第三信息中所述的关联度,具体是指目标车辆与自车周围的车道在第一时间内的注意力分数,通过将第一信息和第二信息输入第一模型中,基于注意力机制的相关操作,可以得到自车周围的车辆相对于自车周围的每一车道的注意力分数。It is understandable that the execution device can simultaneously collect information about several vehicles around the vehicle and output prediction information about several vehicles. When the prediction information of a target vehicle is needed, it can be directly obtained from the prediction information of several vehicles. The correlation described in the third information specifically refers to the attention score of the target vehicle and the lanes around the vehicle in the first time. By inputting the first information and the second information into the first model, the attention score of the vehicles around the vehicle relative to each lane around the vehicle can be obtained based on the relevant operations of the attention mechanism.
其次，对第一模型根据第一信息和第二信息生成预测信息的具体过程进行说明。Next, the specific process by which the first model generates the prediction information based on the first information and the second information is described.
请参阅图6,图6为本申请实施例提供的第一模型的另一种结构示意图。Please refer to FIG. 6 , which is another structural schematic diagram of the first model provided in an embodiment of the present application.
一种实现方式中,第一模型中的编码器包括嵌入模块和注意力模块,解码器包括第一解码器模块和第二解码器模块。下面对各模块进行具体说明。In one implementation, the encoder in the first model includes an embedding module and an attention module, and the decoder includes a first decoder module and a second decoder module. Each module is described in detail below.
1、编码器1. Encoder
编码器包括嵌入模块和注意力模块。The encoder consists of an embedding module and an attention module.
(1)嵌入模块(1) Embedded module
嵌入模块包括第一嵌入模块和第二嵌入模块。The embedded modules include a first embedded module and a second embedded module.
在本申请实施例中,执行设备将第一信息和第二信息输入嵌入模块中,通过对输入序列进行嵌入embedding处理后,得到三个不同的权重矩阵,即矩阵Q,矩阵K和矩阵V。In an embodiment of the present application, the execution device inputs the first information and the second information into an embedding module, and obtains three different weight matrices, namely, matrix Q, matrix K and matrix V, after embedding processing is performed on the input sequence.
一种实现方式中,嵌入模块通过第一嵌入模块处理第一信息,第一嵌入模块包括三个子模块,分别为第一子模块、第二子模块和第三子模块。In one implementation, the embedding module processes the first information through a first embedding module, and the first embedding module includes three submodules, namely a first submodule, a second submodule and a third submodule.
该种实现方式中,具体地,请参阅图7,图7为本申请实施例提供的第一嵌入模块的一种结构示意图。第一子模块具体包括二维卷积层Conv2d1、二维批量归一化层BatchNorm2d、激活函数层ReLU、二维卷积层Conv2d2。第二子模块和第三子模块的组成相同,均包括一维卷积层Conv1d1、一维批量归一化层BatchNorm1d、激活函数层ReLU、一维卷积层Conv1d2。其中,卷积层Conv2d的卷积核大小kernel_size为(1,1),步长stride和补零padding均为默认值。ReLU(x)是一种非线性激活函数,其具体形式为:ReLU(x)=max(0,x)。其中,x表示函数的输入变量。In this implementation, specifically, please refer to Figure 7, which is a structural diagram of the first embedding module provided in an embodiment of the present application. The first submodule specifically includes a two-dimensional convolution layer Conv2d1, a two-dimensional batch normalization layer BatchNorm2d, an activation function layer ReLU, and a two-dimensional convolution layer Conv2d2. The second submodule and the third submodule have the same composition, both including a one-dimensional convolution layer Conv1d1, a one-dimensional batch normalization layer BatchNorm1d, an activation function layer ReLU, and a one-dimensional convolution layer Conv1d2. Among them, the convolution kernel size kernel_size of the convolution layer Conv2d is (1,1), and the step size stride and zero padding are both default values. ReLU(x) is a nonlinear activation function, and its specific form is: ReLU(x) = max(0,x). Among them, x represents the input variable of the function.
示例性地,结合上表1中所举例的内容,第一嵌入模块的处理过程具体为:Exemplarily, in combination with the content exemplified in Table 1 above, the processing process of the first embedded module is specifically as follows:
从第一信息中获得车辆数据(16,64,11,8),位置数据(16,64,2)和时间数据(16,11),并将车辆数据、位置数据和时间数据分别输入第一子模块、第二子模块和第三子模块中,分别输出(16,64,11,256)、(16,64,1,256)、(16,1,11,256)的数据。随后,将这三个数据通过embedding方法进行融合,输出(16,64,11,256)的数据,即矩阵Q的数据形式为(16,64,11,256)。The vehicle data (16, 64, 11, 8), the position data (16, 64, 2) and the time data (16, 11) are obtained from the first information, and the vehicle data, the position data and the time data are respectively input into the first submodule, the second submodule and the third submodule, and the data of (16, 64, 11, 256), (16, 64, 1, 256) and (16, 1, 11, 256) are respectively output. Subsequently, the three data are fused by the embedding method, and the data of (16, 64, 11, 256) is output, that is, the data form of the matrix Q is (16, 64, 11, 256).
其中，16表示batch size，64表示自车周围的车辆的数量（即表1中的数据数量），11表示采集数据的时间或者次数，2表示车辆的位置（即表1中所示的横坐标和纵坐标）。通俗理解为，取自车周围的64辆车辆的信息为第一信息，以任一车辆为例，分11个时间获取该车辆的数据，每一次获取的数据中包括若干个特征数据（对应于表1中的8个特征）。Here, 16 is the batch size, 64 is the number of vehicles around the ego vehicle (i.e. the number of data entries in Table 1), 11 is the number of times (frames) at which data are collected, and 2 is the vehicle's position (the horizontal and vertical coordinates shown in Table 1). In plain terms, the information of 64 vehicles around the ego vehicle is taken as the first information; for any one vehicle, its data are collected at 11 time steps, and each collected sample includes several feature data (corresponding to the 8 features in Table 1).
可以理解的是，自车周围的车辆对应的特征数据可以根据实际需求或者试验进行设定，在此仅为举例说明，而不做限定。It can be understood that the feature data corresponding to the vehicles around the ego vehicle can be set according to actual needs or experiments; the above is only an example and not a limitation.
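The patent states only that the three embedding outputs of shapes (16, 64, 11, 256), (16, 64, 1, 256) and (16, 1, 11, 256) are fused into (16, 64, 11, 256); one common choice, assumed here, is elementwise addition with broadcasting. A sketch with the batch dimension omitted for brevity:

```python
def fuse_embeddings(veh_emb, pos_emb, time_emb):
    """Fuse vehicle (N, T, C), position (N, 1, C) and time (1, T, C)
    embeddings into one (N, T, C) tensor by broadcast addition.

    Elementwise addition is an assumed fusion choice; the patent only
    says the three outputs are fused by an embedding method.
    """
    N, T, C = len(veh_emb), len(veh_emb[0]), len(veh_emb[0][0])
    return [
        [
            [veh_emb[n][t][c] + pos_emb[n][0][c] + time_emb[0][t][c]
             for c in range(C)]
            for t in range(T)
        ]
        for n in range(N)
    ]

# Tiny example: N=2 vehicles, T=3 time steps, C=1 channel
# (standing in for the real (64, 11, 256) shapes).
veh_emb = [[[1.0] for _ in range(3)] for _ in range(2)]
pos_emb = [[[10.0]] for _ in range(2)]
time_emb = [[[100.0] for _ in range(3)]]
fused = fuse_embeddings(veh_emb, pos_emb, time_emb)
print(fused[0][0][0], len(fused), len(fused[0]))  # → 111.0 2 3
```

The position embedding is broadcast over the time axis and the time embedding over the vehicle axis, which matches the singleton dimensions in the shapes above.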
一种实现方式中,嵌入模块通过第二嵌入模块处理第二信息,第二嵌入模块包括第四子模块。In one implementation, the embedding module processes the second information through a second embedding module, and the second embedding module includes a fourth submodule.
该种实现方式中,具体地,请参阅图8,图8为本申请实施例提供的第二嵌入模块的一种结构示意图。第四子模块具体包括二维卷积层Conv2d1、二维批量归一化层BatchNorm2d、激活函数层ReLU、二维卷积层Conv2d2。其中,卷积层Conv2d的卷积核大小kernel_size为(1,1),步长stride和补零padding均为默认值。In this implementation, specifically, please refer to FIG8, which is a schematic diagram of the structure of the second embedding module provided in an embodiment of the present application. The fourth submodule specifically includes a two-dimensional convolution layer Conv2d1, a two-dimensional batch normalization layer BatchNorm2d, an activation function layer ReLU, and a two-dimensional convolution layer Conv2d2. Among them, the convolution kernel size kernel_size of the convolution layer Conv2d is (1,1), and the stride and zero padding are both default values.
示例性地，结合上表1中所举例的内容，对第二嵌入模块的处理过程进行具体说明。第二信息的数据形式为(16,256,20,8)，将该数据输入第四子模块中，输出(16,256,11,256)的数据，作为矩阵K和矩阵V的数据形式。对应于上表1可以得知，16是指batch size，256表示车道的数量，20表示每一车道上的20个路点数据，8表示每一路点数据对应的8个特征数据。通俗理解为，取自车周围的256条车道的信息为第二信息，以任一车道为例，取该车道上的20个路点数据，每一路点数据包括8个特征数据。Exemplarily, the processing of the second embedding module is described in combination with the examples in Table 1 above. The data form of the second information is (16, 256, 20, 8); this data is input into the fourth submodule, which outputs data of shape (16, 256, 11, 256) as the data form of matrix K and matrix V. Corresponding to Table 1 above, 16 refers to the batch size, 256 is the number of lanes, 20 is the 20 waypoint data on each lane, and 8 is the 8 feature data corresponding to each waypoint. In plain terms, the information of the 256 lanes around the ego vehicle is taken as the second information; for any one lane, 20 waypoint data are taken on that lane, each including 8 feature data.
可以理解的是，自车周围的车道的数据可以根据实际需求或者试验进行设定，在此仅为举例说明，而不做限定。It can be understood that the lane data around the ego vehicle can be set according to actual needs or experiments; this is only an example and not a limitation.
需要说明的是，Embedding是一种映射的方法，嵌入模块中所采用的卷积和ReLU激活函数的方法，仅为其中一种实施方式，在实际操作过程中，也可以采用其它通用的embedding方法对第一信息和第二信息进行embedding处理。It should be noted that embedding is a mapping method; the convolution-plus-ReLU approach used in the embedding module is only one implementation, and in actual operation, other general embedding methods can also be used to perform embedding processing on the first information and the second information.
(2)注意力模块(2) Attention Module
注意力模块的计算主要分为三步,第一步,将query和每个key进行相似度计算得到权重,常用的相似度函数有点积,拼接,感知机等;第二步,使用归一化softmax函数对这些权重进行归一化;第三步,将权重和相应的键值value进行加权求和得到最后的特征值。The calculation of the attention module is mainly divided into three steps. The first step is to calculate the similarity between the query and each key to obtain the weight. Commonly used similarity functions include dot product, concatenation, perceptron, etc.; the second step is to use the normalized softmax function to normalize these weights; the third step is to perform weighted summation of the weight and the corresponding key value to obtain the final eigenvalue.
可以理解的是,归一化函数的主要作用包括:一方面,可以通过归一化得到所有权重系数之和为1的概率分布,通过归一化函数,score转换为一个值分布在0,1之间的矩阵,得到的结果即是每条车道对于当前车辆的相关性大小;另一方面,可以通过softmax的内在机制更加突出重要元素的权重。另外,归一化能够使得训练时的梯度稳定。It is understandable that the main functions of the normalization function include: on the one hand, the probability distribution of the sum of all weight coefficients being 1 can be obtained through normalization. Through the normalization function, the score is converted into a matrix with values distributed between 0 and 1, and the result is the relevance of each lane to the current vehicle; on the other hand, the weights of important elements can be more highlighted through the inherent mechanism of softmax. In addition, normalization can stabilize the gradient during training.
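The three steps above (similarity, softmax normalization, weighted sum) correspond to standard dot-product attention, sketched here in plain Python with assumed names:

```python
import math

def softmax(xs):
    """Normalize scores into a distribution whose elements sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """The three steps described above: dot-product similarity between
    the query (one vehicle) and each key (one lane), softmax
    normalization, then a weighted sum of the values.

    Returns (weights, context): the weights are the per-lane attention
    scores, the context is the aggregated feature vector.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    context = [
        sum(w * v[c] for w, v in zip(weights, values))
        for c in range(len(values[0]))
    ]
    return weights, context

# Toy example: one vehicle query against three lane keys.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0]]
w, ctx = attention(q, keys, values)
print(round(sum(w), 6))  # → 1.0
```

The first lane's key is most similar to the query, so it receives the largest weight; the weights are exactly the per-lane relevance scores described above.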
一种实现方式中,将第一信息和第二信息输入第一模型中,基于注意力机制,生成第四信息,第四信息包括自车周围的目标车辆与第一车道集合在第一时间内的关联度,目标车辆为自车周围的一个车辆,第一车道集合包括第二信息中包括的自车周围的所有车道。In one implementation, the first information and the second information are input into the first model, and based on the attention mechanism, the fourth information is generated. The fourth information includes the correlation between the target vehicle around the own vehicle and the first lane set within the first time. The target vehicle is a vehicle around the own vehicle, and the first lane set includes all lanes around the own vehicle included in the second information.
结合图6至图8的内容,在该实现方式中,将矩阵Q,矩阵K和矩阵V作为注意力模块的输入,可选地,分别对矩阵Q,矩阵K和矩阵V进行线性映射,得到第一线性矩阵和第二线性矩阵。然后,以第一线性矩阵作为矩阵Q,以第二线性矩阵作为矩阵K和矩阵V,将矩阵Q,矩阵K和矩阵V作为注意力模块的输入,通过矩阵Q和矩阵K计算每两个输入向量之间的相关性,即自车周围的目标车辆与第一车道集合在第一时间内的关联度,也就是注意力分数,输出第四信息。随后,可选地,通过对第二线性矩阵与第四信息执行矩阵乘运算,得到第六信息,第六信息包括自车周围的目标车辆的预测轨迹信息。Combined with the contents of Figures 6 to 8, in this implementation, matrix Q, matrix K and matrix V are used as inputs of the attention module. Optionally, matrix Q, matrix K and matrix V are linearly mapped respectively to obtain a first linear matrix and a second linear matrix. Then, the first linear matrix is used as matrix Q, the second linear matrix is used as matrix K and matrix V, and matrix Q, matrix K and matrix V are used as inputs of the attention module. The correlation between each two input vectors is calculated through matrix Q and matrix K, that is, the correlation between the target vehicles around the vehicle and the first lane set in the first time, that is, the attention score, and the fourth information is output. Subsequently, optionally, by performing a matrix multiplication operation on the second linear matrix and the fourth information, the sixth information is obtained, and the sixth information includes the predicted trajectory information of the target vehicles around the vehicle.
可以理解的是,本申请实施例中,第一模型主要的结构是采用基于注意力机制的编码器-解码器结构。在实际操作过程中,为了提高预测结果的准确率以及执行设备的安全性,可以通过第一模型的冗余方式,基于多注意力机制进行预测。另外,对于本领域技术人员来说,在实现第一模型的功能的前提下,也可以采用其它神经网络替换该第一模型,对于第一模型的具体结构和组成,在此仅为举例说明,而不做限定。It is understandable that in the embodiment of the present application, the main structure of the first model is an encoder-decoder structure based on the attention mechanism. In the actual operation process, in order to improve the accuracy of the prediction results and the safety of the execution device, the prediction can be made based on the multi-attention mechanism through the redundancy of the first model. In addition, for those skilled in the art, on the premise of realizing the function of the first model, other neural networks can also be used to replace the first model. The specific structure and composition of the first model are only illustrated here by way of example and are not limited.
2、解码器2. Decoder
解码器包括第一解码器模块和第二解码器模块。 The decoder includes a first decoder module and a second decoder module.
(1)第一解码器模块(1) First decoder module
第一解码器模块主要用于对自车周围的预测车道信息进行处理。The first decoder module is mainly used to process the predicted lane information around the vehicle.
一种实现方式中,第一解码器模块的主要工作过程为:In one implementation, the main working process of the first decoder module is:
获取目标车辆所属的道路场景的类别,道路场景的类别包括路口场景和非路口场景;Obtaining the category of the road scene to which the target vehicle belongs, the categories of the road scene include intersection scenes and non-intersection scenes;
根据目标车辆所属的道路场景的类别,从第一车道集合中选取第二车道集合,第二车道集合包括自车周围的车辆在第一时间内所在的车道;According to the category of the road scene to which the target vehicle belongs, a second lane set is selected from the first lane set, where the second lane set includes lanes where vehicles around the ego vehicle are located at the first time;
从第四信息中获取第五信息,并根据第五信息生成第三信息,第五信息包括目标车辆与第二车道集合在第一时间内的关联度,第三信息包括自车周围的目标车辆与自车周围的至少一个车道在第一时间内的关联度。The fifth information is obtained from the fourth information, and the third information is generated based on the fifth information. The fifth information includes the correlation between the target vehicle and the second lane set within the first time, and the third information includes the correlation between the target vehicle around the own vehicle and at least one lane around the own vehicle within the first time.
该种实现方式中,在得到第四信息,即目标车辆相对于自车周围的每一条车道的注意力分数后,通过目标车辆所在的场景,筛选掉部分数据,以提高预测结果的准确率。示例性地,请参阅图9,图9为本申请实施例提供的第一解码器模块的一种结构示意图。假设目标车辆所属的道路场景为路口场景,则说明车辆在未来的第一时间内会行驶在路口车道上,那么,对于处于非路口场景的车道,可以剔除掉。因此,可以从第一车道集合中选取第二车道集合,将目标车辆相对于所有非路口车道的注意力分数均置为极小值。在筛选出第二车道集合后,从第四信息中获取第五信息,即目标车辆相对于筛选出的每一车道的注意力分数。In this implementation, after obtaining the fourth information, that is, the attention score of the target vehicle relative to each lane around the vehicle, some data is filtered out through the scene where the target vehicle is located to improve the accuracy of the prediction result. For example, please refer to Figure 9, which is a structural diagram of the first decoder module provided in an embodiment of the present application. Assuming that the road scene to which the target vehicle belongs is an intersection scene, it means that the vehicle will travel on the intersection lane in the first time in the future, then the lanes in the non-intersection scene can be eliminated. Therefore, the second lane set can be selected from the first lane set, and the attention scores of the target vehicle relative to all non-intersection lanes are set to the minimum value. After the second lane set is filtered out, the fifth information is obtained from the fourth information, that is, the attention score of the target vehicle relative to each filtered lane.
可以理解的是,目标车辆所属的道路场景的类别可以直接通过地图获取,也可以通过其他方式获取,在此不做限定。It is understandable that the category of the road scene to which the target vehicle belongs can be directly obtained through a map or through other methods, which are not limited here.
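The scene-based filtering above (setting the attention scores of lanes inconsistent with the road scene to a very small value) can be sketched as follows; the sentinel value -1e9 is an assumption:

```python
def mask_lane_scores(scores, is_junction_lane, vehicle_in_junction):
    """Suppress attention scores of lanes inconsistent with the scene.

    As described above: if the target vehicle is in an intersection
    scene, non-intersection lanes get a very small score, and vice
    versa, so a later softmax assigns them near-zero probability.
    """
    VERY_SMALL = -1e9  # assumed sentinel for "minimum value"
    return [
        s if is_junction_lane[i] == vehicle_in_junction else VERY_SMALL
        for i, s in enumerate(scores)
    ]

scores = [0.8, 0.3, 0.5, 0.1]
is_junction = [True, False, True, False]
masked = mask_lane_scores(scores, is_junction, vehicle_in_junction=True)
print(masked)  # → [0.8, -1000000000.0, 0.5, -1000000000.0]
```

Only the scores of lanes matching the scene survive; the masked scores form the fifth information extracted from the fourth information.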
一种实现方式中,根据第五信息生成第三信息的具体过程可以包括:In one implementation, a specific process of generating the third information according to the fifth information may include:
对第五信息进行归一化操作,得到归一化后的第五信息,然后将归一化后的第五信息输入多层感知机中,得到第三信息。A normalization operation is performed on the fifth information to obtain normalized fifth information, and then the normalized fifth information is input into a multi-layer perceptron to obtain the third information.
该种实现方式中，在根据场景需求从第四信息中筛选出第五信息后，对第五信息进行归一化操作，以使归一化后的第五信息中的每一个注意力分数的范围都在(0,1)之间，并且所有元素的和为1。随后，将归一化后的第五信息输入多层感知机中，在感知机的作用下，输出目标车辆与自车周围的车道在第一时间内的关联度，给出他车未来行驶的意向车道及概率。In this implementation, after the fifth information is selected from the fourth information according to the scene requirements, a normalization operation is performed on the fifth information so that each attention score in the normalized fifth information lies in (0, 1) and all elements sum to 1. Subsequently, the normalized fifth information is input into the multi-layer perceptron, which outputs the correlation between the target vehicle and the lanes around the ego vehicle within the first time, giving the other vehicle's intended future lane and the corresponding probability.
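A sketch of the normalization step above: after masking, a softmax maps the filtered scores into (0, 1) with sum 1, so suppressed lanes receive probability close to 0. The subsequent multi-layer perceptron is omitted here for brevity:

```python
import math

def lane_probabilities(masked_scores):
    """Normalize filtered (fifth-information) scores into per-lane
    probabilities: each value lies in (0, 1) and all values sum to 1.

    Lanes whose score was suppressed to a very small value end up with
    probability ~0, matching the behavior described above.
    """
    m = max(masked_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in masked_scores]
    total = sum(exps)
    return [e / total for e in exps]

# Middle lane was masked out by the scene filter.
probs = lane_probabilities([0.8, -1e9, 0.5])
print(round(sum(probs), 6), round(probs[1], 6))  # → 1.0 0.0
```

The resulting distribution is the per-lane intention probability handed to the downstream lane-selection step.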
(2)第二解码器模块(2) Second decoder module
第二解码器模块主要用于对自车周围的目标车辆的预测轨迹信息进行处理。The second decoder module is mainly used to process the predicted trajectory information of target vehicles around the vehicle.
在一种实现方式中,第二解码器模块的主要工作过程为:In one implementation, the main working process of the second decoder module is:
将第六信息输入多层感知机中,得到自车周围的车辆在所述第一时间内的预测轨迹信息。The sixth information is input into a multi-layer perceptron to obtain predicted trajectory information of vehicles around the vehicle within the first time.
该种实现方式中，第二解码器模块包括MLP，通过将第六信息输入第二解码器模块中，输出目标车辆在未来第一时间内的轨迹。示例性地，假设输出未来3s的行驶轨迹，每0.2s输出一个点，则总共可以输出15个点的坐标。In this implementation, the second decoder module includes an MLP; the sixth information is input into the second decoder module, which outputs the target vehicle's trajectory within the first future time. For example, assuming the driving trajectory for the next 3 s is output with one point every 0.2 s, the coordinates of 15 points in total can be output.
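The output format described above (one point every 0.2 s, 15 points covering 3 s) can be illustrated by reshaping a flat decoder output into timestamped waypoints; the names and the flat (x0, y0, x1, y1, ...) layout are assumptions:

```python
def decode_trajectory(flat_output, step_s=0.2):
    """Reshape a flat decoder output of 2*K numbers into K (x, y)
    waypoints spaced step_s seconds apart, e.g. K=15 points for 3 s.
    """
    points = [
        (flat_output[2 * i], flat_output[2 * i + 1])
        for i in range(len(flat_output) // 2)
    ]
    timestamps = [round((i + 1) * step_s, 1) for i in range(len(points))]
    return list(zip(timestamps, points))

# Pretend MLP output: 30 numbers, i.e. 15 (x, y) pairs.
flat = [float(i) for i in range(30)]
traj = decode_trajectory(flat)
print(len(traj), traj[0])  # → 15 (0.2, (0.0, 1.0))
```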
可以理解的是,本申请实施例中第二解码器模块可以采用多种方法得到车辆的预测轨迹,上述方法仅为其中一种示例说明,在此不作限定。It is understandable that in the embodiment of the present application, the second decoder module can use a variety of methods to obtain the predicted trajectory of the vehicle. The above method is only an example and is not limited here.
403,执行设备将第一车道确定为目标车辆在第一时间内所在的车道,其中,第一车道为自车周围的至少一个车道中与目标车辆在第一时间内的关联度最高的一个车道。403, the execution device determines the first lane as the lane where the target vehicle is located in the first time, wherein the first lane is a lane with the highest correlation with the target vehicle in the first time among at least one lane around the vehicle.
本申请实施例中,在将第一信息和第二信息输入第一模型中后,通过第一模型输出的是自车周围的车辆在第一时间内的预测位置信息以及第三信息,其中,第三信息包括自车周围的目标车辆与自车周围的至少一个车道在第一时间内的关联度,即目标车辆与每一条车道对应的注意力分数,或者可以理解为,目标车辆未来行驶在每一条车道的概率。在此基础上,一种实现方式中,执行设备可以从这多条车道对应的注意力分数中,选择注意力分数最高的车道,作为目标车辆在第一时间内所在的预测车道。In the embodiment of the present application, after the first information and the second information are input into the first model, the output of the first model is the predicted position information of the vehicles around the vehicle within the first time and the third information, wherein the third information includes the correlation between the target vehicle around the vehicle and at least one lane around the vehicle within the first time, that is, the attention score corresponding to the target vehicle and each lane, or it can be understood as the probability of the target vehicle driving in each lane in the future. On this basis, in one implementation method, the execution device can select the lane with the highest attention score from the attention scores corresponding to the multiple lanes as the predicted lane where the target vehicle is located within the first time.
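The selection rule above (take the lane with the highest correlation as the predicted lane) reduces to an argmax over the per-lane attention scores; a minimal sketch with assumed names:

```python
def most_likely_lane(lane_ids, correlations):
    """Pick the lane with the highest correlation (attention score)
    as the predicted lane of the target vehicle, per step 403.
    """
    best = max(range(len(correlations)), key=lambda i: correlations[i])
    return lane_ids[best]

# Three candidate lanes with their correlation to the target vehicle.
lane = most_likely_lane(["L1", "L2", "L3"], [0.2, 0.7, 0.1])
print(lane)  # → L2
```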
可以理解的是，步骤403为可选步骤。在步骤402中，通过第一模型生成的预测信息中包括第三信息，该第三信息指示自车周围的车辆在第一时间内所在的车道，而具体怎么通过第三信息进一步确定自车周围的车辆在第一时间内所在的车道，可以扩展出多种实施方式。一种实现方式中，执行设备将关联度最高的一个车道作为自车周围的车辆在第一时间内所在的车道；而在实际应用过程中，可以将注意力分数的分值作为其中一种预测依据，结合其它特征或者方法进行位置预测，在此不做限定。It can be understood that step 403 is an optional step. In step 402, the prediction information generated by the first model includes the third information, which indicates the lanes where the vehicles around the ego vehicle are located within the first time; how the third information is further used to determine those lanes can be extended into multiple implementations. In one implementation, the execution device takes the lane with the highest correlation as the lane where a vehicle around the ego vehicle is located within the first time. In practical applications, the attention score may instead serve as one of several prediction bases and be combined with other features or methods for position prediction, which is not limited here.
在本申请实施例中,能够根据目标车辆的实际情况选择不同数量的车道作为考虑的环境特征,并通过采用注意力机制的方式,将目标车辆轨迹与周围的车道特征结合起来形成注意力特征,使得网络在学习的时候,可以关注到周边车道的特征,让车辆轨迹预测结果更加符合实际驾驶规则,也提升了预测结果的准确度。In an embodiment of the present application, different numbers of lanes can be selected as environmental features to be considered according to the actual situation of the target vehicle, and the target vehicle trajectory can be combined with the surrounding lane features to form an attention feature by adopting an attention mechanism, so that the network can pay attention to the characteristics of the surrounding lanes when learning, making the vehicle trajectory prediction results more in line with actual driving rules and improving the accuracy of the prediction results.
二、训练阶段2. Training Phase
本申请实施例中,训练阶段描述的是训练设备220如何利用数据库230a中的图像数据集合生成成熟的神经网络的过程,具体地,请参阅图10,图10为本申请实施例提供的模型的训练方法的一种流程示意图,本申请实施例提供的模型的训练方法可以包括:In the embodiment of the present application, the training stage describes the process of how the training device 220 generates a mature neural network using the image data set in the database 230a. Specifically, please refer to FIG. 10, which is a flow chart of the training method of the model provided in the embodiment of the present application. The training method of the model provided in the embodiment of the present application may include:
1001,训练设备获取第一信息和第二信息,第一信息包括自车周围的车辆的信息,第二信息包括所述自车周围的车道的信息。1001. A training device obtains first information and second information, where the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle.
本申请实施例中,训练设备获取第一信息和第二信息的数据集,将数据集划分为训练集、验证集和测试集,用训练集训练模型,验证集进行调参,测试集进行性能评价。其中,训练集、验证集、测试集的数据划分比例可以根据实际需求设定,在此不做限定。In the embodiment of the present application, the training device obtains a data set of the first information and the second information, divides the data set into a training set, a validation set, and a test set, trains the model with the training set, adjusts the parameters with the validation set, and evaluates the performance with the test set. The data division ratio of the training set, validation set, and test set can be set according to actual needs and is not limited here.
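上述数据集划分过程可以用如下示意代码表达（假设性示例：采用8:1:1的划分比例，函数名与比例均为示意，可根据实际需求调整）：The dataset splitting described above can be sketched as follows (a hypothetical example: an 8:1:1 split ratio is assumed; function names and ratios are illustrative and can be adjusted to actual needs):

```python
import random

def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """按给定比例将数据集划分为训练集、验证集和测试集。
    Split a dataset into train / validation / test subsets by ratio."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)  # 固定随机种子，保证划分可复现 / reproducible split
    n_train = int(len(samples) * ratios[0])
    n_val = int(len(samples) * ratios[1])
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test

# 示例：100个样本按8:1:1划分 / example: split 100 samples at 8:1:1
train, val, test = split_dataset(list(range(100)))
```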
本申请实施例中,训练设备执行步骤1001的具体实现方式可以参阅图4对应实施例中步骤401的具体实现方式的描述,此处不做赘述。In the embodiment of the present application, the specific implementation method of the training device executing step 1001 can refer to the description of the specific implementation method of step 401 in the embodiment corresponding to Figure 4, which will not be repeated here.
可以理解的是,在对第一模型进行训练时,所使用的训练样本包括自车周围的车辆的完整信息和自车周围的车道的完整信息,从而使第一模型输出的位置信息也更准确。It can be understood that when training the first model, the training samples used include complete information about vehicles around the vehicle and complete information about lanes around the vehicle, so that the position information output by the first model is also more accurate.
1002,训练设备将第一信息和所述第二信息输入第一模型中,得到第一模型生成的预测信息,预测信息包括自车周围的车辆在第一时间内的预测位置信息。1002. The training device inputs the first information and the second information into the first model to obtain prediction information generated by the first model, where the prediction information includes predicted position information of vehicles around the vehicle within the first time.
本申请实施例中,训练设备将获取到的第一信息和第二信息输入第一模型中,得到第一模型生成的预测信息。一种实现方式中,预测信息包括自车周围的车辆在第一时间内的预测轨迹信息和第三信息,第三信息包括自车周围的目标车辆与自车周围的至少一个车道在第一时间内的关联度。参考本申请的上述实施例,第一模型包括编码器和解码器,解码器包括第一解码器模块和第二解码器模块,其中,训练设备将第一信息和第二信息输入第一模型的编码器后,会通过第一解码器模块输出第三信息,通过第二解码器模块输出自车周围的车辆在第一时间内的预测轨迹信息。In an embodiment of the present application, the training device inputs the acquired first information and second information into the first model to obtain prediction information generated by the first model. In one implementation, the prediction information includes the predicted trajectory information of vehicles around the vehicle within the first time and third information, and the third information includes the correlation between the target vehicle around the vehicle and at least one lane around the vehicle within the first time. Referring to the above embodiment of the present application, the first model includes an encoder and a decoder, and the decoder includes a first decoder module and a second decoder module, wherein after the training device inputs the first information and the second information into the encoder of the first model, it will output the third information through the first decoder module, and output the predicted trajectory information of the vehicles around the vehicle within the first time through the second decoder module.
本申请实施例中，训练设备执行步骤1002的具体实现方式可以参阅图4对应实施例中步骤402的具体实现方式的描述，此处不做赘述。In the embodiment of the present application, the specific implementation of step 1002 performed by the training device can refer to the description of the specific implementation of step 402 in the embodiment corresponding to FIG. 4, which will not be repeated here.
1003,训练设备根据损失函数对第一模型进行训练,损失函数指示所述预测信息和正确信息之间的相似度,正确信息包括自车周围的车辆在第一时间内的正确的位置信息。1003. The training device trains the first model according to a loss function, where the loss function indicates the similarity between the predicted information and correct information, and the correct information includes correct position information of vehicles around the vehicle within the first time.
本申请实施例中,训练设备上预先配置有训练数据,训练数据包括与自车周围的车辆的信息和车道信息对应的期望结果。训练设备在得到与自车周围的车辆的信息和车道信息对应的预测结果后,可以根据与预测结果和期望结果,计算目标损失函数的函数值,根据目标损失函数的函数值和反向传播算法来更新待训练模型的参数值,以完成对待训练模型的一次训练。In the embodiment of the present application, the training device is pre-configured with training data, and the training data includes expected results corresponding to the information of vehicles and lanes around the vehicle. After obtaining the prediction results corresponding to the information of vehicles and lanes around the vehicle, the training device can calculate the function value of the target loss function according to the prediction results and the expected results, and update the parameter value of the model to be trained according to the function value of the target loss function and the back propagation algorithm to complete one training of the model to be trained.
其中，“待训练模型”也可以被理解为“待训练的目标模型”。“与自车周围的车辆的信息和车道信息对应的期望结果”所代表的含义与“与自车周围的车辆的信息和车道信息对应的预测结果”的含义类似，区别在于“与自车周围的车辆的信息和车道信息对应的预测结果”是由待训练模型生成的预测结果，“与自车周围的车辆的信息和车道信息对应的期望结果”是与自车周围的车辆的信息和车道信息对应的正确结果。作为示例，例如当待训练模型用于执行目标检测任务时，预测结果用于指示目标环境中至少一个物体的预测位置，期望结果用于指示目标环境中至少一个物体的期望位置（也可以称为正确位置），应理解，此处举例仅为方便理解本方案，不用于对各种应用场景下期望结果的含义进行穷举。Among them, the "model to be trained" can also be understood as the "target model to be trained". The meaning of the "expected result corresponding to the information about the vehicles around the ego vehicle and the lane information" is similar to that of the "prediction result corresponding to the information about the vehicles around the ego vehicle and the lane information"; the difference is that the prediction result is generated by the model to be trained, while the expected result is the correct result corresponding to that information. As an example, when the model to be trained is used to perform an object detection task, the prediction result indicates the predicted position of at least one object in the target environment, and the expected result indicates the expected position (also referred to as the correct position) of at least one object in the target environment. It should be understood that the examples here are only for ease of understanding this solution and are not intended to exhaustively enumerate the meanings of expected results in various application scenarios.
训练设备可以重复执行步骤1001至1003多次,以实现对待训练模型的迭代训练,直至满足预设条件,得到训练后的待训练模型,其中,预设条件可以为达到目标损失函数的收敛条件,或者,步骤1001至1003的迭代次数达到预设次数。The training device can repeat steps 1001 to 1003 multiple times to achieve iterative training of the model to be trained until the preset conditions are met and the trained model to be trained is obtained, wherein the preset conditions can be the convergence conditions for reaching the target loss function, or the number of iterations of steps 1001 to 1003 reaches a preset number.
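上述“迭代训练直至满足预设条件”的流程可以用如下示意代码表达（假设性示例：以损失变化量小于阈值作为收敛条件，model_step为示意的单次训练迭代，非本申请的实际实现）：The iterative-training-until-the-preset-condition flow above can be sketched as follows (a hypothetical example: convergence is taken as the loss change falling below a threshold; model_step is an illustrative single training iteration, not the actual implementation of this application):

```python
def train_until_converged(model_step, max_iters=1000, tol=1e-4):
    """重复执行“取数据-前向-计算损失-反向更新”，直至损失收敛或达到预设迭代次数。
    Repeat one training iteration until the loss converges or the preset
    iteration count is reached. model_step() returns the current loss value."""
    prev = float("inf")
    for it in range(1, max_iters + 1):
        loss = model_step()
        if abs(prev - loss) < tol:  # 收敛条件：损失变化小于阈值 / loss change below tol
            return it, loss
        prev = loss
    return max_iters, prev

# 用指数衰减的“损失”模拟训练过程 / simulate training with a decaying loss
state = {"loss": 1.0}
def fake_step():
    state["loss"] *= 0.5
    return state["loss"]

iters, final = train_until_converged(fake_step)
```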
本申请实施例中，不仅提供了模型的推理过程的具体实现方式，还提供了模型的训练过程的具体实现方式，扩展了本方案的应用场景。In the embodiments of the present application, not only the specific implementation of the inference process of the model is provided, but also the specific implementation of the training process of the model, which expands the application scenarios of this solution.
下面对训练设备根据损失函数对第一模型进行训练的完整过程进行具体说明。The complete process of training the first model by the training device according to the loss function is described in detail below.
1、训练设备采集数据集，得到所需的原始数据集及其对应的类别标签，按比例划分训练集、验证集和测试集，分别用于后续对模型进行训练、验证和评估。1. The training device collects data to obtain the required original data set and its corresponding category labels, and divides it into a training set, a validation set, and a test set in proportion, which are used for subsequent training, validation, and evaluation of the model.
2、训练设备构建基于注意力机制的第一模型。2. The training device builds the first model based on the attention mechanism.
3、训练设备将训练集的数据输入第一模型中,采用第一损失函数和第二损失函数对第一模型进行训练,通过反向传播算法更新识别模型,并利用验证集的数据筛选出最优的第一模型。3. The training device inputs the data of the training set into the first model, trains the first model using the first loss function and the second loss function, updates the recognition model through the back propagation algorithm, and uses the data of the verification set to screen out the optimal first model.
(1)确定损失函数(1) Determine the loss function
一种实现方式中,训练设备采用的损失函数包括第一损失函数和第二损失函数。In one implementation, the loss function used by the training device includes a first loss function and a second loss function.
该种实现方式中,由于第一模型输出的预测信息包括自车周围的车辆在第一时间内的预测轨迹信息和自车周围的目标车辆与自车周围的至少一个车道在第一时间内的关联度。因此,通过第一损失函数,来计算第一模型输出的自车周围的车辆在第一时间内的预测轨迹信息与正确轨迹信息之间的损失值,通过第二损失函数,来计算自车周围的至少一个车道在第一时间内的关联度与正确信息之间的损失值。In this implementation, since the prediction information output by the first model includes the predicted trajectory information of the vehicles around the vehicle within the first time and the correlation between the target vehicle around the vehicle and at least one lane around the vehicle within the first time, the loss value between the predicted trajectory information of the vehicles around the vehicle within the first time output by the first model and the correct trajectory information is calculated by the first loss function, and the loss value between the correlation between at least one lane around the vehicle within the first time and the correct information is calculated by the second loss function.
示例性地，第一损失函数的公式具体为：Exemplarily, the formula of the first loss function is specifically:
ln = 0.5·(xn − yn)²/beta，当|xn − yn| < beta时；ln = |xn − yn| − 0.5·beta，否则。
ln = 0.5·(xn − yn)²/beta, if |xn − yn| < beta; ln = |xn − yn| − 0.5·beta, otherwise.
其中,ln表示目标车辆在第n个样本对应的预测坐标与真实坐标的损失值,xn表示目标车辆在第n个样本对应的预测坐标的向量数据,yn表示目标车辆在第n个样本对应的真实坐标的向量数据,beta表示误差阈值。例如,每隔0.2s采集目标车辆在未来3s内的坐标,可以得到15个点的位置坐标,每一个取样点对应一个样本。Among them, l n represents the loss value between the predicted coordinates and the real coordinates of the target vehicle corresponding to the nth sample, x n represents the vector data of the predicted coordinates of the target vehicle corresponding to the nth sample, y n represents the vector data of the real coordinates of the target vehicle corresponding to the nth sample, and beta represents the error threshold. For example, the coordinates of the target vehicle in the next 3 seconds are collected every 0.2 seconds, and the position coordinates of 15 points can be obtained, and each sampling point corresponds to one sample.
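按照上述变量描述，第一损失函数与常见的Smooth L1损失形式一致，可以用如下示意代码表达（假设性示例，非本申请的实际实现）：According to the variable descriptions above, the first loss function is consistent with the common Smooth L1 loss and can be sketched as follows (a hypothetical example, not the actual implementation of this application):

```python
def smooth_l1(x, y, beta=1.0):
    """Smooth L1 损失：误差小于 beta（误差阈值）时用二次项，否则用线性项。
    Quadratic below the error threshold beta, linear above it."""
    d = abs(x - y)
    if d < beta:
        return 0.5 * d * d / beta
    return d - 0.5 * beta

def trajectory_loss(pred_pts, true_pts, beta=1.0):
    """对未来轨迹的每个采样点（如未来3s内每0.2s采样，共15个点），
    分别对横、纵坐标累加 Smooth L1 损失并取平均。
    Average Smooth L1 loss over the x/y coordinates of each sampled point."""
    total = 0.0
    for (px, py), (tx, ty) in zip(pred_pts, true_pts):
        total += smooth_l1(px, tx, beta) + smooth_l1(py, ty, beta)
    return total / len(pred_pts)
```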
第二损失函数的公式具体为：The formula of the second loss function is specifically:
ln = −w·log( exp(xn,yn) / Σc exp(xn,c) )
其中,ln为第n个样本对应的损失值loss,xn为第n个样本在第一时间内所在的预测车道,yn表示第n个样本在第一时间内真实所处的车道,样本表示自车周围的车辆,w表示权重。Among them, l n is the loss value loss corresponding to the nth sample, x n is the predicted lane where the nth sample is located in the first time, y n represents the actual lane where the nth sample is located in the first time, sample represents the vehicles around the vehicle, and w represents the weight.
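按照上述变量描述，第二损失函数与常见的带权重的交叉熵损失形式一致，可以用如下示意代码表达（假设性示例：将目标车辆对各候选车道的注意力分数视为未归一化的分类得分，函数名为示意）：According to the variable descriptions above, the second loss function is consistent with the common weighted cross-entropy loss and can be sketched as follows (a hypothetical example: the target vehicle's attention scores over candidate lanes are treated as unnormalized classification scores; function names are illustrative):

```python
import math

def lane_ce_loss(logits, true_lane, weight=None):
    """车道分类的加权交叉熵：logits 为目标车辆对各候选车道的未归一化分数，
    true_lane 为真实车道下标，weight 为各车道的权重。
    ln = -w[yn] * log(softmax(logits)[yn])."""
    m = max(logits)  # 数值稳定的 softmax / numerically stable softmax
    exps = [math.exp(v - m) for v in logits]
    prob = exps[true_lane] / sum(exps)
    w = 1.0 if weight is None else weight[true_lane]
    return -w * math.log(prob)

def predicted_lane(logits):
    """预测车道 = 关联度（注意力分数）最高的车道。
    The predicted lane is the one with the highest attention score."""
    return max(range(len(logits)), key=lambda i: logits[i])
```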
可以理解的是,由于第一模型输出的第三信息包括自车周围的目标车辆与自车周围的至少一个车道在第一时间内的关联度,在实际进行模型训练的过程中,注意力分数的真实值不方便获取,因此,可以在第一模型输出第三信息的基础上,将自车周围的至少一个车道中与目标车辆在第一时间内的关联度最高的一个车道作为目标车辆在第一时间内所在的车道,并以目标车道的预测车道信息与实际车道信息进行对比,来对模型进行训练。It can be understood that since the third information output by the first model includes the correlation between the target vehicle around the own vehicle and at least one lane around the own vehicle within the first time, it is not convenient to obtain the true value of the attention score during the actual model training process. Therefore, based on the third information output by the first model, the lane with the highest correlation with the target vehicle within the first time among the at least one lane around the own vehicle can be used as the lane where the target vehicle is located within the first time, and the predicted lane information of the target lane can be compared with the actual lane information to train the model.
(2)训练设备根据损失函数对第一模型进行训练(2) The training device trains the first model according to the loss function
在确定了损失函数后,训练设备使用训练集对第一模型进行训练,并在验证集上进行验证,保存在验证集上表现最好的网络模型参数。After the loss function is determined, the training device trains the first model using the training set, verifies it on the validation set, and saves the network model parameters that perform best on the validation set.
一种实现方式中,训练设备根据损失函数对第一模型进行训练的具体过程为:In one implementation, the specific process of the training device training the first model according to the loss function is:
(1)基于反向传播算法,采用第一损失函数对第一模型中进行训练,训练完成后保存第一损失值最小的模型。(1) Based on the back propagation algorithm, the first loss function is used to train the first model, and after the training is completed, the model with the smallest first loss value is saved.
(2)基于反向传播算法,采用第二损失函数对(1)中得到的模型进行训练,训练完成后保存第二损失值最小的模型。(2) Based on the back propagation algorithm, the model obtained in (1) is trained using the second loss function, and after the training is completed, the model with the smallest second loss value is saved.
可以理解的是，在第一模型训练过程中，训练设备可以通过误差反向传播算法对第一模型的参数进行更新。简单来说，训练设备可以通过误差反向传播算法，在第一模型的训练过程中修正初始的第一模型中参数的大小，使得误差损失越来越小。具体地，前向传递输入信号直至输出会产生误差损失，通过反向传播误差损失信息来更新初始的第一模型中的参数，从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动，旨在得到最优的神经网络模型的参数，例如权重矩阵。It is understandable that during the training of the first model, the training device can update the parameters of the first model through the error back propagation algorithm. Simply put, the training device can use the error back propagation algorithm to correct the parameters of the initial first model during training, making the error loss smaller and smaller. Specifically, forward propagation of the input signal to the output produces an error loss, and the parameters of the initial first model are updated by back propagating the error loss information so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by the error loss, aiming to obtain the optimal parameters of the neural network model, such as the weight matrices.
4、训练设备使用测试集的数据测试第一模型的预测性能，得到最终的模型识别正确率，当模型识别正确率达到设定阈值时，将待预测的数据输入第一模型进行识别；否则返回第三步，直至模型识别正确率达到设定阈值。4. The training device uses the data of the test set to test the prediction performance of the first model and obtains the final model recognition accuracy. When the model recognition accuracy reaches the set threshold, the data to be predicted is input into the first model for recognition; otherwise, the process returns to the third step until the model recognition accuracy reaches the set threshold.
可以理解的是,由于第一模型输出的预测信息包括两个部分,分别为目标车辆的预测轨迹信息和目标车辆相对于车道的注意力分数,因此,针对这两个部分,分别进行准确率评估。It can be understood that since the prediction information output by the first model includes two parts, namely the predicted trajectory information of the target vehicle and the attention score of the target vehicle relative to the lane, the accuracy evaluation is performed on these two parts respectively.
一种实现方式中，预测轨迹信息的准确率评估的公式具体为：In one implementation, the formulas for evaluating the accuracy of the predicted trajectory information are specifically:
FDE = (1/N)·Σi √( (x̂i,n − xi,n)² + (ŷi,n − yi,n)² )，其中对每条轨迹取第n个（最后一个）点。(taking the n-th, i.e. last, point of each trajectory)
MR = Σn Σj 1{ √( (x̂nj − xnj)² + (ŷnj − ynj)² ) > dist_threshold } / valid_num
其中，FDE（Final displacement error，最后位移误差）测量的是未来一段时间内预测轨迹末端点的位移误差（欧式距离，以米为单位），MR（Miss Rate，未命中率）表示第一模型输出的预测轨迹信息不准确的比率，即针对第一模型输出的预测轨迹信息和实际测量的轨迹信息中每一采样点，采样点之间的距离超过容错距离的比率。N表示batch-size，对应到表1中为256，n表示每条轨迹的点数，x、y分别是点的横、纵坐标，x̂nj、ŷnj分别表示针对每一条轨迹，在采集的点数n的第j个点对应的横坐标、纵坐标的真实值（即轨迹标签中采集到的值），xnj、ynj分别表示针对每一条轨迹，在采集的点数n的第j个点对应的横坐标、纵坐标的预测值，dist_threshold指容错距离，可以设置为1.5米；valid_num表示采集到的有效数据的数量。Among them, FDE (Final Displacement Error) measures the displacement error of the end point of the predicted trajectory over a future period (Euclidean distance, in meters), and MR (Miss Rate) represents the ratio of inaccurate predicted trajectory information output by the first model, that is, for each sampling point in the predicted trajectory information output by the first model and the actually measured trajectory information, the ratio of sampling points whose distance exceeds the tolerance distance. N represents the batch size, which corresponds to 256 in Table 1, n represents the number of points in each trajectory, and x and y are the horizontal and vertical coordinates of a point, respectively. x̂nj and ŷnj represent, for each trajectory, the true values of the horizontal and vertical coordinates of the j-th of the n collected points (i.e., the values collected in the trajectory label), while xnj and ynj represent the corresponding predicted values. dist_threshold refers to the tolerance distance, which can be set to 1.5 meters; valid_num represents the number of valid data collected.
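按照上述变量描述，FDE与MR的计算可以用如下示意代码表达（假设性示例：轨迹以坐标点列表表示，函数名为示意）：According to the variable descriptions above, the computation of FDE and MR can be sketched as follows (a hypothetical example: a trajectory is represented as a list of coordinate points; function names are illustrative):

```python
import math

def fde(pred_trajs, true_trajs):
    """FDE：每条轨迹最后一个预测点与真实点的欧式距离（米）的均值。
    Mean Euclidean distance (meters) between the last predicted and true points."""
    total = 0.0
    for pred, true in zip(pred_trajs, true_trajs):
        (px, py), (tx, ty) = pred[-1], true[-1]
        total += math.hypot(px - tx, py - ty)
    return total / len(pred_trajs)

def miss_rate(pred_trajs, true_trajs, dist_threshold=1.5):
    """MR：所有采样点中，预测点与真实点距离超过容错距离的比率。
    Ratio of sampling points whose error exceeds the tolerance distance."""
    miss, valid = 0, 0
    for pred, true in zip(pred_trajs, true_trajs):
        for (px, py), (tx, ty) in zip(pred, true):
            valid += 1
            if math.hypot(px - tx, py - ty) > dist_threshold:
                miss += 1
    return miss / valid

pred = [[(0.0, 0.0), (1.0, 0.0)]]
true = [[(0.0, 0.0), (4.0, 4.0)]]
```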
一种实现方式中，若目标车辆所属的道路场景为路口场景，则通过以下公式进行路口车道意图的准确率评估：In one implementation, if the road scene to which the target vehicle belongs is an intersection scene, the accuracy of the intersection lane intention is evaluated by the following formula:
Accexit = exit_lane_right_num / valid_exit_lane_num
其中，Accexit表示出口车道的准确率，目标车辆从路口场景中离开的车道称为出口车道，exit_lane_right_num表示目标车辆在未来第一时间内所在的真实车道与预测出的车道一致的数量，其中，可以从路口标签中获取目标车辆在未来第一时间内所在的真实车道，valid_exit_lane_num表示采集到的有效数据的数量。Among them, Accexit represents the accuracy of the exit lane. The lane through which the target vehicle leaves the intersection scene is called the exit lane. exit_lane_right_num represents the number of samples for which the real lane where the target vehicle is located in the first time in the future is consistent with the predicted lane, where the real lane can be obtained from the intersection label. valid_exit_lane_num represents the number of valid data collected.
一种实现方式中，若目标车辆所属的道路场景为非路口场景，在非路口场景中，车辆在行驶过程中存在换道或者不换道两种行为。In one implementation, if the road scene to which the target vehicle belongs is a non-intersection scene, the vehicle exhibits one of two behaviors while driving: changing lanes or not changing lanes.
通过以下公式对车辆换道的准确率进行评估：The accuracy of vehicle lane changing is evaluated by the following formula:
Acccutin = cutin_right_num / valid_cutin_num
其中，Acccutin表示目标车辆在第一时间内换道的准确率，cutin_right_num表示目标车辆在未来第一时间内实际进行了换道、且通过第一模型预测出了车辆换道的数量，valid_cutin_num表示采集到的有效数据的数量。Among them, Acccutin represents the accuracy of the target vehicle changing lanes within the first time, cutin_right_num represents the number of samples for which the target vehicle actually changed lanes within the first time in the future and the lane change was predicted by the first model, and valid_cutin_num represents the number of valid data collected.
通过以下公式对车辆不换道的准确率进行评估：The accuracy of the vehicle not changing lanes is evaluated by the following formula:
Acckeep = keep_right_num / valid_keep_num
其中,Acckeep表示目标车辆在第一时间内不换道的准确率,即换道误报的准确率,keep_right_num表示目标车辆在未来第一时间内所在的真实车道与预测出的车道一致的数量,其中,可以通过从非路口标签中获取目标车辆在未来第一时间内所在的真实车道,valid_keep_num表示采集到的有效数据的数量。Acc keep indicates the accuracy of the target vehicle not changing lanes in the first time, that is, the accuracy of lane change false alarms. keep_right_num indicates the number of real lanes where the target vehicle is located in the first time in the future that are consistent with the predicted lanes. The real lane where the target vehicle is located in the first time in the future can be obtained from the non-intersection label. valid_keep_num indicates the number of valid data collected.
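上述Accexit、Acccutin、Acckeep均为“预测正确的样本数/有效样本数”形式的比值指标，可以用如下示意代码统一表达（假设性示例：以标签为None表示无效样本，仅为示意）：The above Accexit, Acccutin and Acckeep are all ratio metrics of the form "number of correctly predicted samples / number of valid samples", and can be sketched uniformly as follows (a hypothetical example: a None label marks an invalid sample, for illustration only):

```python
def lane_intent_accuracy(pred_lanes, true_lanes):
    """车道意图准确率：预测车道与真实车道一致的样本数 / 有效样本数。
    适用于 Acc_exit / Acc_cutin / Acc_keep 等同构的比值指标。
    right_num over valid_num, the common shape of the metrics above."""
    valid = [(p, t) for p, t in zip(pred_lanes, true_lanes) if t is not None]
    if not valid:
        return 0.0
    right = sum(1 for p, t in valid if p == t)
    return right / len(valid)
```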
在图1a至图10所对应的实施例的基础上,为了更好的实施本申请实施例的上述方案,下面还提供用于实施上述方案的相关设备。具体参阅图11,图11为本申请实施例提供的车辆的位置获取装置的一种结构示意图,车辆的位置获取装置1100可以包括:On the basis of the embodiments corresponding to FIG. 1a to FIG. 10 , in order to better implement the above-mentioned solution of the embodiment of the present application, the following also provides related devices for implementing the above-mentioned solution. Specifically, refer to FIG. 11 , which is a structural schematic diagram of a vehicle position acquisition device provided in the embodiment of the present application. The vehicle position acquisition device 1100 may include:
获取模块1101,用于获取第一信息和第二信息,第一信息包括自车周围的车辆的信息,第二信息包括自车周围的车道的信息。 The acquisition module 1101 is used to acquire first information and second information, where the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle.
位置预测模块1102,用于将第一信息和第二信息输入第一模型中,得到第一模型生成的预测信息,预测信息包括自车周围的车辆在第一时间内的预测位置信息。The position prediction module 1102 is used to input the first information and the second information into the first model to obtain prediction information generated by the first model, where the prediction information includes the predicted position information of vehicles around the vehicle within the first time.
其中,关于获取模块1101和位置预测模块1102的具体描述可以参照上述实施例中步骤401至步骤402的描述,此处不再赘述。Among them, the specific description of the acquisition module 1101 and the position prediction module 1102 can refer to the description of step 401 to step 402 in the above embodiment, and will not be repeated here.
一种实现方式中,预测信息包括自车周围的车辆在第一时间内的预测轨迹信息和第三信息,第三信息指示自车周围的车辆在第一时间内所在的车道。In one implementation, the prediction information includes predicted trajectory information of vehicles around the vehicle within a first time and third information, where the third information indicates the lanes in which the vehicles around the vehicle are located within the first time.
一种实现方式中,第三信息包括自车周围的目标车辆与自车周围的至少一个车道在第一时间内的关联度,目标车辆为自车周围的一个车辆,车辆的位置获取装置1100还包括:In one implementation, the third information includes a correlation between a target vehicle around the vehicle and at least one lane around the vehicle within a first time, the target vehicle is a vehicle around the vehicle, and the vehicle position acquisition device 1100 further includes:
车道确定模块,用于将第一车道确定为目标车辆在第一时间内所在的车道,其中,第一车道为自车周围的至少一个车道中与目标车辆在第一时间内的关联度最高的一个车道。The lane determination module is used to determine the first lane as the lane where the target vehicle is located within the first time, wherein the first lane is a lane with the highest correlation with the target vehicle within the first time among at least one lane around the vehicle.
一种实现方式中,第一模型基于注意力机制构建,位置预测模块1102具体用于:In one implementation, the first model is constructed based on the attention mechanism, and the position prediction module 1102 is specifically used to:
将第一信息和第二信息输入第一模型中,基于注意力机制,生成第四信息,第四信息包括自车周围的目标车辆与第一车道集合在第一时间内的关联度,目标车辆为自车周围的一个车辆,第一车道集合包括第二信息中包括的自车周围的所有车道;Input the first information and the second information into the first model, and generate fourth information based on the attention mechanism, wherein the fourth information includes the correlation between the target vehicle around the ego vehicle and the first lane set within the first time, the target vehicle is a vehicle around the ego vehicle, and the first lane set includes all lanes around the ego vehicle included in the second information;
获取目标车辆所属的道路场景的类别,道路场景的类别包括路口场景和非路口场景;Obtaining the category of the road scene to which the target vehicle belongs, the categories of the road scene include intersection scenes and non-intersection scenes;
根据目标车辆所属的道路场景的类别,从第一车道集合中选取第二车道集合,第二车道集合包括自车周围的车辆在第一时间内所在的车道;According to the category of the road scene to which the target vehicle belongs, a second lane set is selected from the first lane set, where the second lane set includes lanes where vehicles around the ego vehicle are located at the first time;
从第四信息中获取第五信息,并根据第五信息生成第三信息,第五信息包括目标车辆与第二车道集合在第一时间内的关联度;Acquire fifth information from the fourth information, and generate third information according to the fifth information, wherein the fifth information includes a correlation between the target vehicle and the second lane set within the first time;
根据第二信息和第四信息,生成自车周围的车辆在第一时间内的预测轨迹信息。Based on the second information and the fourth information, the predicted trajectory information of the vehicles around the vehicle within the first time is generated.
一种实现方式中,位置预测模块1102具体用于:In one implementation, the location prediction module 1102 is specifically used for:
从第四信息中获取第五信息,并对第五信息进行归一化操作,得到归一化后的第五信息;Acquire fifth information from the fourth information, and perform a normalization operation on the fifth information to obtain normalized fifth information;
将归一化后的第五信息输入多层感知机中,得到第三信息。The normalized fifth information is input into the multi-layer perceptron to obtain the third information.
一种实现方式中，位置预测模块1102具体用于：In one implementation, the position prediction module 1102 is specifically configured to:
分别对第一信息和第二信息进行向量化处理和线性映射,得到第一线性矩阵和第二线性矩阵;Performing vectorization processing and linear mapping on the first information and the second information respectively to obtain a first linear matrix and a second linear matrix;
对第一线性矩阵和第二线性矩阵的矩阵乘积执行归一化操作,得到第四信息。A normalization operation is performed on the matrix product of the first linear matrix and the second linear matrix to obtain fourth information.
一种实现方式中,位置预测模块1102具体用于:In one implementation, the location prediction module 1102 is specifically used for:
对第二线性矩阵与第四信息执行矩阵乘运算,得到第六信息;Performing a matrix multiplication operation on the second linear matrix and the fourth information to obtain sixth information;
将第六信息输入多层感知机中,得到自车周围的车辆在第一时间内的预测轨迹信息。The sixth information is input into the multi-layer perceptron to obtain the predicted trajectory information of vehicles around the vehicle in the first time.
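上述注意力计算流程可以用如下简化示意代码表达（假设性示例：线性映射取恒等映射，函数名均为示意，非本申请的实际实现）：The attention computation described above can be sketched in simplified form as follows (a hypothetical example: the linear maps are taken as identity, and the function names are illustrative, not the actual implementation of this application):

```python
import math

def softmax(row):
    """数值稳定的归一化操作 / numerically stable normalization."""
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def attention_forward(vehicle_feat, lane_feats):
    """注意力前向过程示意：
    1) 车辆特征与各车道特征做内积（对应第一、第二线性矩阵的矩阵乘），
       归一化后得到第四信息，即目标车辆与各车道的关联度（注意力分数）；
    2) 注意力分数与车道特征再做矩阵乘，得到第六信息，
       后续可输入多层感知机得到预测轨迹信息。
    A sketch of the attention forward pass described above."""
    scores = softmax([sum(v * l for v, l in zip(vehicle_feat, lane))
                      for lane in lane_feats])          # 第四信息 / fourth information
    context = [sum(s * lane[d] for s, lane in zip(scores, lane_feats))
               for d in range(len(lane_feats[0]))]      # 第六信息 / sixth information
    return scores, context

scores, context = attention_forward([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```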
需要说明的是,车辆的位置获取装置1100中各模块/单元之间的信息交互、执行过程等内容,与本申请中图4对应的各个方法实施例基于同一构思,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。It should be noted that the information interaction, execution process, etc. between the modules/units in the vehicle position acquisition device 1100 are based on the same concept as the various method embodiments corresponding to Figure 4 in the present application. The specific contents can be found in the description of the method embodiments shown in the previous part of the present application, and will not be repeated here.
本申请实施例还提供了一种模型的训练装置,请参阅图12,图12为本申请实施例提供的模型的训练装置的一种结构示意图,模型的训练装置1200可以包括:The present application embodiment further provides a model training device. Please refer to FIG. 12 , which is a schematic diagram of a structure of the model training device provided in the present application embodiment. The model training device 1200 may include:
获取模块1201,用于获取第一信息和第二信息,第一信息包括自车周围的车辆的信息,第二信息包括自车周围的车道的信息。The acquisition module 1201 is used to acquire first information and second information, where the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle.
位置预测模块1202,用于将第一信息和第二信息输入第一模型中,得到第一模型生成的预测信息,预测信息包括自车周围的车辆在第一时间内的预测位置信息。The position prediction module 1202 is used to input the first information and the second information into the first model to obtain prediction information generated by the first model, where the prediction information includes the predicted position information of vehicles around the vehicle within the first time.
模型训练模块1203,用于根据损失函数对第一模型进行训练,损失函数指示预测信息和正确信息之间的相似度,正确信息包括自车周围的车辆在第一时间内的正确的位置信息。The model training module 1203 is used to train the first model according to the loss function, where the loss function indicates the similarity between the predicted information and the correct information, and the correct information includes the correct position information of the vehicles around the vehicle within the first time.
其中,关于获取模块1201、位置预测模块1202和模型训练模块1203的具体描述可以参照上述实施例中步骤1001至步骤1003的描述,此处不再赘述。Among them, the specific description of the acquisition module 1201, the position prediction module 1202 and the model training module 1203 can refer to the description of steps 1001 to 1003 in the above embodiment, and will not be repeated here.
一种实现方式中,预测信息包括自车周围的车辆在所述第一时间内的预测轨迹信息和第三信息,第三信息指示自车周围的车辆在第一时间内所在的车道。In one implementation, the prediction information includes predicted trajectory information of vehicles around the vehicle within the first time and third information, where the third information indicates the lanes in which the vehicles around the vehicle are located within the first time.
一种实现方式中，第三信息包括自车周围的目标车辆与自车周围的至少一个车道在第一时间内的关联度，目标车辆为自车周围的一个车辆，模型的训练装置1200还包括：In one implementation, the third information includes the correlation between a target vehicle around the ego vehicle and at least one lane around the ego vehicle within the first time, the target vehicle being a vehicle around the ego vehicle, and the model training device 1200 further includes:
车道确定模块,用于将第一车道确定为目标车辆在第一时间内所在的车道,其中,第一车道为自车周围的至少一个车道中与目标车辆在第一时间内的关联度最高的一个车道。The lane determination module is used to determine the first lane as the lane where the target vehicle is located within the first time, wherein the first lane is a lane with the highest correlation with the target vehicle within the first time among at least one lane around the vehicle.
该种实现方式中,由于第一模型输出的第三信息包括自车周围的目标车辆与自车周围的至少一个车道在第一时间内的关联度,在实际进行模型训练的过程中,注意力分数的正确值不方便获取,因此,可以在第一模型输出第三信息的基础上,将自车周围的至少一个车道中与目标车辆在第一时间内的关联度最高的一个车道作为目标车辆在第一时间内所在的车道,并以目标车道的预测车道信息与实际车道信息进行对比,来对模型进行训练。In this implementation, since the third information output by the first model includes the correlation between the target vehicle around the own vehicle and at least one lane around the own vehicle within the first time, it is not convenient to obtain the correct value of the attention score during the actual model training process. Therefore, based on the third information output by the first model, the lane with the highest correlation with the target vehicle within the first time among the at least one lane around the own vehicle can be used as the lane where the target vehicle is located within the first time, and the predicted lane information of the target lane is compared with the actual lane information to train the model.
可以理解的是,一种实现方式中,实际操作过程中,在通过位置预测模块1202得到预测信息后,可以首先通过车道确定模块确定目标车辆在第一时间内所在的车道,然后,再通过模型训练模块1203对第一模型进行训练;又或者,一种实现方式中,在实际操作过程中,若能够对自车周围的目标车辆与自车周围的至少一个车道在第一时间内的关联度进行正确测量,则可以直接根据关联度之间的误差大小对第一模型进行训练,则无需再执行车道确定模块。对应到实际应用场景中,训练设备是否需要采用车道确定模块,可以根据实际需求进行设定,在此不做限定。It can be understood that, in one implementation, during the actual operation, after obtaining the prediction information through the position prediction module 1202, the lane in which the target vehicle is located in the first time can be determined by the lane determination module first, and then the first model can be trained through the model training module 1203; or, in one implementation, during the actual operation, if the correlation between the target vehicle around the vehicle and at least one lane around the vehicle can be correctly measured within the first time, the first model can be trained directly according to the error between the correlations, and there is no need to execute the lane determination module. Corresponding to the actual application scenario, whether the training device needs to use the lane determination module can be set according to actual needs and is not limited here.
一种实现方式中,第一模型基于注意力机制构建,位置预测模块1202具体用于:In one implementation, the first model is constructed based on the attention mechanism, and the position prediction module 1202 is specifically used to:
将第一信息和第二信息输入第一模型中,基于注意力机制,生成第四信息,第四信息包括自车周围的目标车辆与第一车道集合在第一时间内的关联度,目标车辆为自车周围的一个车辆,第一车道集合包括第二信息中包括的自车周围的所有车道;Input the first information and the second information into the first model, and generate fourth information based on the attention mechanism, wherein the fourth information includes the correlation between the target vehicle around the ego vehicle and the first lane set within the first time, the target vehicle is a vehicle around the ego vehicle, and the first lane set includes all lanes around the ego vehicle included in the second information;
获取目标车辆所属的道路场景的类别,道路场景的类别包括路口场景和非路口场景;Obtaining the category of the road scene to which the target vehicle belongs, the categories of the road scene include intersection scenes and non-intersection scenes;
根据目标车辆所属的道路场景的类别,从第一车道集合中选取第二车道集合,第二车道集合包括自车周围的车辆在第一时间内所在的车道;According to the category of the road scene to which the target vehicle belongs, a second lane set is selected from the first lane set, where the second lane set includes lanes where vehicles around the ego vehicle are located at the first time;
从第四信息中获取第五信息,并根据第五信息生成第三信息,第五信息包括目标车辆与第二车道集合在第一时间内的关联度;Acquire fifth information from the fourth information, and generate third information according to the fifth information, wherein the fifth information includes a correlation between the target vehicle and the second lane set within the first time;
根据第二信息和第四信息,生成自车周围的车辆在第一时间内的预测轨迹信息。Based on the second information and the fourth information, the predicted trajectory information of the vehicles around the vehicle within the first time is generated.
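One step in the flow above is selecting the second lane set from the first lane set according to the road-scene category (intersection vs. non-intersection). The sketch below illustrates that kind of scene-dependent filtering; the lane fields (`near_junction`, `dist_to_target`), the 5 m threshold, and the selection rule are assumptions for illustration only, not the concrete rule of this application.

```python
# Illustrative sketch of scene-dependent lane-set selection (the step of
# choosing the second lane set from the first lane set). Field names and
# the filtering rule are hypothetical.

def select_second_lane_set(first_lane_set, scene):
    """Pick candidate lanes for a target vehicle.

    first_lane_set: list of dicts describing lanes around the ego vehicle.
    scene: "intersection" or "non_intersection".
    """
    if scene == "intersection":
        # At an intersection many successor lanes are plausible: keep all
        # lanes that touch the junction area (assumed flag).
        return [lane for lane in first_lane_set if lane["near_junction"]]
    # Outside intersections, only lanes close to the target vehicle are
    # realistic candidates (assumed 5 m threshold).
    return [lane for lane in first_lane_set if lane["dist_to_target"] < 5.0]

lanes = [
    {"id": 0, "near_junction": True,  "dist_to_target": 1.2},
    {"id": 1, "near_junction": False, "dist_to_target": 2.8},
    {"id": 2, "near_junction": True,  "dist_to_target": 9.5},
]
print([l["id"] for l in select_second_lane_set(lanes, "intersection")])      # [0, 2]
print([l["id"] for l in select_second_lane_set(lanes, "non_intersection")])  # [0, 1]
```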
一种实现方式中,位置预测模块1202具体用于:In one implementation, the location prediction module 1202 is specifically used for:
从第四信息中获取第五信息,并对第五信息进行归一化操作,得到归一化后的第五信息;Acquire fifth information from the fourth information, and perform a normalization operation on the fifth information to obtain normalized fifth information;
将归一化后的第五信息输入多层感知机中,得到第三信息。The normalized fifth information is input into the multi-layer perceptron to obtain the third information.
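The two operations above (normalizing the fifth information, then passing it through a multi-layer perceptron to obtain the third information) can be sketched as follows. The text does not fix the normalization or the MLP shape, so softmax and the tiny randomly initialized one-hidden-layer MLP here are assumptions for illustration.

```python
import math
import random

# Sketch: normalize raw association scores between a target vehicle and the
# second lane set (fifth information), then feed them to a small MLP to get
# per-lane outputs (third information). Softmax and the random weights are
# illustrative assumptions, not the application's actual parameters.

def softmax(scores):
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mlp(x, w1, w2):
    # One hidden layer with ReLU, then a linear output layer.
    hidden = [max(0.0, sum(xi * wij for xi, wij in zip(x, col))) for col in w1]
    return [sum(h * w for h, w in zip(hidden, col)) for col in w2]

fifth_info = [2.0, 0.5, -1.0]        # association scores for 3 candidate lanes
normalized = softmax(fifth_info)     # normalized fifth information, sums to 1

random.seed(0)
w1 = [[random.uniform(-1, 1) for _ in normalized] for _ in range(4)]  # 3 -> 4
w2 = [[random.uniform(-1, 1) for _ in range(4)] for _ in normalized]  # 4 -> 3
third_info = mlp(normalized, w1, w2)
print(len(third_info))  # one output per candidate lane
```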
一种实现方式中,位置预测模块1202具体用于:In one implementation, the location prediction module 1202 is specifically used for:
分别对第一信息和第二信息进行向量化处理和线性映射,得到第一线性矩阵和第二线性矩阵;Performing vectorization processing and linear mapping on the first information and the second information respectively to obtain a first linear matrix and a second linear matrix;
对第一线性矩阵和第二线性矩阵的矩阵乘积执行归一化操作,得到第四信息。A normalization operation is performed on the matrix product of the first linear matrix and the second linear matrix to obtain fourth information.
一种实现方式中,位置预测模块1202具体用于:In one implementation, the location prediction module 1202 is specifically used for:
对第二线性矩阵与第四信息执行矩阵乘运算,得到第六信息;Performing a matrix multiplication operation on the second linear matrix and the fourth information to obtain sixth information;
将第六信息输入多层感知机中,得到自车周围的车辆在第一时间内的预测轨迹信息。The sixth information is input into the multi-layer perceptron to obtain the predicted trajectory information of vehicles around the vehicle in the first time.
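The attention computation described in the preceding implementations can be read end to end as: map vehicle and lane information to a first and second linear matrix, normalize their matrix product to obtain the fourth information, then multiply the attention weights back with the lane features to obtain the sixth information. The sketch below follows that flow with pure-Python matrices; all dimensions and the random feature values are illustrative assumptions, and the final MLP that maps the sixth information to trajectory points is omitted.

```python
import math
import random

# Sketch of the attention flow: fourth information = row-softmax of the
# product of the first and second linear matrices; sixth information =
# attention weights times the lane features. Dimensions are hypothetical.

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def row_softmax(m):
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

random.seed(1)
n_agents, n_lanes, d = 2, 4, 3
agent_feats = [[random.random() for _ in range(d)] for _ in range(n_agents)]  # first linear matrix
lane_feats = [[random.random() for _ in range(d)] for _ in range(n_lanes)]    # second linear matrix

# Fourth information: normalized matrix product of the two linear matrices.
scores = matmul(agent_feats, [list(col) for col in zip(*lane_feats)])  # (n_agents, n_lanes)
fourth_info = row_softmax(scores)

# Sixth information: matrix product of the attention weights and lane features;
# an MLP (not shown) would map this to predicted trajectory points.
sixth_info = matmul(fourth_info, lane_feats)  # (n_agents, d)

print(len(fourth_info), len(fourth_info[0]))  # 2 4
print(len(sixth_info), len(sixth_info[0]))    # 2 3
```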
需要说明的是,模型的训练装置1200中各模块/单元之间的信息交互、执行过程等内容,与本申请中图10对应的各个方法实施例基于同一构思,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。It should be noted that the information interaction, execution process, etc. between the modules/units in the model training device 1200 are based on the same concept as the various method embodiments corresponding to Figure 10 in the present application. The specific contents can be found in the description of the method embodiments shown in the previous part of the present application and will not be repeated here.
接下来介绍本申请实施例提供的一种执行设备,请参阅图13,图13为本申请实施例提供的执行设备的一种结构示意图,执行设备1300具体可以表现为车辆、移动机器人、监控数据处理设备或者其他设备等,此处不做限定。具体地,执行设备1300包括:接收器1301、发射器1302、处理器1303和存储器1304(其中执行设备1300中的处理器1303的数量可以是一个或多个,图13中以一个处理器为例),其中,处理器1303可以包括应用处理器13031和通信处理器13032。在本申请的一些实施例中,接收器1301、发射器1302、处理器1303和存储器1304可通过总线或其它方式连接。Next, an execution device provided in an embodiment of the present application is introduced. Please refer to Figure 13. Figure 13 is a structural schematic diagram of an execution device provided in an embodiment of the present application. The execution device 1300 can be specifically manifested as a vehicle, a mobile robot, a monitoring data processing device or other equipment, etc., which is not limited here. Specifically, the execution device 1300 includes: a receiver 1301, a transmitter 1302, a processor 1303 and a memory 1304 (wherein the number of processors 1303 in the execution device 1300 can be one or more, and one processor is taken as an example in Figure 13), wherein the processor 1303 may include an application processor 13031 and a communication processor 13032. In some embodiments of the present application, the receiver 1301, the transmitter 1302, the processor 1303 and the memory 1304 may be connected via a bus or other means.
存储器1304可以包括只读存储器和随机存取存储器,并向处理器1303提供指令和数据。存储器1304的一部分还可以包括非易失性随机存取存储器(non-volatile random access memory,NVRAM)。存储器1304存储有处理器和操作指令、可执行模块或者数据结构,或者它们的子集,或者它们的扩展集,其中, 操作指令可包括各种操作指令,用于实现各种操作。The memory 1304 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1303. A portion of the memory 1304 may also include a non-volatile random access memory (NVRAM). The memory 1304 stores processor and operation instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, wherein: The operation instructions may include various operation instructions for implementing various operations.
处理器1303控制执行设备的操作。具体地应用中,执行设备的各个组件通过总线系统耦合在一起,其中总线系统除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都称为总线系统。The processor 1303 controls the operation of the execution device. In specific applications, the various components of the execution device are coupled together through a bus system, wherein the bus system includes not only a data bus but also a power bus, a control bus, and a status signal bus, etc. However, for the sake of clarity, various buses are referred to as bus systems in the figure.
上述本申请实施例揭示的方法可以应用于处理器1303中,或者由处理器1303实现。处理器1303可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过处理器1303中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器1303可以是通用处理器、数字信号处理器(digital signal processing,DSP)、微处理器或微控制器,还可进一步包括专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field-programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。该处理器1303可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器、闪存、只读存储器、可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1304,处理器1303读取存储器1304中的信息,结合其硬件完成上述方法的步骤。The method disclosed in the above embodiment of the present application can be applied to the processor 1303, or implemented by the processor 1303. The processor 1303 can be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method can be completed by the hardware integrated logic circuit in the processor 1303 or by instructions in the form of software. The above processor 1303 can be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, and can further include an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. The processor 1303 can implement or execute the various methods, steps and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor can be a microprocessor, or the processor can also be any conventional processor, etc. The steps of the method disclosed in the embodiments of the present application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, a register, etc. The storage medium is located in the memory 1304, and the processor 1303 reads the information in the memory 1304 and completes the steps of the above method in combination with its hardware.
接收器1301可用于接收输入的数字或字符信息,以及产生与执行设备的相关设置以及功能控制有关的信号输入。发射器1302可用于通过第一接口输出数字或字符信息;发射器1302还可用于通过第一接口向磁盘组发送指令,以修改磁盘组中的数据;发射器1302还可以包括显示屏等显示设备。The receiver 1301 can be used to receive input digital or character information and generate signal input related to the relevant settings and function control of the execution device. The transmitter 1302 can be used to output digital or character information through the first interface; the transmitter 1302 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1302 can also include a display device such as a display screen.
本申请实施例中,在一种情况下,处理器1303中的应用处理器13031,用于执行图4至图9对应实施例中的执行设备执行的车辆的位置获取方法。需要说明的是,应用处理器13031执行前述各个步骤的具体方式,与本申请中图4至图9对应的各个方法实施例基于同一构思,其带来的技术效果与本申请中图4至图9对应的各个方法实施例相同,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。In an embodiment of the present application, in one case, the application processor 13031 in the processor 1303 is used to execute the vehicle position acquisition method executed by the execution device in the embodiments corresponding to Figures 4 to 9. It should be noted that the specific manner in which the application processor 13031 executes the aforementioned steps is based on the same concept as the various method embodiments corresponding to Figures 4 to 9 in the present application, and the technical effects brought about are the same as the various method embodiments corresponding to Figures 4 to 9 in the present application. For specific contents, please refer to the description in the method embodiments shown in the aforementioned present application, which will not be repeated here.
本申请实施例还提供了一种训练设备,请参阅图14,图14是本申请实施例提供的训练设备一种结构示意图。具体地,训练设备1400由一个或多个服务器实现,训练设备1400可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)1422(例如,一个或一个以上处理器)和存储器1432,一个或一个以上存储应用程序1442或数据1444的存储介质1430(例如一个或一个以上海量存储设备)。其中,存储器1432和存储介质1430可以是短暂存储或持久存储。存储在存储介质1430的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对训练设备中的一系列指令操作。更进一步地,中央处理器1422可以设置为与存储介质1430通信,在训练设备1400上执行存储介质1430中的一系列指令操作。The embodiment of the present application also provides a training device, please refer to Figure 14, which is a structural diagram of a training device provided by the embodiment of the present application. Specifically, the training device 1400 is implemented by one or more servers, and the training device 1400 may have relatively large differences due to different configurations or performances, and may include one or more central processing units (CPU) 1422 (for example, one or more processors) and a memory 1432, and one or more storage media 1430 (for example, one or more mass storage devices) storing application programs 1442 or data 1444. Among them, the memory 1432 and the storage medium 1430 can be short-term storage or permanent storage. The program stored in the storage medium 1430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations in the training device. Furthermore, the central processor 1422 can be configured to communicate with the storage medium 1430 to execute a series of instruction operations in the storage medium 1430 on the training device 1400.
训练设备1400还可以包括一个或一个以上电源1426,一个或一个以上有线或无线网络接口1450,一个或一个以上输入输出接口1458,和/或,一个或一个以上操作系统1441,例如Windows Server™,Mac OS X™,Unix™,Linux™,FreeBSD™等等。The training device 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, one or more input and output interfaces 1458, and/or one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
本申请实施例中,中央处理器1422,用于执行图10对应实施例中的训练设备执行的车辆的位置获取方法。需要说明的是,中央处理器1422执行前述各个步骤的具体方式,与本申请中图10对应的各个方法实施例基于同一构思,其带来的技术效果与本申请中图10对应的各个方法实施例相同,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。In the embodiment of the present application, the central processor 1422 is used to execute the vehicle position acquisition method executed by the training device in the embodiment corresponding to Figure 10. It should be noted that the specific manner in which the central processor 1422 executes the aforementioned steps is based on the same concept as the various method embodiments corresponding to Figure 10 in the present application, and the technical effects brought about are the same as the various method embodiments corresponding to Figure 10 in the present application. For specific contents, please refer to the description in the method embodiments shown in the previous embodiment of the present application, and no further description will be given here.
本申请实施例中还提供一种计算机程序产品,当其在计算机上运行时,使得计算机执行如前述图4至图9所示实施例描述的方法中执行设备所执行的步骤,或者,使得计算机执行如前述图10所示实施例描述的方法中训练设备所执行的步骤。Also provided in an embodiment of the present application is a computer program product, which, when executed on a computer, enables the computer to execute the steps executed by the execution device in the method described in the embodiments shown in Figures 4 to 9 above, or enables the computer to execute the steps executed by the training device in the method described in the embodiment shown in Figure 10 above.
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有用于进行信号处理的程序,当其在计算机上运行时,使得计算机执行如前述图4至图9所示实施例描述的方法中执行设备所执行的步骤,或者,使得计算机执行如前述图10所示实施例描述的方法中训练设备所执行的步骤。 A computer-readable storage medium is also provided in an embodiment of the present application, which stores a program for signal processing. When the computer-readable storage medium is run on a computer, the computer executes the steps executed by the execution device in the method described in the embodiments shown in Figures 4 to 9 above, or the computer executes the steps executed by the training device in the method described in the embodiment shown in Figure 10 above.
本申请实施例提供的车辆的位置获取装置、模型的训练装置、执行设备以及训练设备具体可以为芯片,芯片包括:处理单元和通信单元,所述处理单元例如可以是处理器,所述通信单元例如可以是输入/输出接口、管脚或电路等。该处理单元可执行存储单元存储的计算机执行指令,以使芯片执行上述图4至图9所示实施例描述的车辆的位置获取方法,或者,以使芯片执行上述图10所示实施例描述的模型的训练方法。可选地,所述存储单元为所述芯片内的存储单元,如寄存器、缓存等,所述存储单元还可以是所述无线接入设备端内的位于所述芯片外部的存储单元,如只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)等。The vehicle position acquisition device, model training device, execution device and training device provided in the embodiments of the present application can be specifically a chip, and the chip includes: a processing unit and a communication unit, the processing unit can be, for example, a processor, and the communication unit can be, for example, an input/output interface, a pin or a circuit, etc. The processing unit can execute the computer execution instructions stored in the storage unit, so that the chip executes the vehicle position acquisition method described in the embodiments shown in Figures 4 to 9 above, or so that the chip executes the model training method described in the embodiment shown in Figure 10 above. Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, etc., and the storage unit can also be a storage unit located outside the chip in the wireless access device end, such as a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, a random access memory (RAM), etc.
具体地,请参阅图15,图15为本申请实施例提供的芯片的一种结构示意图,所述芯片可以表现为神经网络处理器NPU 150,NPU 150作为协处理器挂载到主CPU(Host CPU)上,由Host CPU分配任务。NPU的核心部分为运算电路1503,通过控制器1504控制运算电路1503提取存储器中的矩阵数据并进行乘法运算。Specifically, please refer to FIG. 15 , which is a schematic diagram of a structure of a chip provided in an embodiment of the present application, wherein the chip may be a neural network processor NPU 150, which is mounted on the host CPU (Host CPU) as a coprocessor and is assigned tasks by the Host CPU. The core part of the NPU is the operation circuit 1503, which is controlled by the controller 1504 to extract matrix data from the memory and perform multiplication operations.
在一些实现中,运算电路1503内部包括多个处理单元(Process Engine,PE)。在一些实现中,运算电路1503是二维脉动阵列。运算电路1503还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路1503是通用的矩阵处理器。In some implementations, the operation circuit 1503 includes multiple processing units (Process Engine, PE) inside. In some implementations, the operation circuit 1503 is a two-dimensional systolic array. The operation circuit 1503 can also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 1503 is a general-purpose matrix processor.
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路从权重存储器1502中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路从输入存储器1501中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器(accumulator)1508中。For example, assume there is an input matrix A, a weight matrix B, and an output matrix C. The operation circuit takes the corresponding data of matrix B from the weight memory 1502 and caches it on each PE in the operation circuit. The operation circuit takes the matrix A data from the input memory 1501 and performs matrix operation with matrix B, and the partial result or final result of the matrix is stored in the accumulator 1508.
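The worked example above (matrix A fetched from the input memory, matrix B cached from the weight memory, partial results held in the accumulator 1508) can be mimicked in software: partial products are summed into an accumulator until the full result is formed. The loop order below is illustrative only and does not reflect the NPU's actual dataflow.

```python
# Software mimic of the A x B example: partial products accumulate into
# "acc" the way the accumulator 1508 holds partial/final matrix results.
# The k-outer loop order is an illustrative choice, not the real dataflow.

def matmul_accumulate(A, B):
    rows, inner, cols = len(A), len(B), len(B[0])
    acc = [[0.0] * cols for _ in range(rows)]   # accumulator for partial results
    for k in range(inner):                      # stream one slice of A and B at a time
        for i in range(rows):
            for j in range(cols):
                acc[i][j] += A[i][k] * B[k][j]  # partial result accumulates
    return acc

A = [[1, 2], [3, 4]]   # "input matrix A" from the input memory
B = [[5, 6], [7, 8]]   # "weight matrix B" from the weight memory
C = matmul_accumulate(A, B)
print(C)  # [[19.0, 22.0], [43.0, 50.0]]
```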
统一存储器1506用于存放输入数据以及输出数据。权重数据直接通过存储单元访问控制器(Direct Memory Access Controller,DMAC)1505被搬运到权重存储器1502中。输入数据也通过DMAC被搬运到统一存储器1506中。Unified memory 1506 is used to store input data and output data. Weight data is transferred directly to the weight memory 1502 through the Direct Memory Access Controller (DMAC) 1505. Input data is also transferred to the unified memory 1506 through the DMAC.
BIU即总线接口单元(Bus Interface Unit)1510,用于AXI总线与DMAC和取指存储器(Instruction Fetch Buffer,IFB)1509的交互。BIU stands for Bus Interface Unit, that is, the bus interface unit 1510, which handles the interaction between the AXI bus, the DMAC, and the instruction fetch buffer (IFB) 1509.
总线接口单元1510(Bus Interface Unit,简称BIU),用于取指存储器1509从外部存储器获取指令,还用于存储单元访问控制器1505从外部存储器获取输入矩阵A或者权重矩阵B的原数据。The bus interface unit 1510 (Bus Interface Unit, BIU for short) is used for the instruction fetch memory 1509 to obtain instructions from the external memory, and is also used for the storage unit access controller 1505 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
DMAC主要用于将外部存储器DDR中的输入数据搬运到统一存储器1506,或将权重数据搬运到权重存储器1502中,或将输入数据搬运到输入存储器1501中。The DMAC is mainly used to transfer input data from the external memory DDR to the unified memory 1506, to transfer weight data to the weight memory 1502, or to transfer input data to the input memory 1501.
向量计算单元1507包括多个运算处理单元,在需要的情况下,对运算电路的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。主要用于神经网络中非卷积/全连接层网络计算,如Batch Normalization(批归一化),像素级求和,对特征平面进行上采样等。The vector calculation unit 1507 includes multiple operation processing units, which further process the output of the operation circuit when necessary, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc. It is mainly used for non-convolutional/fully connected layer network calculations in neural networks, such as Batch Normalization, pixel-level summation, upsampling of feature planes, etc.
在一些实现中,向量计算单元1507能将经处理的输出的向量存储到统一存储器1506。例如,向量计算单元1507可以将线性函数和/或非线性函数应用到运算电路1503的输出,例如对卷积层提取的特征平面进行线性插值,再例如累加值的向量,用以生成激活值。在一些实现中,向量计算单元1507生成归一化的值、像素级求和的值,或二者均有。在一些实现中,处理过的输出的向量能够用作到运算电路1503的激活输入,例如用于在神经网络中的后续层中的使用。In some implementations, the vector calculation unit 1507 can store the processed output vector to the unified memory 1506. For example, the vector calculation unit 1507 can apply a linear function and/or a nonlinear function to the output of the operation circuit 1503, such as linear interpolation of the feature plane extracted by the convolution layer, and then, for example, a vector of accumulated values to generate an activation value. In some implementations, the vector calculation unit 1507 generates a normalized value, a pixel-level summed value, or both. In some implementations, the processed output vector can be used as an activation input to the operation circuit 1503, for example, for use in a subsequent layer in a neural network.
控制器1504连接的取指存储器(instruction fetch buffer)1509,用于存储控制器1504使用的指令;An instruction fetch buffer 1509 connected to the controller 1504 is used to store instructions used by the controller 1504;
统一存储器1506,输入存储器1501,权重存储器1502以及取指存储器1509均为On-Chip存储器。外部存储器私有于该NPU硬件架构。Unified memory 1506, input memory 1501, weight memory 1502 and instruction fetch memory 1509 are all on-chip memories. External memories are private to the NPU hardware architecture.
其中,图4至图10示出的目标模型中各层的运算可以由运算电路1503或向量计算单元1507执行。Among them, the operations of each layer in the target model shown in Figures 4 to 10 can be performed by the operation circuit 1503 or the vector calculation unit 1507.
其中,上述任一处提到的处理器,可以是一个通用中央处理器,微处理器,ASIC,或一个或多个用于控制上述第一方面方法的程序执行的集成电路。The processor mentioned in any of the above places may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program of the above-mentioned first aspect method.
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。It should also be noted that the device embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the device embodiments provided by the present application, the connection relationship between modules indicates that there is a communication connection between them, which can be specifically implemented as one or more communication buses or signal lines.
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、ROM、RAM、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,训练设备,或者网络设备等)执行本申请各个实施例所述的方法。Through the description of the above implementation mode, the technicians in the field can clearly understand that the present application can be implemented by means of software plus necessary general hardware, and of course, it can also be implemented by special hardware including special integrated circuits, special CPUs, special memories, special components, etc. In general, all functions completed by computer programs can be easily implemented by corresponding hardware, and the specific hardware structure used to implement the same function can also be various, such as analog circuits, digital circuits or special circuits. However, for the present application, software program implementation is a better implementation mode in more cases. Based on such an understanding, the technical solution of the present application is essentially or the part that contributes to the prior art can be embodied in the form of a software product, which is stored in a readable storage medium, such as a computer floppy disk, a U disk, a mobile hard disk, a ROM, a RAM, a disk or an optical disk, etc., including a number of instructions to enable a computer device (which can be a personal computer, a training device, or a network device, etc.) to execute the methods described in each embodiment of the present application.
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。In the above embodiments, all or part of the embodiments may be implemented by software, hardware, firmware or any combination thereof. When implemented by software, all or part of the embodiments may be implemented in the form of a computer program product.
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、训练设备或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、训练设备或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的训练设备、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。 The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the process or function described in the embodiment of the present application is generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website site, a computer, a training device, or a data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) mode to another website site, computer, training device, or data center. The computer-readable storage medium may be any available medium that a computer can store or a data storage device such as a training device, a data center, etc. that includes one or more available media integrations. The available medium may be a magnetic medium, (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)), etc.

Claims (27)

  1. 一种车辆的位置获取方法,其特征在于,所述方法包括:A method for obtaining a vehicle position, characterized in that the method comprises:
    获取第一信息和第二信息,所述第一信息包括自车周围的车辆的信息,所述第二信息包括所述自车周围的车道的信息;Acquire first information and second information, wherein the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle;
    将所述第一信息和所述第二信息输入第一模型中,得到所述第一模型生成的预测信息,所述预测信息包括所述自车周围的车辆在第一时间内的预测位置信息。The first information and the second information are input into a first model to obtain prediction information generated by the first model, wherein the prediction information includes predicted position information of vehicles around the vehicle within a first time.
  2. 根据权利要求1所述的方法,其特征在于,所述预测信息包括所述自车周围的车辆在所述第一时间内的预测轨迹信息和第三信息,所述第三信息指示所述自车周围的车辆在所述第一时间内所在的车道。The method according to claim 1 is characterized in that the prediction information includes predicted trajectory information of vehicles around the vehicle within the first time and third information, and the third information indicates the lanes in which the vehicles around the vehicle are located within the first time.
  3. 根据权利要求2所述的方法,其特征在于,所述第三信息包括所述自车周围的目标车辆与所述自车周围的至少一个车道在所述第一时间内的关联度,所述目标车辆为所述自车周围的一个车辆,所述方法还包括:The method according to claim 2, characterized in that the third information includes a correlation between a target vehicle around the vehicle and at least one lane around the vehicle within the first time, the target vehicle being a vehicle around the vehicle, and the method further comprising:
    将第一车道确定为所述目标车辆在所述第一时间内所在的车道,其中,所述第一车道为所述自车周围的至少一个车道中与所述目标车辆在所述第一时间内的关联度最高的一个车道。A first lane is determined as a lane where the target vehicle is located during the first time, wherein the first lane is a lane with the highest correlation with the target vehicle during the first time among at least one lane around the vehicle.
  4. 根据权利要求2或3所述的方法,其特征在于,所述第一模型基于注意力机制构建,所述将所述第一信息和所述第二信息输入所述第一模型中,得到所述第一模型生成的所述预测信息,包括:将所述第一信息和所述第二信息输入所述第一模型中,基于所述注意力机制,生成第四信息,所述第四信息包括所述自车周围的目标车辆与第一车道集合在所述第一时间内的关联度,所述目标车辆为所述自车周围的一个车辆,所述第一车道集合包括所述第二信息中包括的所述自车周围的所有车道;The method according to claim 2 or 3 is characterized in that the first model is constructed based on an attention mechanism, and the inputting the first information and the second information into the first model to obtain the prediction information generated by the first model comprises: inputting the first information and the second information into the first model, and generating fourth information based on the attention mechanism, wherein the fourth information comprises the degree of association between a target vehicle around the ego vehicle and a first lane set within the first time, the target vehicle being a vehicle around the ego vehicle, and the first lane set comprising all lanes around the ego vehicle included in the second information;
    获取所述目标车辆所属的道路场景的类别,所述道路场景的类别包括路口场景和非路口场景;Acquire the category of the road scene to which the target vehicle belongs, where the category of the road scene includes an intersection scene and a non-intersection scene;
    根据所述目标车辆所属的道路场景的类别,从所述第一车道集合中选取第二车道集合,所述第二车道集合包括所述自车周围的车辆在所述第一时间内所在的车道;Selecting a second lane set from the first lane set according to the category of the road scene to which the target vehicle belongs, the second lane set including lanes where vehicles around the ego vehicle are located during the first time;
    从所述第四信息中获取第五信息,并根据所述第五信息生成所述第三信息,所述第五信息包括所述目标车辆与所述第二车道集合在所述第一时间内的关联度;Acquire fifth information from the fourth information, and generate the third information according to the fifth information, wherein the fifth information includes a correlation between the target vehicle and the second lane set within the first time;
    根据所述第二信息和所述第四信息,生成所述自车周围的车辆在所述第一时间内的所述预测轨迹信息。The predicted trajectory information of vehicles around the own vehicle within the first time is generated based on the second information and the fourth information.
  5. 根据权利要求4所述的方法,其特征在于,所述从所述第四信息中获取所述第五信息,并根据所述第五信息生成所述第三信息,包括:The method according to claim 4, characterized in that the obtaining the fifth information from the fourth information and generating the third information according to the fifth information comprises:
    从所述第四信息中获取所述第五信息,并对所述第五信息进行归一化操作,得到归一化后的第五信息;Acquire the fifth information from the fourth information, and perform a normalization operation on the fifth information to obtain normalized fifth information;
    将所述归一化后的第五信息输入多层感知机中,得到所述第三信息。The normalized fifth information is input into a multi-layer perceptron to obtain the third information.
  6. 根据权利要求4或5所述的方法,其特征在于,所述根据所述第一信息和所述第二信息,生成所述第四信息,包括:The method according to claim 4 or 5, characterized in that generating the fourth information according to the first information and the second information comprises:
    分别对所述第一信息和所述第二信息进行向量化处理和线性映射,得到第一线性矩阵和第二线性矩阵;Performing vectorization processing and linear mapping on the first information and the second information respectively to obtain a first linear matrix and a second linear matrix;
    对所述第一线性矩阵和所述第二线性矩阵的矩阵乘积执行归一化操作,得到所述第四信息。A normalization operation is performed on a matrix product of the first linear matrix and the second linear matrix to obtain the fourth information.
  7. 根据权利要求6所述的方法,其特征在于,所述根据所述第二信息和所述第四信息,生成所述自车周围的车辆在所述第一时间内的所述预测轨迹信息,包括:The method according to claim 6, characterized in that the step of generating the predicted trajectory information of vehicles around the vehicle within the first time period based on the second information and the fourth information comprises:
    对所述第二线性矩阵与所述第四信息执行矩阵乘运算,得到第六信息;Performing a matrix multiplication operation on the second linear matrix and the fourth information to obtain sixth information;
    将所述第六信息输入多层感知机中,得到所述自车周围的车辆在所述第一时间内的所述预测轨迹信息。The sixth information is input into a multi-layer perceptron to obtain the predicted trajectory information of vehicles around the vehicle within the first time.
  8. 一种模型的训练方法,其特征在于,所述方法包括:A model training method, characterized in that the method comprises:
    获取第一信息和第二信息,所述第一信息包括自车周围的车辆的信息,所述第二信息包括所述自车周围的车道的信息;Acquire first information and second information, wherein the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle;
    将所述第一信息和所述第二信息输入第一模型中,得到所述第一模型生成的预测信息,所述预测信息包括所述自车周围的车辆在第一时间内的预测位置信息; Inputting the first information and the second information into a first model to obtain prediction information generated by the first model, the prediction information including predicted position information of vehicles around the vehicle within a first time;
    根据损失函数对所述第一模型进行训练,所述损失函数指示所述预测信息和正确信息之间的相似度,所述正确信息包括所述自车周围的车辆在所述第一时间内的正确的位置信息。The first model is trained according to a loss function, wherein the loss function indicates a similarity between the predicted information and correct information, wherein the correct information includes correct position information of vehicles around the self-vehicle within the first time.
  9. 根据权利要求8所述的方法,其特征在于,所述预测信息包括所述自车周围的车辆在所述第一时间内的预测轨迹信息和第三信息,所述第三信息指示所述自车周围的车辆在所述第一时间内所在的车道。The method according to claim 8 is characterized in that the prediction information includes predicted trajectory information of vehicles around the vehicle within the first time and third information, and the third information indicates the lane in which the vehicles around the vehicle are located within the first time.
  10. 根据权利要求9所述的方法,其特征在于,所述第三信息包括所述自车周围的目标车辆与所述自车周围的至少一个车道在所述第一时间内的关联度,所述目标车辆为所述自车周围的一个车辆,所述方法还包括:The method according to claim 9, characterized in that the third information includes a correlation between a target vehicle around the ego vehicle and at least one lane around the ego vehicle within the first time, the target vehicle being a vehicle around the ego vehicle, and the method further comprising:
    将第一车道确定为所述目标车辆在所述第一时间内所在的车道,其中,所述第一车道为所述自车周围的至少一个车道中与所述目标车辆在所述第一时间内的关联度最高的一个车道。A first lane is determined as a lane where the target vehicle is located during the first time, wherein the first lane is a lane with the highest correlation with the target vehicle during the first time among at least one lane around the vehicle.
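The lane-determination step of claim 10 reduces to an argmax over the per-lane correlation scores. A minimal sketch (the lane names and score values are hypothetical):

```python
import numpy as np

# Hypothetical correlations between one target vehicle and the lanes
# around the ego vehicle over the first time (the claim's third information).
lane_ids = ["lane_a", "lane_b", "lane_c"]
correlation = np.array([0.12, 0.71, 0.17])

# The "first lane" is the candidate with the highest correlation.
first_lane = lane_ids[int(np.argmax(correlation))]
```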
  11. 根据权利要求9或10所述的方法,其特征在于,所述第一模型基于注意力机制构建,所述将所述第一信息和所述第二信息输入所述第一模型中,得到所述第一模型生成的所述预测信息,包括:The method according to claim 9 or 10, characterized in that the first model is constructed based on an attention mechanism, and the inputting the first information and the second information into the first model to obtain the prediction information generated by the first model comprises:
    将所述第一信息和所述第二信息输入所述第一模型中,基于所述注意力机制,生成第四信息,所述第四信息包括所述自车周围的目标车辆与第一车道集合在所述第一时间内的关联度,所述目标车辆为所述自车周围的一个车辆,所述第一车道集合包括所述第二信息中包括的所述自车周围的所有车道;Inputting the first information and the second information into the first model, generating fourth information based on the attention mechanism, wherein the fourth information includes a correlation between a target vehicle around the ego vehicle and a first lane set within the first time, wherein the target vehicle is a vehicle around the ego vehicle, and the first lane set includes all lanes around the ego vehicle included in the second information;
    获取所述目标车辆所属的道路场景的类别,所述道路场景的类别包括路口场景和非路口场景;Acquire the category of the road scene to which the target vehicle belongs, where the category of the road scene includes an intersection scene and a non-intersection scene;
    根据所述目标车辆所属的道路场景的类别,从所述第一车道集合中选取第二车道集合,所述第二车道集合包括所述自车周围的车辆在所述第一时间内所在的车道;Selecting a second lane set from the first lane set according to the category of the road scene to which the target vehicle belongs, the second lane set including lanes where vehicles around the ego vehicle are located during the first time;
    从所述第四信息中获取第五信息,并根据所述第五信息生成所述第三信息,所述第五信息包括所述目标车辆与所述第二车道集合在所述第一时间内的关联度;Acquire fifth information from the fourth information, and generate the third information according to the fifth information, wherein the fifth information includes a correlation between the target vehicle and the second lane set within the first time;
    根据所述第二信息和所述第四信息,生成所述自车周围的车辆在所述第一时间内的所述预测轨迹信息。The predicted trajectory information of vehicles around the own vehicle within the first time is generated based on the second information and the fourth information.
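The attention pipeline of claim 11 — linear mappings of vehicle and lane features, a normalized vehicle-lane correlation matrix (fourth information), a scene-dependent lane subset yielding the fifth and third information, and a trajectory head — can be sketched as below. This is a hedged illustration: the dimensions, the scaled-dot-product/softmax normalization, the lane-subset rule, and the ReLU MLP are assumptions, not the patent's concrete design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
N_VEH, N_LANE, D_IN, D = 3, 5, 8, 16

veh_feat  = rng.normal(size=(N_VEH, D_IN))   # vectorized first information
lane_feat = rng.normal(size=(N_LANE, D_IN))  # vectorized second information

Wq, Wk = rng.normal(size=(D_IN, D)), rng.normal(size=(D_IN, D))
Q, K = veh_feat @ Wq, lane_feat @ Wk         # first / second linear matrices

# Fourth information: normalized vehicle-lane correlations (attention weights)
fourth = softmax(Q @ K.T / np.sqrt(D))       # shape (N_VEH, N_LANE)

# Scene-dependent lane subset: keep every lane at an intersection, otherwise
# only nearby lanes (the index rule here is a placeholder assumption).
scene = "non-intersection"
second_lane_set = list(range(N_LANE)) if scene == "intersection" else [0, 1, 2]

# Fifth information: correlations restricted to the chosen lanes, then
# renormalized into the third information (per-lane scores).
fifth = fourth[:, second_lane_set]
third = fifth / fifth.sum(axis=1, keepdims=True)

# Predicted trajectories: attention-weighted lane features fed to an MLP head.
sixth = fourth @ K                            # (N_VEH, D)
W1, W2 = rng.normal(size=(D, D)), rng.normal(size=(D, 20))
traj = np.maximum(sixth @ W1, 0) @ W2         # (N_VEH, 10 steps x (x, y))
```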
  12. 一种车辆的位置获取装置,其特征在于,包括:A vehicle position acquisition device, characterized by comprising:
    获取模块,用于获取第一信息和第二信息,所述第一信息包括自车周围的车辆的信息,所述第二信息包括所述自车周围的车道的信息;An acquisition module, configured to acquire first information and second information, wherein the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle;
    位置预测模块,用于将所述第一信息和所述第二信息输入第一模型中,得到所述第一模型生成的预测信息,所述预测信息包括所述自车周围的车辆在第一时间内的预测位置信息。The position prediction module is used to input the first information and the second information into a first model to obtain prediction information generated by the first model, wherein the prediction information includes predicted position information of vehicles around the vehicle within a first time.
  13. 根据权利要求12所述的装置,其特征在于,所述预测信息包括所述自车周围的车辆在所述第一时间内的预测轨迹信息和第三信息,所述第三信息指示所述自车周围的车辆在所述第一时间内所在的车道。The device according to claim 12 is characterized in that the prediction information includes predicted trajectory information of vehicles around the vehicle within the first time and third information, and the third information indicates the lane in which the vehicles around the vehicle are located within the first time.
  14. 根据权利要求13所述的装置,其特征在于,所述第三信息包括所述自车周围的目标车辆与所述自车周围的至少一个车道在所述第一时间内的关联度,所述目标车辆为所述自车周围的一个车辆,所述装置还包括:The device according to claim 13, characterized in that the third information includes a correlation between a target vehicle around the ego vehicle and at least one lane around the ego vehicle during the first time, the target vehicle being a vehicle around the ego vehicle, and the device further comprising:
    车道确定模块,用于将第一车道确定为所述目标车辆在所述第一时间内所在的车道,其中,所述第一车道为所述自车周围的至少一个车道中与所述目标车辆在所述第一时间内的关联度最高的一个车道。A lane determination module is used to determine a first lane as a lane where the target vehicle is located during the first time, wherein the first lane is a lane with the highest correlation with the target vehicle during the first time among at least one lane around the vehicle.
  15. 根据权利要求13或14所述的装置,其特征在于,所述第一模型基于注意力机制构建,所述位置预测模块具体用于:The device according to claim 13 or 14, characterized in that the first model is constructed based on an attention mechanism, and the position prediction module is specifically used to:
    将所述第一信息和所述第二信息输入所述第一模型中,基于所述注意力机制,生成第四信息,所述第四信息包括所述自车周围的目标车辆与第一车道集合在所述第一时间内的关联度,所述目标车辆为所述自车周围的一个车辆,所述第一车道集合包括所述第二信息中包括的所述自车周围的所有车道;Inputting the first information and the second information into the first model, generating fourth information based on the attention mechanism, wherein the fourth information includes a correlation between a target vehicle around the ego vehicle and a first lane set within the first time, wherein the target vehicle is a vehicle around the ego vehicle, and the first lane set includes all lanes around the ego vehicle included in the second information;
    获取所述目标车辆所属的道路场景的类别,所述道路场景的类别包括路口场景和非路口场景;Acquire the category of the road scene to which the target vehicle belongs, where the category of the road scene includes an intersection scene and a non-intersection scene;
    根据所述目标车辆所属的道路场景的类别,从所述第一车道集合中选取第二车道集合,所述第二车道集合包括所述自车周围的车辆在所述第一时间内所在的车道;Selecting a second lane set from the first lane set according to the category of the road scene to which the target vehicle belongs, the second lane set including lanes where vehicles around the ego vehicle are located during the first time;
    从所述第四信息中获取第五信息,并根据所述第五信息生成所述第三信息,所述第五信息包括所述目标车辆与所述第二车道集合在所述第一时间内的关联度;Acquire fifth information from the fourth information, and generate the third information according to the fifth information, wherein the fifth information includes a correlation between the target vehicle and the second lane set within the first time;
    根据所述第二信息和所述第四信息，生成所述自车周围的车辆在所述第一时间内的所述预测轨迹信息。The predicted trajectory information of vehicles around the own vehicle within the first time is generated based on the second information and the fourth information.
  16. 根据权利要求15所述的装置,其特征在于,所述位置预测模块具体用于:The device according to claim 15, characterized in that the position prediction module is specifically used to:
    从所述第四信息中获取所述第五信息,并对所述第五信息进行归一化操作,得到归一化后的第五信息;Acquire the fifth information from the fourth information, and perform a normalization operation on the fifth information to obtain normalized fifth information;
    将所述归一化后的第五信息输入多层感知机中,得到所述第三信息。The normalized fifth information is input into a multi-layer perceptron to obtain the third information.
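The two steps of claim 16 — normalizing the fifth information, then passing it through a multi-layer perceptron to obtain the third information — might look like this minimal sketch. The softmax normalization and the 3-8-3 MLP with random weights are assumptions for illustration only.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Raw target-vehicle / lane correlations (fifth information), hypothetical.
fifth = np.array([2.0, 0.5, 1.0])
normed = softmax(fifth)                          # normalized fifth information

rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(3, 8)), np.zeros(8)    # illustrative MLP weights
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

hidden = np.maximum(normed @ W1 + b1, 0)         # hidden layer (ReLU)
third = softmax(hidden @ W2 + b2)                # per-lane score (third information)
```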
  17. 根据权利要求15或16所述的装置,其特征在于,所述位置预测模块具体用于:The device according to claim 15 or 16, characterized in that the position prediction module is specifically used to:
    分别对所述第一信息和所述第二信息进行向量化处理和线性映射,得到第一线性矩阵和第二线性矩阵;Performing vectorization processing and linear mapping on the first information and the second information respectively to obtain a first linear matrix and a second linear matrix;
    对所述第一线性矩阵和所述第二线性矩阵的矩阵乘积执行归一化操作,得到所述第四信息。A normalization operation is performed on a matrix product of the first linear matrix and the second linear matrix to obtain the fourth information.
  18. 根据权利要求17所述的装置,其特征在于,所述位置预测模块具体用于:The device according to claim 17, characterized in that the location prediction module is specifically used to:
    对所述第二线性矩阵与所述第四信息执行矩阵乘运算,得到第六信息;Performing a matrix multiplication operation on the second linear matrix and the fourth information to obtain sixth information;
    将所述第六信息输入多层感知机中,得到所述自车周围的车辆在所述第一时间内的所述预测轨迹信息。The sixth information is input into a multi-layer perceptron to obtain the predicted trajectory information of vehicles around the vehicle within the first time.
  19. 一种模型的训练装置,其特征在于,包括:A model training device, characterized in that it comprises:
    获取模块,用于获取第一信息和第二信息,所述第一信息包括自车周围的车辆的信息,所述第二信息包括所述自车周围的车道的信息;An acquisition module, configured to acquire first information and second information, wherein the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle;
    位置预测模块,用于将所述第一信息和所述第二信息输入第一模型中,得到所述第一模型生成的预测信息,所述预测信息包括所述自车周围的车辆在第一时间内的预测位置信息;a position prediction module, configured to input the first information and the second information into a first model to obtain prediction information generated by the first model, wherein the prediction information includes predicted position information of vehicles around the vehicle within a first time;
    模型训练模块,用于根据损失函数对所述第一模型进行训练,所述损失函数指示所述预测信息和正确信息之间的相似度,所述正确信息包括所述自车周围的车辆在所述第一时间内的正确的位置信息。A model training module is used to train the first model according to a loss function, wherein the loss function indicates the similarity between the predicted information and the correct information, and the correct information includes the correct position information of the vehicles around the vehicle within the first time.
  20. 根据权利要求19所述的装置，其特征在于，所述预测信息包括所述自车周围的车辆在所述第一时间内的预测轨迹信息和第三信息，所述第三信息指示所述自车周围的车辆在所述第一时间内所在的车道。The device according to claim 19 is characterized in that the prediction information includes predicted trajectory information of vehicles around the self-vehicle within the first time and third information, and the third information indicates the lane in which the vehicles around the self-vehicle are located within the first time.
  21. 根据权利要求20所述的装置,其特征在于,所述第三信息包括所述自车周围的目标车辆与所述自车周围的至少一个车道在所述第一时间内的关联度,所述目标车辆为所述自车周围的一个车辆,所述装置还包括:The device according to claim 20, characterized in that the third information includes a correlation between a target vehicle around the ego vehicle and at least one lane around the ego vehicle during the first time, the target vehicle being a vehicle around the ego vehicle, and the device further comprising:
    车道确定模块,用于将第一车道确定为所述目标车辆在所述第一时间内所在的车道,其中,所述第一车道为所述自车周围的至少一个车道中与所述目标车辆在所述第一时间内的关联度最高的一个车道。A lane determination module is used to determine a first lane as a lane where the target vehicle is located during the first time, wherein the first lane is a lane with the highest correlation with the target vehicle during the first time among at least one lane around the vehicle.
  22. 根据权利要求20或21所述的装置,其特征在于,所述第一模型基于注意力机制构建,所述位置预测模块具体用于:The device according to claim 20 or 21, characterized in that the first model is constructed based on an attention mechanism, and the position prediction module is specifically used to:
    将所述第一信息和所述第二信息输入所述第一模型中,基于所述注意力机制,生成第四信息,所述第四信息包括所述自车周围的目标车辆与第一车道集合在所述第一时间内的关联度,所述目标车辆为所述自车周围的一个车辆,所述第一车道集合包括所述第二信息中包括的所述自车周围的所有车道;Inputting the first information and the second information into the first model, generating fourth information based on the attention mechanism, wherein the fourth information includes a correlation between a target vehicle around the ego vehicle and a first lane set within the first time, wherein the target vehicle is a vehicle around the ego vehicle, and the first lane set includes all lanes around the ego vehicle included in the second information;
    获取所述目标车辆所属的道路场景的类别,所述道路场景的类别包括路口场景和非路口场景;Acquire the category of the road scene to which the target vehicle belongs, where the category of the road scene includes an intersection scene and a non-intersection scene;
    根据所述目标车辆所属的道路场景的类别,从所述第一车道集合中选取第二车道集合,所述第二车道集合包括所述自车周围的车辆在所述第一时间内所在的车道;Selecting a second lane set from the first lane set according to the category of the road scene to which the target vehicle belongs, the second lane set including lanes where vehicles around the ego vehicle are located during the first time;
    从所述第四信息中获取第五信息,并根据所述第五信息生成所述第三信息,所述第五信息包括所述目标车辆与所述第二车道集合在所述第一时间内的关联度;Acquire fifth information from the fourth information, and generate the third information according to the fifth information, wherein the fifth information includes a correlation between the target vehicle and the second lane set within the first time;
    根据所述第二信息和所述第四信息,生成所述自车周围的车辆在所述第一时间内的所述预测轨迹信息。The predicted trajectory information of vehicles around the own vehicle within the first time is generated based on the second information and the fourth information.
  23. 一种执行设备，其特征在于，包括处理器和存储器，所述处理器与所述存储器耦合，An execution device, characterized in that it comprises a processor and a memory, the processor being coupled to the memory,
    所述存储器,用于存储程序;The memory is used to store programs;
    所述处理器,用于执行所述存储器中的程序,使得所述执行设备执行如权利要求1至7中任一项所述的方法。 The processor is configured to execute the program in the memory so that the execution device executes the method according to any one of claims 1 to 7.
  24. 一种自动驾驶车辆,其特征在于,包括处理器,所述处理器和存储器耦合,所述存储器存储有程序指令,当所述存储器存储的程序指令被所述处理器执行时实现权利要求1至7中任一项所述的方法。An autonomous driving vehicle, characterized in that it includes a processor, the processor is coupled to a memory, the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the method described in any one of claims 1 to 7 is implemented.
  25. 一种训练设备，其特征在于，包括处理器和存储器，所述处理器与所述存储器耦合，A training device, characterized in that it comprises a processor and a memory, the processor being coupled to the memory,
    所述存储器,用于存储程序;The memory is used to store programs;
    所述处理器,用于执行所述存储器中的程序,使得所述训练设备执行如权利要求8至11中任意一项所述的方法。The processor is used to execute the program in the memory so that the training device performs the method according to any one of claims 8 to 11.
  26. 一种计算机可读存储介质，包括指令，当其在计算机上运行时，使得计算机执行如权利要求1至7中任一项所述的方法，或，执行如权利要求8至11中任一项所述的方法。A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the method according to any one of claims 1 to 7, or to execute the method according to any one of claims 8 to 11.
  27. 一种电路系统,其特征在于,所述电路系统包括处理电路,所述处理电路配置为执行如权利要求1至7中任一项所述的方法,或,执行如权利要求8至11中任一项所述的方法。 A circuit system, characterized in that the circuit system comprises a processing circuit, and the processing circuit is configured to execute the method as claimed in any one of claims 1 to 7, or to execute the method as claimed in any one of claims 8 to 11.
PCT/CN2023/104695 2022-10-31 2023-06-30 Vehicle position acquiring method, model training method, and related device WO2024093321A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211350093.1A CN117994754A (en) 2022-10-31 2022-10-31 Vehicle position acquisition method, model training method and related equipment
CN202211350093.1 2022-10-31

Publications (1)

Publication Number Publication Date
WO2024093321A1

Family

ID=90898125

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/104695 WO2024093321A1 (en) 2022-10-31 2023-06-30 Vehicle position acquiring method, model training method, and related device

Country Status (2)

Country Link
CN (1) CN117994754A (en)
WO (1) WO2024093321A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180173240A1 (en) * 2016-12-21 2018-06-21 Baidu Usa Llc Method and system to predict one or more trajectories of a vehicle based on context surrounding the vehicle
CN110040138A (en) * 2019-04-18 2019-07-23 北京智行者科技有限公司 A kind of parallel auxiliary driving method of vehicle and system
CN113771867A (en) * 2020-06-10 2021-12-10 华为技术有限公司 Method and device for predicting driving state and terminal equipment
CN114889610A (en) * 2022-05-20 2022-08-12 重庆长安汽车股份有限公司 Target vehicle lane change time prediction method and system based on recurrent neural network
CN115195718A (en) * 2022-07-01 2022-10-18 岚图汽车科技有限公司 Lane keeping auxiliary driving method and system and electronic equipment

Also Published As

Publication number Publication date
CN117994754A (en) 2024-05-07
