WO2024093321A1

WO2024093321A1 - Vehicle position acquiring method, model training method, and related device

Info

Publication number: WO2024093321A1
Application number: PCT/CN2023/104695
Authority: WO
Inventors: 李姗; 邓乃铭; 邢国成; 朱丽
Original assignee: 华为技术有限公司
Priority date: 2022-10-31
Filing date: 2023-06-30
Publication date: 2024-05-10
Also published as: CN117994754A

Abstract

A vehicle position acquiring method, a model training method, and a related device. The method comprises: acquiring first information and second information, the first information comprising information on a vehicle in the vicinity of an ego vehicle, and the second information comprising information on a lane in the vicinity of the ego vehicle; and inputting the first information and the second information into a first model to obtain predicted information generated by the first model, the predicted information comprising predicted position information of the vehicle in the vicinity of the ego vehicle within a first time. By taking into account information on a lane in the vicinity of an ego vehicle and associating predicted position information of a vehicle in the vicinity of the ego vehicle with a lane, the accuracy of the prediction result is improved.

Description

Vehicle position acquisition method, model training method and related equipment

This application claims priority to the Chinese patent application filed with the China Patent Office on October 31, 2022, with application number 202211350093.1 and invention name “Vehicle position acquisition method, model training method and related equipment”, the entire contents of which are incorporated by reference in this application.

Technical Field

The present application relates to the field of artificial intelligence, and in particular to a vehicle position acquisition method, a model training method, and related equipment.

Background technique

When an autonomous vehicle is driving on the road, it needs to consider the driving trajectories of surrounding vehicles. When the driving intentions of surrounding vehicles change, the autonomous vehicle needs to respond accordingly to avoid collisions with surrounding vehicles. Therefore, accurate prediction of other vehicles’ driving intentions is very important for autonomous vehicles.

At present, methods such as Kalman filtering are mainly used to predict the position of the vehicle. However, this method only relies on the historical trajectory of the vehicle and does not consider the lane information in the environment.

Summary of the invention

The present application provides a vehicle position acquisition method, a model training method and related equipment, which can predict the positions of vehicles around the vehicle.

In a first aspect, the present application provides a method for obtaining the position of a vehicle, which can be used in the field of artificial intelligence. The method includes:

First, first information and second information are obtained, wherein the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle; then, the first information and the second information are input into the first model to obtain prediction information generated by the first model, wherein the prediction information includes predicted position information of vehicles around the vehicle within the first time.

In real-life scenarios, the ego vehicle and the vehicles around it form an interdependent whole, and their respective behaviors affect each other's decisions. Previous studies often rely solely on the historical trajectory of the ego vehicle to predict the future trajectory of the vehicle, and the prediction results are obviously inaccurate. In this application, by combining the lane information around the ego vehicle, the predicted position information of the vehicles around the ego vehicle is associated with the lane, thereby further improving the accuracy of the prediction results, providing a basis for the decision-making planning of the autonomous driving vehicle, and also improving the riding experience of the autonomous driving vehicle.

In a possible implementation manner of the first aspect, the prediction information includes predicted trajectory information of vehicles around the vehicle within a first time and third information, where the third information indicates the lanes in which the vehicles around the vehicle are located within the first time.

In this possible implementation, based on the consideration of the lane information around the vehicle, the vehicle's future driving intention is bound to the lane by outputting the information of the lanes where the vehicles around the vehicle are located, which effectively utilizes the relationship between the vehicles and lanes around the vehicle and improves the prediction accuracy of the vehicle's position.

In a possible implementation manner of the first aspect, the third information includes a correlation between a target vehicle around the vehicle and at least one lane around the vehicle within a first time, the target vehicle is a vehicle around the vehicle, and the method further includes:

The first lane is determined as the lane where the target vehicle is located within the first time, wherein the first lane is a lane with the highest correlation with the target vehicle within the first time among at least one lane around the vehicle.

In this possible implementation, the future driving position of the vehicle is bound to the lane. Based on the lane information around the vehicle, the correlation between the target vehicle and the lanes around the vehicle is output to give the probability of the target vehicle traveling in each lane in the future. The lane with the highest correlation is determined as the lane where the target vehicle is located in the first time, thereby improving the accuracy of the prediction results.

In a possible implementation manner of the first aspect, the first model is constructed based on an attention mechanism, and the first information and the second information are input into the first model to obtain prediction information generated by the first model, including: inputting the first information and the second information into the first model, and generating fourth information based on the attention mechanism, the fourth information including a correlation between a target vehicle around the ego vehicle and the first lane set within a first time, the target vehicle is a vehicle around the ego vehicle, and the first lane set includes all lanes around the ego vehicle included in the second information;

Obtaining the category of the road scene to which the target vehicle belongs, the categories of the road scene include intersection scenes and non-intersection scenes;

According to the category of the road scene to which the target vehicle belongs, a second lane set is selected from the first lane set, where the second lane set includes lanes where vehicles around the ego vehicle are located at the first time;

The fifth information is obtained from the fourth information, and the third information is generated according to the fifth information, wherein the fifth information includes the target vehicle and the second lane set. The relevance of the combination in the first place;

Based on the second information and the fourth information, the predicted trajectory information of the vehicles around the vehicle within the first time is generated.

In this possible implementation, by inputting the information of vehicles and lanes around the vehicle into the first model, based on the attention mechanism, the correlation between the target vehicle and the first lane set in the first time is obtained. On the one hand, by binding the vehicle's future driving intention with the lane, the predicted intention is made more stable; on the other hand, by screening the road scene to which the target vehicle belongs, the lane to be retained is selected according to the road scene in which the vehicle is located, thereby further improving the accuracy of the prediction.

In a possible implementation manner of the first aspect, acquiring fifth information from fourth information, and generating third information according to the fifth information includes:

Acquire fifth information from the fourth information, and perform a normalization operation on the fifth information to obtain normalized fifth information;

The normalized fifth information is input into the multi-layer perceptron to obtain the third information.

In this possible implementation, the fourth information includes the correlation between the target vehicle around the vehicle and the first lane set in the first time, that is, the attention score of the target vehicle relative to each lane in the first lane set. According to the road scene to which the target vehicle belongs, the second lane set to which the target vehicle belongs can be determined, and the corresponding fifth information can be screened out from the fourth information according to the second lane set, so that the lane prediction can be carried out in a targeted manner according to the specific road scene to which the target vehicle belongs, so as to further improve the accuracy of the prediction. In addition, by normalizing each element contained in the fifth information so that the sum of all elements is 1, and then inputting it into the multi-layer perceptron, under the action of the multi-layer perceptron, the correlation between the target vehicle and the lanes around the vehicle in the first time is output.

In a possible implementation manner of the first aspect, generating fourth information according to the first information and the second information includes:

Performing vectorization processing and linear mapping on the first information and the second information respectively to obtain a first linear matrix and a second linear matrix;

A normalization operation is performed on the matrix product of the first linear matrix and the second linear matrix to obtain fourth information.

In this possible implementation, the first information and the second information can be fused based on the attention mechanism to obtain an attention score of the target vehicle relative to each lane.

In a possible implementation manner of the first aspect, generating predicted trajectory information of vehicles around the vehicle within a first time according to the second information and the fourth information includes:

Performing a matrix multiplication operation on the second linear matrix and the fourth information to obtain sixth information;

The sixth information is input into the multi-layer perceptron to obtain the predicted trajectory information of vehicles around the vehicle in the first time.

In this possible implementation, the data of the second linear matrix and the fourth information can be fused based on the attention mechanism, and the obtained third information is input into a multi-layer perceptron. Under the action of the multi-layer perceptron, the predicted trajectory information of the vehicles around the vehicle in the first time is obtained.

In a second aspect, the present application provides a model training method that can be used in the field of artificial intelligence. The method includes:

First, first information and second information are obtained, the first information including information about vehicles around the vehicle, and the second information including information about lanes around the vehicle; then, the first information and the second information are input into the first model to obtain prediction information generated by the first model, the prediction information including predicted position information of vehicles around the vehicle within the first time; finally, the first model is trained according to the loss function, the loss function indicates the similarity between the prediction information and the correct information, and the correct information includes the correct position information of vehicles around the vehicle within the first time.

In the present application, when training the first model, the training samples used include complete information about vehicles around the vehicle and complete information about lanes around the vehicle, so that the position information output by the first model is more accurate. It is understandable that the first model can be used to perform the steps in the aforementioned first aspect or the optional implementation of the first aspect.

In a possible implementation manner of the second aspect, the prediction information includes predicted trajectory information of vehicles around the vehicle within a first time and third information, where the third information indicates the lanes in which the vehicles around the vehicle are located within the first time.

In a possible implementation manner of the second aspect, the third information includes a correlation between a target vehicle around the vehicle and at least one lane around the vehicle within a first time, the target vehicle is a vehicle around the vehicle, and the method further includes:

In a possible implementation manner of the second aspect, the first model is constructed based on an attention mechanism, and the first information and the second information are input into the first model to obtain prediction information generated by the first model, including:

Input the first information and the second information into the first model, and generate fourth information based on the attention mechanism, wherein the fourth information includes the correlation between the target vehicle around the ego vehicle and the first lane set within the first time, the target vehicle is a vehicle around the ego vehicle, and the first lane set includes all lanes around the ego vehicle included in the second information;

Acquire fifth information from the fourth information, and generate third information according to the fifth information, wherein the fifth information includes a correlation between the target vehicle and the second lane set within the first time;

In a possible implementation manner of the second aspect, acquiring fifth information from fourth information, and generating third information according to the fifth information includes:

In a possible implementation manner of the second aspect, generating fourth information according to the first information and the second information includes:

In a possible implementation manner of the second aspect, generating predicted trajectory information of vehicles around the vehicle within a first time according to the second information and the fourth information includes:

For the specific meanings of the nouns in the second aspect of the embodiment of the present application and the various possible implementations of the second aspect, reference may be made to the descriptions in the various possible implementations of the first aspect, and will not be repeated here one by one.

In a third aspect, the present application provides a vehicle position acquisition device that can be used in the field of artificial intelligence. The device includes an acquisition module and a position prediction module. The acquisition module is used to acquire first information and second information, the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle; the position prediction module is used to input the first information and the second information into a first model to obtain prediction information generated by the first model, and the prediction information includes predicted position information of vehicles around the vehicle within the first time.

In a possible implementation manner of the third aspect, the prediction information includes predicted trajectory information of vehicles around the vehicle within a first time and third information, where the third information indicates the lanes in which the vehicles around the vehicle are located within the first time.

In a possible implementation manner of the third aspect, the third information includes a correlation between a target vehicle around the vehicle and at least one lane around the vehicle within a first time, the target vehicle is a vehicle around the vehicle, and the device further includes:

The lane determination module is used to determine the first lane as the lane where the target vehicle is located within the first time, wherein the first lane is a lane with the highest correlation with the target vehicle within the first time among at least one lane around the vehicle.

In a possible implementation manner of the third aspect, the first model is constructed based on the attention mechanism, and the position prediction module is specifically used for:

In a possible implementation manner of the third aspect, the location prediction module is specifically used to:

In the third aspect of the present application, the various modules included in the vehicle position acquisition device can also be used to implement the steps in the various possible implementation methods of the first aspect. For the specific implementation methods of the third aspect of the embodiment of the present application and certain steps in the various possible implementation methods of the third aspect, as well as the beneficial effects brought about by each possible implementation method, you can refer to the description of the various possible implementation methods in the first aspect, and will not be repeated here one by one.

In a fourth aspect, the present application provides a model training device that can be used in the field of artificial intelligence. The device includes an acquisition module, a position prediction module, and a model training module. Among them, the acquisition module is used to acquire first information and second information, the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle; the position prediction module is used to input the first information and the second information into the first model to obtain the prediction information generated by the first model, and the prediction information includes the predicted position information of the vehicles around the vehicle within the first time; the model training module is used to train the first model according to the loss function, and the loss function indicates the similarity between the prediction information and the correct information, and the correct information includes the correct position information of the vehicles around the vehicle within the first time.

In a possible implementation manner of the fourth aspect, the prediction information includes predicted trajectory information of vehicles around the vehicle within a first time and third information, where the third information indicates the lanes in which the vehicles around the vehicle are located within the first time.

In a possible implementation manner of the fourth aspect, the third information includes a correlation between a target vehicle around the vehicle and at least one lane around the vehicle within a first time, the target vehicle is a vehicle around the vehicle, and the device further includes:

In a possible implementation manner of the fourth aspect, the first model is constructed based on the attention mechanism, and the position prediction module is specifically used for:

In a possible implementation manner of the fourth aspect, the location prediction module is specifically used to:

In the fourth aspect of the present application, the modules included in the training device of the model can also be used to implement the steps in the various possible implementation methods of the second aspect. For the specific implementation methods of the fourth aspect of the embodiment of the present application and certain steps in the various possible implementation methods of the fourth aspect, as well as the beneficial effects brought about by each possible implementation method, you can refer to the description of the various possible implementation methods in the second aspect, and will not be repeated here one by one.

In a fifth aspect, an embodiment of the present application provides an execution device, which may include a processor, the processor and a memory are coupled, the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the vehicle position acquisition method described in the first aspect is implemented. For the steps executed by the execution device in each possible implementation method of the processor executing the first aspect, please refer to the first aspect above for details, and no further description is given here.

In a sixth aspect, an embodiment of the present application provides an autonomous driving vehicle, which may include a processor, the processor and a memory are coupled, the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the vehicle position acquisition method described in the first aspect is implemented. For the steps executed by the execution device in each possible implementation method of the processor executing the first aspect, the details can be referred to the first aspect above, and will not be repeated here.

In a seventh aspect, an embodiment of the present application provides a training device, which may include a processor, the processor and a memory are coupled, the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the training method of the model described in the second aspect is implemented. For the steps performed by the training device in each possible implementation method of the processor performing the second aspect, please refer to the second aspect for details, and no further description is given here.

In an eighth aspect, an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, which, when running on a computer, enables the computer to execute the method described in the first aspect or any possible implementation of the first aspect, or enables the computer to execute the method described in the second aspect or any possible implementation of the second aspect.

In the ninth aspect, an embodiment of the present application provides a circuit system, which includes a processing circuit, and the processing circuit is configured to execute the method described in the first aspect or any possible implementation of the first aspect, or the processing circuit is configured to execute the method described in the second aspect or any possible implementation of the second aspect.

In the tenth aspect, an embodiment of the present application provides a computer program product, which, when running on a computer, enables the computer to execute the method described in the first aspect or any possible implementation of the first aspect, or enables the computer to execute the method described in the second aspect or any possible implementation of the second aspect.

In the eleventh aspect, the present application provides a chip system, including a processor and a memory, the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method as described in the first aspect or any possible implementation of the first aspect, or to enable the computer to execute the method as described in the second aspect or any possible implementation of the second aspect. The chip system can be composed of a chip, or it can include a chip and other discrete devices.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG1a is a schematic diagram of a structure of an artificial intelligence main framework;

FIG1b is a structural schematic diagram of a road condition;

FIG1c is a schematic diagram of a structure of an automatic driving device with an automatic driving function provided in an embodiment of the present application;

FIG2a is a schematic diagram of a system architecture provided by the present application;

FIG2b is a schematic diagram of a flow chart of a method for obtaining a vehicle position provided in the present application;

FIG3 is a schematic diagram of a structure of a multi-layer perceptron provided in an embodiment of the present application;

FIG4 is another schematic diagram of a flow chart of a method for obtaining a vehicle position provided by the present application;

FIG5 is a schematic diagram of a structure of a first model provided in an embodiment of the present application;

FIG6 is another schematic diagram of the structure of the first model provided in an embodiment of the present application;

FIG7 is a schematic diagram of a structure of a first embedded module provided in an embodiment of the present application;

FIG8 is a schematic diagram of a structure of a second embedded module provided in an embodiment of the present application;

FIG9 is a schematic diagram of a structure of a first decoder module provided in an embodiment of the present application;

FIG10 is a flow chart of a method for training a model provided in an embodiment of the present application;

FIG11 is a schematic diagram of a structure of a vehicle position acquisition device provided in an embodiment of the present application;

FIG12 is a schematic diagram of a structure of a training device for a model provided in an embodiment of the present application;

FIG13 is a schematic diagram of a structure of an execution device provided in an embodiment of the present application;

FIG14 is a schematic diagram of a structure of a training device provided in an embodiment of the present application;

FIG. 15 is a schematic diagram of the structure of a chip provided in an embodiment of the present application.

Detailed ways

The following will be combined with the drawings in the embodiments of the present application to clearly and completely describe the technical solutions in the embodiments of the present application. Obviously, the described embodiments are only some embodiments of the present application, not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those skilled in the art without making creative work are within the scope of protection of this application. It is known to those skilled in the art that with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.

The terms "first", "second", etc. in the specification and claims of the present application and the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that the data used in this way can be interchangeable where appropriate, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "including" and "having" and any of their variations are intended to cover non-exclusive inclusions, for example, a process, method, system, product or device that includes a series of steps or modules is not necessarily limited to those steps or modules that are clearly listed, but may include other steps or modules that are not clearly listed or inherent to these processes, methods, products or devices.

The term "and/or" in this application can be a description of the association relationship of associated objects, indicating that three relationships can exist. For example, A and/or B can represent: A exists alone, A and B exist at the same time, and B exists alone. In addition, the character "/" in this application generally indicates that the associated objects before and after are in an "or" relationship.

It should also be noted that, in some alternative implementations, the functions/acts noted may occur out of the order of the drawings. For example, two figures shown in succession may in fact occur substantially simultaneously or may sometimes be performed in the reverse order, depending on the functions/acts involved.

In the embodiments of the present application, unless otherwise specified, the meaning of "at least one" refers to one or more, and the meaning of "plurality" refers to two or more. It is understood that in the present application, "when", "if" and "if" all refer to the device making corresponding processing under certain objective circumstances, and do not limit the time, nor do they require that there must be a judgment action when the device is implemented, nor do they mean that there are other limitations. In addition, the special word "exemplary" means "used as an example, embodiment or illustrative". Any embodiment described as "exemplary" is not necessarily interpreted as being superior or better than other embodiments.

The vehicle position acquisition method provided in this application can be applied to artificial intelligence (AI) scenarios. AI is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can respond in a similar way to human intelligence. Artificial intelligence is to study the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, etc.

The embodiments of the present application are described below in conjunction with the accompanying drawings. It is known to those skilled in the art that with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.

First, the overall workflow of the artificial intelligence system is described. Please refer to Figure 1a. Figure 1a shows a structural diagram of the main framework of artificial intelligence. The following is an explanation of the above artificial intelligence theme framework from the two dimensions of "intelligent information chain" (horizontal axis) and "IT value chain" (vertical axis). Among them, the "intelligent information chain" reflects a series of processes from data acquisition to processing. For example, it can be a general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has undergone a condensation process of "data-information-knowledge-wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry from the underlying infrastructure of human intelligence, information (providing and processing technology implementation) to the industrial ecology process of the system.

(1) Infrastructure:

The infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and is supported by the basic platform. It communicates with the outside world through sensors; computing power is provided by smart chips, such as central processing units (CPU), neural-network processing units (NPU), graphics processing units (GPU), application specific integrated circuits (ASIC) or field programmable gate arrays (FPGA) and other hardware acceleration chips; the basic platform includes distributed computing frameworks and networks and other related platform guarantees and support, which can include cloud storage and computing, interconnected networks, etc. For example, sensors communicate with the outside world to obtain data, and these data are provided to the smart chips in the distributed computing system provided by the basic platform for calculation.

(2) Data

The data on the upper layer of the infrastructure is used to represent the data source in the field of artificial intelligence. The data involves graphics, images, voice, video, The text also involves IoT data of traditional equipment, including business data of existing systems and perception data such as force, displacement, liquid level, temperature, and humidity.

(3) Data processing

Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.

Among them, machine learning and deep learning can symbolize and formalize data for intelligent information modeling, extraction, preprocessing, and training.

Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formalized information to perform machine thinking and solve problems based on reasoning control strategies. Typical functions are search and matching.

Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.

(4) General capabilities

After the data has undergone the data processing mentioned above, some general capabilities can be further formed based on the results of the data processing, such as an algorithm or a general system, for example, translation, text analysis, computer vision processing (such as image recognition, target detection, etc.), speech recognition, etc.

(5) Smart products and industry applications

Smart products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of the overall artificial intelligence solution, which productizes intelligent information decision-making and realizes practical applications. Its application areas mainly include: smart manufacturing, smart transportation, smart home, smart medical care, smart security, autonomous driving, smart terminals, etc.

The present application can be applied to the field of autonomous driving, and specifically can realize the prediction of driving intention and driving trajectory of other vehicles in the field of autonomous driving.

Driving intention refers to the driving strategy that a vehicle will take in the future. Specifically, the driving intention of a vehicle can be estimated based on the vehicle's road condition information and driving status. Vehicle trajectory prediction refers to predicting the location of the vehicle at each time point in the future.

In the field of autonomous driving, by estimating the driving intentions of surrounding vehicles in real time, accurately and reliably, and predicting the future driving trajectory of the vehicle, it can help the vehicle to predict the traffic conditions ahead, establish the traffic situation around the vehicle, help judge the importance of the surrounding vehicle targets, screen the key targets for interaction, and facilitate the vehicle to plan the path in advance and pass through complex scenes safely. It should be understood that in the embodiments of the present application, the above-mentioned surrounding vehicles can also be referred to as associated vehicles located around the vehicle.

In the prior art, driving intention is defined as directional intentions such as going straight, turning left, and turning right. For example, in an intersection scenario, the driving intention of a vehicle may include going straight, turning left, and turning right. However, the above definition of driving intention has limited representation capabilities in complex scenarios, and directional intentions cannot cover all driving intentions in some complex intersections or other complex lane scenarios. For example, referring to Figure 1b, Figure 1b is a structural schematic diagram of a road condition, in which lanes 1 and 2 are left-turn lanes, lanes 3 and 4 are straight lanes, and lane 5 is an S-shaped lane.

A vehicle position acquisition method provided in an embodiment of the present application can be applied to an automatic driving prediction system. The prediction system can predict the driving intention and predicted trajectory of other vehicles based on road condition information, the vehicle's historical driving route and other information.

In an embodiment of the present application, the prediction system may include a hardware circuit (such as an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, etc.), or a combination of these hardware circuits. For example, the prediction system may be a hardware system with an instruction execution function, such as a CPU, DSP, etc., or a hardware system without an instruction execution function, such as an ASIC, FPGA, etc., or a combination of the above-mentioned hardware systems without an instruction execution function and hardware systems with an instruction execution function.

Specifically, the prediction system can be a hardware system with an execution instruction function, and the vehicle position acquisition provided in the embodiment of the present application can be a software code stored in a memory. The prediction system can obtain the software code from the memory and execute the obtained software code to implement the vehicle position acquisition provided in the embodiment of the present application.

It should be understood that the prediction system can be a combination of a hardware system that does not have the function of executing instructions and a hardware system that has the function of executing instructions. Some steps of obtaining the vehicle position provided in the embodiment of the present application can also be implemented by a hardware system in the prediction system that does not have the function of executing instructions, which is not limited here.

In an embodiment of the present application, the prediction system can be deployed on a vehicle or a server on the cloud side. Next, taking the deployment of the prediction system on a vehicle as an example, combined with the software and hardware modules on the vehicle, the prediction process of using the prediction system to realize the driving intention of other vehicles and the predicted trajectory is described.

The vehicles in the embodiments of the present application, such as the target vehicle in the embodiments of the present application, the associated vehicles around the target vehicle, etc., may refer to internal combustion engine vehicles that use an engine as a power source, hybrid vehicles that use an engine and an electric motor as power sources, electric vehicles that use an electric motor as a power source, and the like.

In an embodiment of the present application, the vehicle may include an automatic driving device 100 with an automatic driving function.

Referring to FIG. 1c, FIG. 1c is a functional block diagram of an automatic driving device 100 with an automatic driving function provided in an embodiment of the present application. In one embodiment, the automatic driving device 100 may include various subsystems, such as a travel system 102, a sensor system 104, a control system 106, one or more peripheral devices 108, and a power supply 110, a computer system 112, and a user interface 116. Optionally, the automatic driving device 100 may include more or fewer subsystems, and each subsystem may include multiple elements. In addition, each subsystem and element of the automatic driving device 100 may be interconnected by wire or wirelessly.

The travel system 102 may include components that provide powered movement for the autonomous driving device 100. In one embodiment, the travel system 102 may include an engine 118, an energy source 119, a transmission 120, and wheels/tires 121. The engine 118 may be an internal combustion engine, an electric motor, an air compression engine, or other types of engine combinations, such as a hybrid engine consisting of a gasoline engine and an electric motor, or a hybrid engine consisting of an internal combustion engine and an air compression engine. The engine 118 converts the energy source 119 into mechanical energy.

Examples of energy source 119 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas-based fuels, ethanol, solar panels, batteries, and other sources of electricity. Energy source 119 may also provide energy for other systems of autonomous driving device 100.

The transmission 120 can transmit mechanical power from the engine 118 to the wheels 121. The transmission 120 may include a gearbox, a differential, and a drive shaft. In one embodiment, the transmission 120 may also include other devices, such as a clutch. Among them, the drive shaft may include one or more shafts that can be coupled to one or more wheels 121.

The sensor system 104 may include several sensors that sense information about the environment surrounding the autonomous driving device 100. For example, the sensor system 104 may include a positioning system 122 (the positioning system may be a global positioning system (GPS) system, or a Beidou system or other positioning systems), an inertial measurement unit (IMU) 124, a radar 126, a laser rangefinder 128, and a camera 130. The sensor system 104 may also include sensors of the internal systems of the monitored autonomous driving device 100 (e.g., an in-vehicle air quality monitor, a fuel gauge, an oil temperature gauge, etc.). Sensor data from one or more of these sensors may be used to detect objects and their corresponding characteristics (position, shape, direction, speed, etc.). Such detection and recognition are key functions for the safe operation of the autonomous autonomous driving device 100.

Positioning system 122 may be used to estimate the geographic location of autonomous driving device 100. IMU 124 is used to sense position and orientation changes of autonomous driving device 100 based on inertial acceleration. In one embodiment, IMU 124 may be a combination of an accelerometer and a gyroscope.

The radar 126 may utilize radio signals to sense objects within the surrounding environment of the autonomous driving device 100. In some embodiments, in addition to sensing objects, the radar 126 may also be used to sense the speed and/or heading of the objects.

The radar 126 may include an electromagnetic wave transmitting unit and a receiving unit. The radar 126 may be implemented as a pulse radar or a continuous wave radar based on the principle of radio wave transmission. In the continuous wave radar mode, the radar 126 may be implemented as a frequency modulated continuous wave (FMCW) mode or a frequency shift keying (FSK) mode according to the signal waveform.

The radar 126 can detect an object based on a time of flight (TOF) method or a phase-shift method using electromagnetic waves as a medium, and detect the position of the detected object, the distance to the detected object, and the relative speed. In order to detect an object located in front of, behind, or to the side of the vehicle, the radar 126 can be configured at an appropriate position outside the vehicle. The laser radar 126 can detect an object based on a TOF method or a phase-shift method using laser as a medium, and detect the position of the detected object, the distance to the detected object, and the relative speed.

Alternatively, in order to detect objects located in front of, behind, or to the sides of the vehicle, the lidar 126 may be configured at a suitable location on the exterior of the vehicle.

The laser rangefinder 128 may utilize laser light to sense objects in the environment in which the autonomous driving device 100 is located. In some embodiments, the laser rangefinder 128 may include one or more laser sources, a laser scanner, and one or more detectors, among other system components.

The camera 130 may be used to capture multiple images of the surrounding environment of the autonomous driving device 100. The camera 130 may be a still camera or a video camera. machine.

Optionally, in order to obtain images of the outside of the vehicle, the camera 130 may be located at an appropriate position outside the vehicle. For example, in order to obtain images in front of the vehicle, the camera 130 may be arranged in the interior of the vehicle close to the front windshield. Alternatively, the camera 130 may be arranged around the front bumper or radiator grille. For example, in order to obtain images of the rear of the vehicle, the camera 130 may be arranged in the interior of the vehicle close to the rear window glass. Alternatively, the camera 130 may be arranged around the rear bumper, trunk, or tailgate. For example, in order to obtain images of the side of the vehicle, the camera 130 may be arranged in the interior of the vehicle close to at least one of the side windows. Alternatively, the camera 130 may be arranged around the side mirrors, fenders, or doors.

In the embodiment of the present application, the road condition information, historical driving route, and historical driving routes of associated vehicles located around the target vehicle, etc. of the target vehicle can be acquired based on one or more sensors in the sensor system 104.

The control system 106 controls the operation of the autonomous driving device 100 and its components. The control system 106 may include various elements, including a steering system 132 , a throttle 134 , a brake unit 136 , a sensor fusion algorithm 138 , a computer vision system 140 , a path control system 142 , and an obstacle avoidance system 144 .

The steering system 132 is operable to adjust the forward direction of the autonomous driving device 100. For example, in one embodiment, it may be a steering wheel system.

The throttle 134 is used to control the operating speed of the engine 118 and thus the speed of the autopilot 100 .

The brake unit 136 is used to control the deceleration of the automatic driving device 100. The brake unit 136 can use friction to slow down the wheel 121. In other embodiments, the brake unit 136 can convert the kinetic energy of the wheel 121 into electric current. The brake unit 136 can also take other forms to slow down the rotation speed of the wheel 121 to control the speed of the automatic driving device 100.

The computer vision system 140 may be operable to process and analyze images captured by the camera 130 in order to identify objects and/or features in the environment surrounding the autonomous driving device 100. The objects and/or features may include traffic signs, road boundaries, and obstacles. The computer vision system 140 may use object recognition algorithms, structure from motion (SFM) algorithms, video tracking, and other computer vision techniques. In some embodiments, the computer vision system 140 may be used to map the environment, track objects, estimate the speed of objects, and the like.

The route control system 142 is used to determine the driving route of the autonomous driving device 100. In some embodiments, the route control system 142 may combine data from the sensors 138, the positioning system 122, and one or more predetermined maps to determine the driving route for the autonomous driving device 100.

The obstacle avoidance system 144 is used to identify, evaluate, and avoid or otherwise negotiate potential obstacles in the environment of the autonomous driving device 100 .

Of course, in one example, the control system 106 may include additional or alternative components other than those shown and described, or may also reduce some of the components shown above.

The autonomous driving device 100 interacts with external sensors, other autonomous driving devices, other computer systems, or users through peripheral devices 108. The peripheral devices 108 may include a wireless communication system 146, an onboard computer 148, a microphone 150, and/or a speaker 152.

In some embodiments, the peripheral device 108 provides a means for the user of the autonomous driving device 100 to interact with the user interface 116. For example, the onboard computer 148 can provide information to the user of the autonomous driving device 100. The user interface 116 can also operate the onboard computer 148 to receive input from the user. The onboard computer 148 can be operated through a touch screen. In other cases, the peripheral device 108 can provide a means for the autonomous driving device 100 to communicate with other devices located in the vehicle. For example, the microphone 150 can receive audio (e.g., voice commands or other audio input) from the user of the autonomous driving device 100. Similarly, the speaker 152 can output audio to the user of the autonomous driving device 100.

The wireless communication system 146 can communicate wirelessly with one or more devices directly or via a communication network. For example, the wireless communication system 146 can use 3G cellular communication, such as code division multiple access (CDMA), EVDO, global system for mobile communications (GSM)/general packet radio service (GPRS), or 4G cellular communication, such as long term evolution (LTE), or 5G cellular communication. The wireless communication system 146 can communicate with a wireless local area network (WLAN) using WiFi. In some embodiments, the wireless communication system 146 can communicate directly with the device using an infrared link, Bluetooth, or ZigBee. Other wireless protocols, such as various autonomous driving device communication systems, for example, the wireless communication system 146 may include one or more dedicated short range communications (DSRC) devices, which may include public and/or private data communications between autonomous driving devices and/or roadside stations.

In one implementation, the road condition information, historical driving trajectory and other information in the embodiments of the present application can be received by the vehicle from other vehicles or a cloud-side server through the wireless communication system 146.

When the prediction system is located on a server on the cloud side, the vehicle can receive driving intention information and the like for the target vehicle transmitted by the server through the wireless communication system 146 .

The power source 110 can provide power to the various components of the autonomous driving device 100. In one embodiment, the power source 110 can be a rechargeable lithium-ion or lead-acid battery. One or more battery packs of such batteries can be configured as a power source to provide power to the various components of the autonomous driving device 100. In some embodiments, the power source 110 and the energy source 119 can be implemented together, such as in some all-electric vehicles.

Some or all functions of the autonomous driving device 100 are controlled by a computer system 112. The computer system 112 may include at least one processor 113 that executes instructions 115 stored in a non-transitory computer-readable medium such as a memory 114. The computer system 112 may also be a plurality of computing devices that control individual components or subsystems of the autonomous driving device 100 in a distributed manner.

Processor 113 can be any conventional processor, such as a commercially available central processing unit (CPU). Alternatively, the processor can be a dedicated device such as an application specific integrated circuit (ASIC) or other hardware-based processor. Although FIG. 1c functionally illustrates the processor, memory, and other elements of the computer 110 in the same block, it should be understood by those skilled in the art that the processor, computer, or memory may actually include multiple processors, computers, or memories that may or may not be stored in the same physical housing. For example, the memory may be a hard drive or other storage medium located in a housing different from the computer 110. Therefore, references to processors or computers will be understood to include references to a collection of processors or computers or memories that may or may not operate in parallel. Different from using a single processor to perform the steps described herein, some components such as steering components and deceleration components can each have their own processor that performs only calculations related to the functions specific to the component.

In various aspects described herein, the processor may be located remotely from the autonomous driving device and in wireless communication with the autonomous driving device. In other aspects, some of the processes described herein are performed on a processor disposed within the autonomous driving device and others are performed by a remote processor, including taking the necessary steps to perform a single maneuver.

In some embodiments, the memory 114 may include instructions 115 (e.g., program logic) that may be executed by the processor 113 to perform various functions of the autonomous driving device 100, including those described above. The memory 114 may also include additional instructions, including instructions to send data to, receive data from, interact with, and/or control one or more of the travel system 102, the sensor system 104, the control system 106, and the peripherals 108.

Memory 114 may store data, such as road maps, route information, the position, direction, speed, and other such autopilot data of the autopilot, and other information in addition to instructions 115. Such information may be used by the autopilot 100 and computer system 112 during operation of the autopilot 100 in autonomous, semi-autonomous, and/or manual modes.

The vehicle position acquisition method provided in the embodiment of the present application may be a software code stored in the memory 114. The processor 113 may acquire the software code from the memory and execute the acquired software code to implement the vehicle position acquisition method provided in the embodiment of the present application. After obtaining the driving intention of the target vehicle, the driving intention may be transmitted to the control system 106, and the control system 106 may determine the driving strategy of the vehicle based on the driving intention.

The user interface 116 is used to provide information to or receive information from a user of the autonomous driving device 100. Optionally, the user interface 116 may include one or more input/output devices within the set of peripheral devices 108, such as a wireless communication system 146, an onboard computer 148, a microphone 150, and a speaker 152.

Computer system 112 may control functions of autonomous driving device 100 based on input received from various subsystems (e.g., travel system 102, sensor system 104, and control system 106) and from user interface 116. For example, computer system 112 may utilize input from control system 106 in order to control steering unit 132 to avoid obstacles detected by sensor system 104 and obstacle avoidance system 144. In some embodiments, computer system 112 may be operable to provide control over many aspects of autonomous driving device 100 and its subsystems.

Optionally, one or more of the above components may be installed or associated separately from the autonomous driving device 100. For example, the memory 114 may be partially or completely separate from the autonomous driving device 100. The above components may be communicatively coupled together in a wired and/or wireless manner.

Optionally, the above components are only an example. In practical applications, the components in the above modules may be added or deleted according to actual needs. FIG. 1c should not be understood as a limitation on the embodiments of the present application.

Referring to FIG. 2a, the present embodiment provides a system architecture 200a. The system architecture includes a database 230a and a client device 240a. The data acquisition device 260a is used to collect data and store it in the database 230a. The training module 202a is based on the data maintained in the database 230a. The data generates the target model/rule 201a. The following will describe in more detail how the training module 202a obtains the target model/rule 201a based on the data. The target model/rule 201a is the first model mentioned in the following embodiments of the present application.

The calculation module 211a may include a training module 202a, and the target model/rule obtained by the training module 202a may be applied to different systems or devices. In FIG2a, the execution device 210a is configured with a transceiver 212a, which may be a wireless transceiver, an optical transceiver, or a wired interface (such as an I/O interface), etc., to interact with external devices for data, and a "user" may input data to the transceiver 212a through a client device 240a. For example, the client device 240a may send a target task to the execution device 210a, requesting the execution device to train a neural network, and send a database for training to the execution device 210a.

The execution device 210a can call data, codes, etc. in the data storage system 250a, and can also store data, instructions, etc. in the data storage system 250a.

The calculation module 211a uses the target model/rule 201a to process the input data. Specifically, the calculation module 211a is used to: first, obtain the first information and the second information, wherein the first information includes the information of the vehicles around the vehicle, and the second information includes the information of the lanes around the vehicle; then, input the first information and the second information into the first model to obtain the prediction information generated by the first model, wherein the prediction information includes the predicted position information of the vehicles around the vehicle within the first time.

Finally, the transceiver 212a returns the output of the neural network to the client device 240a. For example, the user can input a text to be converted into a sign language action through the client device 240a, and the neural network outputs the sign language action or the parameters representing the sign language action and feeds it back to the client device 240a.

More deeply, the training module 202a can obtain corresponding target models/rules 201a for different tasks based on different data to provide users with better results.

In the case shown in FIG. 2a, the data input into the execution device 210a can be determined based on the user's input data. For example, the user can operate in the interface provided by the transceiver 212a. In another case, the client device 240a can automatically input data into the transceiver 212a and obtain the result. If the automatic data input of the client device 240a requires the user's authorization, the user can set the corresponding authority in the client device 240a. The user can view the result output by the execution device 210a on the client device 240a, and the specific presentation form can be a specific method such as display, sound, action, etc. The client device 240a can also serve as a data collection terminal to store the collected data associated with the target task into the database 230a.

The training or updating process mentioned in the present application can be performed by the training module 202a. It is understandable that the training process of the neural network is to learn the way to control the spatial transformation, more specifically, to learn the weight matrix. The purpose of training the neural network is to make the output of the neural network as close to the expected value as possible. Therefore, the weight vector of each layer of the neural network in the neural network can be updated according to the difference between the predicted value and the expected value of the current network (of course, the weight vector can usually be initialized before the first update, that is, the parameters of each layer in the deep neural network are pre-configured). For example, if the predicted value of the network is too high, the value of the weight in the weight matrix is adjusted to reduce the predicted value, and after continuous adjustment, the value output by the neural network is close to or equal to the expected value. Specifically, the difference between the predicted value and the expected value of the neural network can be measured by a loss function or an objective function. Taking the loss function as an example, the higher the output value (loss) of the loss function, the greater the difference. The training of the neural network can be understood as a process of minimizing the loss as much as possible.

As shown in FIG. 2 a , a target model/rule 201 a is obtained through training according to the training module 202 a . In the embodiment of the present application, the target model/rule 201 a may be the first model in the present application.

In the training phase, the database 230a can be used to store sample sets for training. The execution device 210a generates a target model/rule 201a for processing samples, and iteratively trains the target model/rule 201a using the sample set in the database to obtain a mature target model/rule 201a, which is specifically represented by a neural network. The neural network obtained by the execution device 210a can be applied to different systems or devices.

In the inference phase, the execution device 210a can call the data, code, etc. in the data storage system 250a, or store the data, instructions, etc. in the data storage system 250a. The data storage system 250a can be placed in the execution device 210a, or the data storage system 250a can be an external memory relative to the execution device 210a. The calculation module 211a can process the samples obtained by the execution device 210a through the neural network to obtain the prediction result. The specific expression form of the prediction result is related to the function of the neural network.

It should be noted that FIG. 2a is only an exemplary schematic diagram of a system architecture provided in an embodiment of the present application, and the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation. For example, in FIG. 2a, the data storage system 250a is located relative to the execution device 210a is an external memory. In other scenarios, the data storage system 250a may also be placed in the execution device 210a.

The target model/rule 201a trained by the training module 202a can be applied to different systems or devices, such as mobile phones, tablet computers, laptops, augmented reality (AR)/virtual reality (VR), vehicle terminals, etc., and can also be servers or cloud devices.

Specifically, in a possible implementation, please refer to FIG. 2b, which is a flow chart of a method for obtaining the position of a vehicle provided in an embodiment of the present application. The method can be executed by an execution device 210a as shown in FIG. 2a. The method specifically includes: 201b, obtaining first information and second information, the first information including information about vehicles around the vehicle, and the second information including information about lanes around the vehicle; 202b, inputting the first information and the second information into a first model to obtain prediction information generated by the first model, the prediction information including predicted position information of vehicles around the vehicle within a first time.

The above is an introduction to the application architecture of the embodiment of the present application. Next, the vehicle position acquisition method provided in the embodiment of the present application is described in detail.

First of all, in order to better understand the solutions of the embodiments of the present application, the relevant terms and concepts that may be involved in the embodiments of the present application are introduced below.

(1) Neural Network

A neural network can be composed of neural units. Specifically, it can be understood as a neural network with an input layer, a hidden layer, and an output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers. Among them, a neural network with many hidden layers is called a deep neural network (DNN). The work of each layer in a neural network can be described by mathematical expressions. From a physical level, the work of each layer in a neural network can be understood as completing the transformation from input space to output space (i.e., the row space to column space of a matrix) through five operations on the input space (a set of input vectors). These five operations include: 1. Dimension increase/reduction; 2. Enlargement/reduction; 3. Rotation; 4. Translation; 5. "Bending". Among them, operations 1, 2, and 3 are completed by, operation 4 is completed by "+b", and operation 5 is implemented by "a()". The reason why the word "space" is used here is that the classified object is not a single thing, but a class of things. Space refers to the collection of all individuals of this type of thing. Among them, W is the weight matrix of each layer of the neural network. Each value in the matrix represents the weight value of a neuron in this layer. The matrix W determines the spatial transformation from the input space to the output space described above, that is, the W of each layer of the neural network controls how to transform the space. The purpose of training a neural network is to eventually obtain the weight matrices of all layers of the trained neural network. Therefore, the training process of a neural network is essentially learning how to control spatial transformation, more specifically learning the weight matrix.

(2) Convolutional Neural Network

Convolutional neural network (CNN) is a deep neural network with convolutional structure. Convolutional neural network contains a feature extractor composed of convolution layer and subsampling layer. The feature extractor can be regarded as a filter, and the convolution process can be regarded as convolving the same trainable filter with an input image or convolution feature plane (feature map). Convolution layer refers to the neuron layer in convolutional neural network that performs convolution processing on the input signal. In the convolution layer of convolutional neural network, a neuron can only be connected to some neurons in the adjacent layer. A convolution layer usually contains several feature planes, each of which can be composed of some rectangular arranged neural units. The neural units in the same feature plane share weights, and the shared weights here are convolution kernels. Shared weights can be understood as the way to extract image information is independent of position. The implicit principle is that the statistical information of a part of the image is the same as that of other parts. This means that the image information learned in a part can also be used in another part. So for all positions on the image, the same learned image information can be used. In the same convolution layer, multiple convolution kernels can be used to extract different image information. Generally speaking, the more convolution kernels there are, the richer the image information reflected by the convolution operation.

The convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network. In addition, the direct benefit of shared weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.

(3) Deep Neural Networks

Deep Neural Network (DNN), also known as multi-layer neural network, can be understood as a neural network with many hidden layers. There is no special metric for "many" here. According to the position of different layers in DNN, the neural network inside DNN can be divided into three categories: input layer, hidden layer, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the number of layers in between is All are hidden layers. The layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer. Although DNN looks complicated, the work of each layer is not complicated. In simple terms, it can be expressed as the following linear relationship: in, is the input vector, is the output vector, is the bias vector, W is the weight matrix (also called coefficient), and α() is the activation function. Each layer simply performs such a simple operation on the input vector to obtain the output vector. Since the number of DNN layers is large, the coefficient W and the bias vector The number of these parameters is also very large. The definition of these parameters in DNN is as follows: Take the coefficient W as an example: Assume that in a three-layer DNN, the linear coefficient from the 4th neuron in the second layer to the 2nd neuron in the third layer is defined as The superscript 3 represents the layer number of coefficient W, while the subscripts correspond to the third layer index 2 of the output and the second layer index 4 of the input.

In summary, the coefficients from the kth neuron in the L-1th layer to the jth neuron in the Lth layer are defined as

It should be noted that the input layer does not have a W parameter. In a deep neural network, more hidden layers allow the network to better describe complex situations in the real world. Theoretically, the more parameters a model has, the higher its complexity and the greater its "capacity", which means it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (a weight matrix formed by many layers of vectors W).

(4) Loss function

In the process of training a neural network, because we want the output of the neural network to be as close as possible to the value we really want to predict, we can compare the predicted value of the current network with the target value we really want, and then update the weight matrix of each layer of the neural network according to the difference between the two (of course, there is usually an initialization process before the first update, that is, pre-configuring parameters for each layer in the neural network). For example, if the predicted value of the network is high, adjust the weight matrix to make it predict lower, and keep adjusting until the neural network can predict the target value we really want. Therefore, it is necessary to pre-define "how to compare the difference between the predicted value and the target value", which is the loss function or objective function, which is an important equation for measuring the difference between the predicted value and the target value. Among them, taking the loss function as an example, the higher the output value (loss) of the loss function, the greater the difference, so the training of the neural network becomes a process of minimizing this loss as much as possible. For example, in classification tasks, the loss function is used to characterize the gap between the predicted category and the true category, and the cross entropy loss function (cross entropy loss) is a commonly used loss function in classification tasks.

In the training process of the neural network, the error back propagation (BP) algorithm can be used to correct the size of the parameters in the initial neural network model, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, the forward transmission of the input signal to the output will generate error loss, and the parameters in the initial neural network model are updated by back propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by error loss, which aims to obtain the optimal parameters of the neural network model, such as the weight matrix.

(5) Transformer structure

The Transformer structure is a feature extraction network that includes an encoder and a decoder (similar to a convolutional neural network).

Encoder: Performs feature learning in the global receptive field through self-attention, such as pixel features.

Decoder: Learn the features of the required modules, such as the features of the output box, through self-attention and cross-attention.

(6) Attention mechanism

The attention mechanism imitates the internal process of biological observation behavior, that is, a mechanism that aligns internal experience and external sensations to increase the observation precision of some areas, and can use limited attention resources to quickly filter out high-value information from a large amount of information. The attention mechanism can quickly extract important features of sparse data, and is therefore widely used in natural language processing tasks, especially machine translation. The self-attention mechanism is an improvement on the attention mechanism, which reduces dependence on external information and is better at capturing the internal correlation of data or features. The essential idea of the attention mechanism can be rewritten as the following formula:

The self-attention mechanism provides an effective modeling method to capture global context information through QKV. Assume that the input is Q (query), and the context is stored in the form of a key-value pair (K, V). Then, the attention mechanism is actually a mapping function from query to a series of key-value pairs (key, value). The essence of the attention function can be described as a mapping from a query to a series of (key-value) pairs. Attention essentially assigns a weight coefficient to each element in the sequence, which can also be understood as soft addressing. If each element in the sequence is stored in the form of (K, V), then attention completes the addressing by calculating the similarity between Q and K. The similarity calculated between Q and K reflects the importance of the retrieved V value, that is, the weight, and then the weighted sum is used to obtain the final eigenvalue.

The calculation of attention is mainly divided into three steps. The first step is to calculate the similarity between the query and each key to obtain the weight. Commonly used similarity functions include dot product, concatenation, perceptron, etc. Then the second step is generally to use a softmax function (on the one hand, it can be normalized, The probability distribution that the sum of all weight coefficients is 1 is obtained. On the other hand, the characteristics of the softmax function can be used to highlight the weights of important elements) to normalize these weights; finally, the weights and the corresponding key values are weighted and summed to obtain the final eigenvalues.

In addition, attention includes self-attention and cross-attention. Self-attention can be understood as a special attention, that is, the input of QKV is consistent. However, the input of QKV in cross-attention is inconsistent. Attention uses the similarity between features (such as inner product) as weight to integrate the queried features as the update value of the current feature. Self-attention is the attention extracted based on the attention of the feature map itself.

(7) Multi-Layer Perceptron (MLP)

Multilayer Perceptron, also known as Multilayer Perceptron, is a feedforward artificial neural network model. MLP is an artificial neural network (ANN) based on a fully connected (FC) forward structure, which contains artificial neurons ranging from a dozen to hundreds of thousands (AN, hereinafter referred to as neurons). MLP organizes neurons into a multi-layer structure, and uses a full connection method between layers to form an ANN with multi-weighted connection layers connected layer by layer. Its basic structure is shown in Figure 3. The fully connected layers of MLP that contain calculations are numbered from 1, and the total number of layers is L. The input layer number is set to 0, and the fully connected layers of MLP are divided into two categories: odd layers and even layers. Generally speaking, MLP contains an input layer (this layer does not actually contain calculations), one or more hidden layers, and an output layer.

(8) Features, labels, and samples

A feature is an input variable, i.e. the x variable in a simple linear regression. A simple machine learning task might use a single feature, while a more complex machine learning task might use millions of features.

The label is the y variable in simple linear regression, and the label can include multiple meanings. In some embodiments of the present application, the label can refer to the classification category of the input data. By labeling each of the different categories of input data, the label is used to indicate to the computing device the specific information represented by the data. Therefore, labeling the data is to tell the computing device what the multiple features of the input variable describe (i.e., y). y can be called a label or a target (i.e., a target value).

A sample refers to a specific instance of data. A sample x represents an object. Sample x is usually represented by a feature vector x=(x1,x2,…,xd)∈Rd, where d represents the dimension of sample x (i.e. the number of features). Samples are divided into labeled samples and unlabeled samples. Labeled samples contain both features and labels, while unlabeled samples contain features but not labels. The task of machine learning is often to learn the potential patterns in the input d-dimensional training sample set (which can be simply referred to as the training set).

(9) Back propagation algorithm

(10) Backbone

A network structure used to extract features from input information in detectors, segmenters, or classifiers. Usually, in addition to the backbone network, neural networks can also include other functional networks, such as region proposal networks (RPNs) and feature pyramid networks (FPNs), which are used to further process the features extracted by the backbone network, such as identifying the classification of features and performing semantic segmentation on features.

(11) Matrix multiplication operation (MatMul)

Matrix multiplication is a binary operation that obtains a third matrix from two matrices. The third matrix is the product of the first two, and is also commonly called the matrix product. Matrices can be used to represent linear mappings, and matrix products can be used to represent the composition of linear mappings.

(12) Normalization function

The normalized (softmax) function, also known as the normalized exponential function, is a generalization of the logistic function. The softmax function can transform a K-dimensional vector Z containing any real number into another K-dimensional vector σ(Z), so that each element of the transformed vector σ(Z) is between (0, 1) and the sum of all elements is 1. The calculation method of the softmax function can be shown in Formula 1.

Among them, σ(Z) _j represents the value of the jth element in the vector after the softmax function transformation, and Z _j represents the value of the jth element in the vector Z. value, Z _k represents the value of the kth element in vector Z, and ∑ represents the sum.

(13) Embedding layer

The embedding layer may be referred to as an input embedding layer. The current input may be a text input, for example, a paragraph of text or a sentence. The text may be a Chinese text, an English text, or a text in another language. After obtaining the current input, the embedding layer may embed each word in the current input, and obtain a feature vector of each word. In some embodiments, the embedding layer includes an input embedding layer and a positional encoding layer. In the input embedding layer, each word in the current input may be subjected to word embedding processing, thereby obtaining a word embedding vector of each word. In the positional encoding layer, the position of each word in the current input may be obtained, and then a position vector may be generated for the position of each word. In some examples, the position of each word may be the absolute position of each word in the current input. When the word embedding vector and the position vector of each word in the current input are obtained, the position vector of each word and the corresponding word embedding vector may be combined to obtain a feature vector of each word, that is, to obtain multiple feature vectors corresponding to the current input. Multiple feature vectors may be represented as embedding vectors with preset dimensions. The number of feature vectors in the multiple feature vectors may be set to M, and the preset dimension may be H, so that the multiple feature vectors may be represented as M×H embedding vectors.

The following is a description of the vehicle location acquisition method provided in an embodiment of the present application. The method can be executed by the vehicle's location acquisition device, or by a component of the vehicle's location acquisition device (such as a processor, a chip, or a chip system, etc.). The vehicle's location acquisition device can be a cloud device, or it can be a vehicle or terminal device (such as a vehicle-mounted terminal, an aircraft terminal, etc.). Of course, the method can also be executed by a system consisting of a cloud device and a vehicle. Optionally, the method can be processed by a CPU in the vehicle's location acquisition device, or it can be processed jointly by a CPU and a GPU, or it can be processed without a GPU, and other processors suitable for neural network calculations can be used, which is not limited by the present application. The application scenario of the method can be used in intelligent driving scenarios.

In combination with the above description, the specific implementation process of the reasoning phase and the training phase of the vehicle location method provided in the embodiment of the present application will be described below.

1. Reasoning Stage

In the embodiment of the present application, the reasoning stage describes the process of how the execution device 210a uses the target model/rule 201a to process the collected information data to generate a prediction result. Please refer to Figure 4 for details. Figure 4 is another flow chart of the vehicle position acquisition method provided in the embodiment of the present application. Figure 4 takes the embodiment of the present application applied to the field of autonomous driving as an example for explanation. The method may include steps 401 to 403.

401. An execution device obtains first information and second information.

In the embodiment of the present application, the first information includes information about vehicles around the vehicle, the second information includes information about lanes around the vehicle, and the execution device obtains information about vehicles and lanes around the vehicle. Specifically, in some application scenarios, the execution device may be the vehicle, and the vehicle may directly collect information about vehicles and lanes around the vehicle through a collection device, such as a camera device, a radar device, etc. The execution device may also receive information sent by other external devices, or select information from a database, etc., which are not specifically limited here.

In one implementation, when the vehicle is driving, in order to accurately predict whether other vehicles around it will affect the driving safety of the vehicle, whether it will affect the driving decision of the vehicle, and how to control the driving strategy of the vehicle based on the surrounding vehicles, it is necessary to determine the driving intention of at least one associated vehicle located around the vehicle. The target vehicle in the embodiment of the present application is any one of the at least one associated vehicle located around the vehicle.

It should be understood that the above-mentioned "associated vehicles" can be understood as vehicles that are within a certain preset range of distance from the own vehicle, that is, based on the distance, it is determined which vehicles are associated with the own vehicle, and then these associated vehicles are regarded as associated vehicles of the own vehicle; in addition, "associated vehicles" can also be understood as vehicles that will affect the driving state decision of the own vehicle in the future, that is, based on whether it will affect the driving strategy of the own vehicle in the future, it is determined which vehicles are associated with the own vehicle, and then these associated vehicles are regarded as associated vehicles of the own vehicle.

In one implementation, the processor of the vehicle can control the relevant sensors on the vehicle to obtain vehicle information and lane information of surrounding vehicles based on the software code related to step 401 in the memory 114, and determine which vehicles are associated vehicles based on the acquired information, that is, determine which vehicles need to be predicted for intention.

Alternatively, the above process of determining the target vehicle can be determined by other vehicles or a server on the cloud side, which is not limited here.

In order to clearly predict the target vehicle's future driving intention, it is necessary to obtain the target vehicle's vehicle information, lane information, etc., which can be used as a basis for predicting the target vehicle's driving intention. Among them, the target vehicle's position can be the target vehicle's position in the map. The absolute position of the target vehicle can also be the relative position between the target vehicle and the own vehicle. The absolute position of the target vehicle can be determined based on the absolute position of the own vehicle and the relative position between the target vehicle and the own vehicle.

Taking the associated vehicle as the target vehicle as an example, in the embodiment of the present application, the driving status information of the target vehicle can be obtained, where the driving status information may include the position of the target vehicle. Specifically, the position of the target vehicle can be sensed by the sensor carried by the vehicle itself, or the position of the target vehicle can be obtained through interaction with other vehicles and servers on the cloud side.

Taking the associated vehicle as the target vehicle as an example, in one implementation, the position of the target vehicle can be acquired in real time, or the position of the target vehicle can be acquired once at a certain interval.

In a possible implementation, after obtaining the first information and the second information, the first information and the second information need to be preprocessed and labeled, and the data after preprocessing and labeling can be used as input data for the first model. Among them, preprocessing includes basic data processing operations such as data outlier processing, which will not be repeated here.

Exemplarily, the main steps of acquiring and processing the first information and the second information are as follows:

(1) Collecting first information and second information.

The first information includes the information of the vehicles around the vehicle. The vehicle information includes 8 feature data, namely the vehicle's horizontal coordinate, vertical coordinate, type, length, width, height, current speed, and direction of speed (i.e. the direction in which the vehicle is moving). The specific collection method can be: obtain a frame of vector data every 0.2s, obtain a total of ten frames of vector data within the historical 2s, and add the vector data of the current frame to obtain a total of 11 frames of vector data, wherein each frame of vector data collected includes 8 features. Assuming that only 64 data are taken, the data of the targets that are farther away are selected according to the distance between the vehicles around the vehicle and the vehicle itself, and the data of the targets that are farther away are deleted. The specific data of the collected vehicles can be found in Table 1 below.

The second information includes the information of the lanes around the vehicle. The lane information includes 8 feature data. Take 20 waypoints on each lane around the vehicle, and each waypoint corresponds to 8 feature data, which are the horizontal and vertical coordinates of the waypoint, the type of lane (such as non-motorized vehicle lane and motor vehicle lane), whether the lane can go straight, whether the lane can turn left, whether the lane can turn right, whether the lane can turn around, and the lane number. The specific collection method can be: collect the features of all lanes within 200 meters around the vehicle, take 20 waypoints for each lane, and take 8 features for each waypoint. Assuming that only 256 lanes of data are taken, they are sorted according to the distance between the lanes around the vehicle and the vehicle, and the data of targets that are farther away are deleted. In one implementation method, the waypoints on the lane can be selected by sampling the farthest point. The collected lane data can be specifically referred to in Table 1 below.

Table 1

It is understandable that, in actual operation, the characteristic data contained in the information of vehicles around the vehicle and the lane information can be set according to actual needs, and no limitation is made here. In addition, in this embodiment, only the positions of vehicles around the vehicle are predicted, so only the information of vehicles around the vehicle is collected. In real life, obstacles such as non-motor vehicles and pedestrians are also included around the vehicle. On this basis, the information of non-motor vehicles and pedestrians around the vehicle can also be collected, and the positions of non-motor vehicles and pedestrians can be predicted by using the position acquisition method provided in the embodiment of the present application.

(2) Labeling

In the embodiment of the present application, the trajectory of the vehicle in the first time in the future is actually obtained, and the trajectory data is labeled for use in the subsequent training stage. The category labels include:

Track label: indicates the actual driving track of the target vehicle in the first time. For example, the actual driving track in the next 3 seconds, the position of the target vehicle is collected every 0.2 seconds, a total of 15 points, each position includes x and y coordinates, and the output is (15,2) data.

Intersection label: Indicates the exit lane selected by the target vehicle when leaving the intersection.

Non-intersection label: Indicates the non-exit lane where the target vehicle is located at the first time.

402, the execution device inputs the first information and the second information into the first model to obtain prediction information generated by the first model, where the prediction information includes predicted position information of vehicles around the vehicle within the first time.

In an embodiment of the present application, after the execution device obtains the information of vehicles and lanes around the vehicle, it can input this information into the first model, so as to predict the predicted position information of any vehicle around the vehicle through the first model based on the information of any vehicle and lane around the vehicle.

In one implementation, the first model includes an encoder and a decoder based on an attention mechanism. Please refer to Figure 5, which is a schematic diagram of the structure of the first model provided in an embodiment of the present application. In Figure 5, the execution device inputs the acquired first information and second information into the first model, and outputs prediction information based on the structure of the encoder and decoder of the attention mechanism.

The specific implementation process of the embodiment of the present application is described below:

First, the specific content of the prediction information is described.

In one implementation, the predicted position information includes predicted trajectory information of vehicles around the vehicle within a first time and third information, where the third information indicates the lanes in which the vehicles around the vehicle are located within the first time.

In this implementation, the prediction information obtained by the execution device from the first model may include two aspects: one aspect refers to the trajectory information of the vehicles around the vehicle in the first time in the future, and the second aspect refers to the third information, which indicates the lanes where the vehicles around the vehicle are located in the first time in the future. In one implementation, the third information includes the correlation between the target vehicle around the vehicle and at least one lane around the vehicle in the first time, and the target vehicle is a vehicle around the vehicle.

It is understandable that the execution device can simultaneously collect information about several vehicles around the vehicle and output prediction information about several vehicles. When the prediction information of a target vehicle is needed, it can be directly obtained from the prediction information of several vehicles. The correlation described in the third information specifically refers to the attention score of the target vehicle and the lanes around the vehicle in the first time. By inputting the first information and the second information into the first model, the attention score of the vehicles around the vehicle relative to each lane around the vehicle can be obtained based on the relevant operations of the attention mechanism.

Next, the specific process of generating prediction information by the first model based on the first information and the second information is described.

Please refer to FIG. 6 , which is another structural schematic diagram of the first model provided in an embodiment of the present application.

In one implementation, the encoder in the first model includes an embedding module and an attention module, and the decoder includes a first decoder module and a second decoder module. Each module is described in detail below.

1. Encoder

The encoder consists of an embedding module and an attention module.

(1) Embedded module

The embedded modules include a first embedded module and a second embedded module.

In an embodiment of the present application, the execution device inputs the first information and the second information into an embedding module, and obtains three different weight matrices, namely, matrix Q, matrix K and matrix V, after embedding processing is performed on the input sequence.

In one implementation, the embedding module processes the first information through a first embedding module, and the first embedding module includes three submodules, namely a first submodule, a second submodule and a third submodule.

In this implementation, specifically, please refer to Figure 7, which is a structural diagram of the first embedding module provided in an embodiment of the present application. The first submodule specifically includes a two-dimensional convolution layer Conv2d1, a two-dimensional batch normalization layer BatchNorm2d, an activation function layer ReLU, and a two-dimensional convolution layer Conv2d2. The second submodule and the third submodule have the same composition, both including a one-dimensional convolution layer Conv1d1, a one-dimensional batch normalization layer BatchNorm1d, an activation function layer ReLU, and a one-dimensional convolution layer Conv1d2. Among them, the convolution kernel size kernel_size of the convolution layer Conv2d is (1,1), and the step size stride and zero padding are both default values. ReLU(x) is a nonlinear activation function, and its specific form is: ReLU(x) = max(0,x). Among them, x represents the input variable of the function.

Exemplarily, in combination with the content exemplified in Table 1 above, the processing process of the first embedded module is specifically as follows:

The vehicle data (16, 64, 11, 8), the position data (16, 64, 2) and the time data (16, 11) are obtained from the first information, and the vehicle data, the position data and the time data are respectively input into the first submodule, the second submodule and the third submodule, and the data of (16, 64, 11, 256), (16, 64, 1, 256) and (16, 1, 11, 256) are respectively output. Subsequently, the three data are fused by the embedding method, and the data of (16, 64, 11, 256) is output, that is, the data form of the matrix Q is (16, 64, 11, 256).

Among them, 16 represents the batch size, 64 represents the number of vehicles around the vehicle (i.e. the number of data in Table 1), and 11 represents the number of collected The time or number of data, 2 represents the position of the vehicle (i.e., the horizontal and vertical coordinates shown in Table 1). In layman's terms, the information obtained from the 64 vehicles around the vehicle is the first information. Taking any vehicle as an example, the data of the vehicle is obtained at 11 intervals and 11 times, and each acquired data includes a number of feature data (corresponding to the 8 features in Table 1).

It can be understood that the characteristic data corresponding to the vehicles around the vehicle can be set according to actual needs or experiments. This is only an example and not a limitation.

In one implementation, the embedding module processes the second information through a second embedding module, and the second embedding module includes a fourth submodule.

In this implementation, specifically, please refer to FIG8, which is a schematic diagram of the structure of the second embedding module provided in an embodiment of the present application. The fourth submodule specifically includes a two-dimensional convolution layer Conv2d1, a two-dimensional batch normalization layer BatchNorm2d, an activation function layer ReLU, and a two-dimensional convolution layer Conv2d2. Among them, the convolution kernel size kernel_size of the convolution layer Conv2d is (1,1), and the stride and zero padding are both default values.

Exemplarily, the processing of the second embedding module is specifically described in combination with the examples in Table 1 above. The data format of the second information is (16, 256, 20, 8), and the data is input into the fourth submodule to output (16, 256, 11, 256) data as the data format of matrix K and then matrix V. Corresponding to Table 1 above, it can be seen that 16 refers to batch size, 256 represents the number of lanes, 20 represents 20 waypoint data on each lane, and 8 represents 8 feature data corresponding to each waypoint data. In layman's terms, the information taken from the 256 lanes around the car is the second information. Taking any lane as an example, 20 waypoint data on the lane are taken, and each waypoint data includes 8 feature data.

It can be understood that the lane data around the vehicle can be set according to actual needs or tests, and is only used as an example and is not limited here.

It should be noted that Embedding is a mapping method, and the convolution and RELU activation function method used in the embedding module is only one of the implementation methods. In actual operation, other general embedding methods can also be used to embed the first information and the second information.

(2) Attention Module

The calculation of the attention module is mainly divided into three steps. The first step is to calculate the similarity between the query and each key to obtain the weight. Commonly used similarity functions include dot product, concatenation, perceptron, etc.; the second step is to use the normalized softmax function to normalize these weights; the third step is to perform weighted summation of the weight and the corresponding key value to obtain the final eigenvalue.

It is understandable that the main functions of the normalization function include: on the one hand, the probability distribution of the sum of all weight coefficients being 1 can be obtained through normalization. Through the normalization function, the score is converted into a matrix with values distributed between 0 and 1, and the result is the relevance of each lane to the current vehicle; on the other hand, the weights of important elements can be more highlighted through the inherent mechanism of softmax. In addition, normalization can stabilize the gradient during training.

In one implementation, the first information and the second information are input into the first model, and based on the attention mechanism, the fourth information is generated. The fourth information includes the correlation between the target vehicle around the own vehicle and the first lane set within the first time. The target vehicle is a vehicle around the own vehicle, and the first lane set includes all lanes around the own vehicle included in the second information.

Combined with the contents of Figures 6 to 8, in this implementation, matrix Q, matrix K and matrix V are used as inputs of the attention module. Optionally, matrix Q, matrix K and matrix V are linearly mapped respectively to obtain a first linear matrix and a second linear matrix. Then, the first linear matrix is used as matrix Q, the second linear matrix is used as matrix K and matrix V, and matrix Q, matrix K and matrix V are used as inputs of the attention module. The correlation between each two input vectors is calculated through matrix Q and matrix K, that is, the correlation between the target vehicles around the vehicle and the first lane set in the first time, that is, the attention score, and the fourth information is output. Subsequently, optionally, by performing a matrix multiplication operation on the second linear matrix and the fourth information, the sixth information is obtained, and the sixth information includes the predicted trajectory information of the target vehicles around the vehicle.

It is understandable that in the embodiment of the present application, the main structure of the first model is an encoder-decoder structure based on the attention mechanism. In the actual operation process, in order to improve the accuracy of the prediction results and the safety of the execution device, the prediction can be made based on the multi-attention mechanism through the redundancy of the first model. In addition, for those skilled in the art, on the premise of realizing the function of the first model, other neural networks can also be used to replace the first model. The specific structure and composition of the first model are only illustrated here by way of example and are not limited.

2. Decoder

The decoder includes a first decoder module and a second decoder module.

(1) First decoder module

The first decoder module is mainly used to process the predicted lane information around the vehicle.

In one implementation, the main working process of the first decoder module is:

The fifth information is obtained from the fourth information, and the third information is generated based on the fifth information. The fifth information includes the correlation between the target vehicle and the second lane set within the first time, and the third information includes the correlation between the target vehicle around the own vehicle and at least one lane around the own vehicle within the first time.

In this implementation, after obtaining the fourth information, that is, the attention score of the target vehicle relative to each lane around the vehicle, some data is filtered out through the scene where the target vehicle is located to improve the accuracy of the prediction result. For example, please refer to Figure 9, which is a structural diagram of the first decoder module provided in an embodiment of the present application. Assuming that the road scene to which the target vehicle belongs is an intersection scene, it means that the vehicle will travel on the intersection lane in the first time in the future, then the lanes in the non-intersection scene can be eliminated. Therefore, the second lane set can be selected from the first lane set, and the attention scores of the target vehicle relative to all non-intersection lanes are set to the minimum value. After the second lane set is filtered out, the fifth information is obtained from the fourth information, that is, the attention score of the target vehicle relative to each filtered lane.

It is understandable that the category of the road scene to which the target vehicle belongs can be directly obtained through a map or through other methods, which are not limited here.

In one implementation, a specific process of generating the third information according to the fifth information may include:

A normalization operation is performed on the fifth information to obtain normalized fifth information, and then the normalized fifth information is input into a multi-layer perceptron to obtain the third information.

In this implementation, after the fifth information is selected from the fourth information according to the scene requirements, the fifth information is normalized so that the range of each attention score in the normalized fifth information is between (0, 1), and the sum of all elements is 1. Subsequently, the normalized fifth information is input into a multi-layer perceptron, under the action of the perceptron, the correlation between the target vehicle and the lanes around the vehicle in the first time is output, and the intended lane and probability of the other vehicle in the future are given.

(2) Second decoder module

The second decoder module is mainly used to process the predicted trajectory information of target vehicles around the vehicle.

In one implementation, the main working process of the second decoder module is:

The sixth information is input into a multi-layer perceptron to obtain predicted trajectory information of vehicles around the vehicle within the first time.

In this implementation, the second decoder module includes an MLP, and the sixth information is input into the second decoder module to output the trajectory of the target vehicle within the first future time. For example, assuming that the driving trajectory of the next 3 seconds is output, and one point is output every 0.2 seconds, the coordinates of a total of 15 points can be output.

It is understandable that in the embodiment of the present application, the second decoder module can use a variety of methods to obtain the predicted trajectory of the vehicle. The above method is only an example and is not limited here.

403, the execution device determines the first lane as the lane where the target vehicle is located in the first time, wherein the first lane is a lane with the highest correlation with the target vehicle in the first time among at least one lane around the vehicle.

In the embodiment of the present application, after the first information and the second information are input into the first model, the output of the first model is the predicted position information of the vehicles around the vehicle within the first time and the third information, wherein the third information includes the correlation between the target vehicle around the vehicle and at least one lane around the vehicle within the first time, that is, the attention score corresponding to the target vehicle and each lane, or it can be understood as the probability of the target vehicle driving in each lane in the future. On this basis, in one implementation method, the execution device can select the lane with the highest attention score from the attention scores corresponding to the multiple lanes as the predicted lane where the target vehicle is located within the first time.

It can be understood that step 403 is an optional step. In step 402, the third information included in the prediction information generated by the first model indicates the lanes where the vehicles around the vehicle are located in the first time. How to further determine the lanes where the vehicles around the vehicle are located in the first time through the third information can be expanded into multiple implementation methods. In one implementation, the execution device uses the lane with the highest correlation as the lane where the vehicles around the vehicle are located in the first time. In actual application, the attention can be The force score is used as one of the prediction bases, and is combined with other features or methods to predict the position, which is not limited here.

In an embodiment of the present application, different numbers of lanes can be selected as environmental features to be considered according to the actual situation of the target vehicle, and the target vehicle trajectory can be combined with the surrounding lane features to form an attention feature by adopting an attention mechanism, so that the network can pay attention to the characteristics of the surrounding lanes when learning, making the vehicle trajectory prediction results more in line with actual driving rules and improving the accuracy of the prediction results.

2. Training Phase

In the embodiment of the present application, the training stage describes the process of how the training device 220 generates a mature neural network using the image data set in the database 230a. Specifically, please refer to FIG. 10, which is a flow chart of the training method of the model provided in the embodiment of the present application. The training method of the model provided in the embodiment of the present application may include:

1001. A training device obtains first information and second information, where the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle.

In the embodiment of the present application, the training device obtains a data set of the first information and the second information, divides the data set into a training set, a validation set, and a test set, trains the model with the training set, adjusts the parameters with the validation set, and evaluates the performance with the test set. The data division ratio of the training set, validation set, and test set can be set according to actual needs and is not limited here.

In the embodiment of the present application, the specific implementation method of the training device executing step 1001 can refer to the description of the specific implementation method of step 401 in the embodiment corresponding to Figure 4, which will not be repeated here.

It can be understood that when training the first model, the training samples used include complete information about vehicles around the vehicle and complete information about lanes around the vehicle, so that the position information output by the first model is also more accurate.

1002. The training device inputs the first information and the second information into the first model to obtain prediction information generated by the first model, where the prediction information includes predicted position information of vehicles around the vehicle within the first time.

In an embodiment of the present application, the training device inputs the acquired first information and second information into the first model to obtain prediction information generated by the first model. In one implementation, the prediction information includes the predicted trajectory information of vehicles around the vehicle within the first time and third information, and the third information includes the correlation between the target vehicle around the vehicle and at least one lane around the vehicle within the first time. Referring to the above embodiment of the present application, the first model includes an encoder and a decoder, and the decoder includes a first decoder module and a second decoder module, wherein after the training device inputs the first information and the second information into the encoder of the first model, it will output the third information through the first decoder module, and output the predicted trajectory information of the vehicles around the vehicle within the first time through the second decoder module.

In the embodiment of the present application, the specific implementation method of the training device executing step 1302 can refer to the description of the specific implementation method of step 402 in the embodiment corresponding to Figure 4, which will not be repeated here.

1003. The training device trains the first model according to a loss function, where the loss function indicates the similarity between the predicted information and correct information, and the correct information includes correct position information of vehicles around the vehicle within the first time.

In the embodiment of the present application, the training device is pre-configured with training data, and the training data includes expected results corresponding to the information of vehicles and lanes around the vehicle. After obtaining the prediction results corresponding to the information of vehicles and lanes around the vehicle, the training device can calculate the function value of the target loss function according to the prediction results and the expected results, and update the parameter value of the model to be trained according to the function value of the target loss function and the back propagation algorithm to complete one training of the model to be trained.

Among them, the "model to be trained" can also be understood as the "target model to be trained". The meaning represented by the "expected result corresponding to the information and lane information of the vehicles around the vehicle" is similar to the meaning of the "prediction result corresponding to the information and lane information of the vehicles around the vehicle", the difference is that the "prediction result corresponding to the information and lane information of the vehicles around the vehicle" is the prediction result generated by the model to be trained, and the "expected result corresponding to the information and lane information of the vehicles around the vehicle" is the correct result corresponding to the information and lane information of the vehicles around the vehicle. As an example, for example, when the model to be processed is used to perform a target detection task, the prediction result is used to indicate the expected position of at least one object in the target environment, and the expected result is used to indicate the expected position (also referred to as the correct position) of at least one object in the target environment. It should be understood that the examples here are only for the convenience of understanding this solution, and are not used to exhaustively enumerate the meanings of the expected results in various application scenarios.

The training device can repeat steps 1001 to 1003 multiple times to achieve iterative training of the model to be trained until the preset conditions are met and the trained model to be trained is obtained, wherein the preset conditions can be the convergence conditions for reaching the target loss function, or the number of iterations of steps 1001 to 1003 reaches a preset number.

In the embodiments of the present application, not only the specific implementation of the reasoning process of the model is provided, but also the specific implementation of the training process of the model is provided. The present method expands the application scenarios of this solution.

The complete process of training the first model by the training device according to the loss function is described in detail below.

1. The training device collects the data set, obtains the required original data set and its corresponding category labels, and divides the training set, validation set, and test set into proportional quantities, which are used for subsequent training, validation, and evaluation of the model.

2. The training device builds the first model based on the attention mechanism.

3. The training device inputs the data of the training set into the first model, trains the first model using the first loss function and the second loss function, updates the recognition model through the back propagation algorithm, and uses the data of the verification set to screen out the optimal first model.

(1) Determine the loss function

In one implementation, the loss function used by the training device includes a first loss function and a second loss function.

In this implementation, since the prediction information output by the first model includes the predicted trajectory information of the vehicles around the vehicle within the first time and the correlation between the target vehicle around the vehicle and at least one lane around the vehicle within the first time, the loss value between the predicted trajectory information of the vehicles around the vehicle within the first time output by the first model and the correct trajectory information is calculated by the first loss function, and the loss value between the correlation between at least one lane around the vehicle within the first time and the correct information is calculated by the second loss function.

Exemplarily, the formula of the first loss function is specifically:

Among them, l _n represents the loss value between the predicted coordinates and the real coordinates of the target vehicle corresponding to the nth sample, x _n represents the vector data of the predicted coordinates of the target vehicle corresponding to the nth sample, y _n represents the vector data of the real coordinates of the target vehicle corresponding to the nth sample, and beta represents the error threshold. For example, the coordinates of the target vehicle in the next 3 seconds are collected every 0.2 seconds, and the position coordinates of 15 points can be obtained, and each sampling point corresponds to one sample.

The formula of the second loss function is:

Among them, l _n is the loss value loss corresponding to the nth sample, x _n is the predicted lane where the nth sample is located in the first time, y _n represents the actual lane where the nth sample is located in the first time, sample represents the vehicles around the vehicle, and w represents the weight.

It can be understood that since the third information output by the first model includes the correlation between the target vehicle around the own vehicle and at least one lane around the own vehicle within the first time, it is not convenient to obtain the true value of the attention score during the actual model training process. Therefore, based on the third information output by the first model, the lane with the highest correlation with the target vehicle within the first time among the at least one lane around the own vehicle can be used as the lane where the target vehicle is located within the first time, and the predicted lane information of the target lane can be compared with the actual lane information to train the model.

(2) The training device trains the first model according to the loss function

After the loss function is determined, the training device trains the first model using the training set, verifies it on the validation set, and saves the network model parameters that perform best on the validation set.

In one implementation, the specific process of the training device training the first model according to the loss function is:

(1) Based on the back propagation algorithm, the first loss function is used to train the first model, and after the training is completed, the model with the smallest first loss value is saved.

(2) Based on the back propagation algorithm, the model obtained in (1) is trained using the second loss function, and after the training is completed, the model with the smallest second loss value is saved.

It is understandable that during the training of the first model, the training device can update the parameters of the first model through the error back propagation algorithm. Simply put, the training device can correct the size of the parameters in the initial denoising classification network during the training of the first model through the error back propagation algorithm, so that the error loss becomes smaller and smaller. Specifically, the forward transmission of the input signal until the output will generate error loss, and the parameters in the initial first model are updated by back propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by error loss, which aims to obtain the optimal parameters of the neural network model, such as the weight matrix.

4. The training device uses the data of the test set to test the prediction performance of the first model and obtain the final model recognition accuracy. When the model recognition accuracy reaches the set threshold, the data to be predicted is input into the first model for recognition; otherwise, it returns to the third step until the model recognition accuracy reaches the set threshold. The accuracy reaches the set threshold.

It can be understood that since the prediction information output by the first model includes two parts, namely the predicted trajectory information of the target vehicle and the attention score of the target vehicle relative to the lane, the accuracy evaluation is performed on these two parts respectively.

In one implementation, the formula for evaluating the accuracy of the predicted trajectory information is specifically:

Among them, FDE (Final displacement error) measures the error of the step length in the future period of time (Euclidean distance, in meters), and MR (Miss Rate) represents the ratio of the inaccurate predicted trajectory information output by the first model, that is, the ratio of the distance between the predicted trajectory information output by the first model and each sampling point in the actually measured trajectory information that exceeds the tolerance distance. N represents the batch-size, which corresponds to 256 in Table 1, n represents the number of points in each trajectory, x and y are the horizontal and vertical coordinates of the point, respectively. They represent the true values of the horizontal and vertical coordinates corresponding to the j-th point of the collected points n for each trajectory (that is, the values collected in the trajectory label), _xnj and _ynj represent the predicted values of the horizontal and vertical coordinates corresponding to the j-th point of the collected points n for each trajectory, dist_threshold refers to the tolerance distance, which can be set to 1.5 meters; valid_num represents the number of valid data collected.

In one implementation, if the road scene to which the target vehicle belongs is an intersection scene, the accuracy of the intersection lane intention is evaluated by the following formula:

Acc _exit represents the accuracy of the exit lane. The lane where the target vehicle leaves the intersection scene is called the exit lane. exit_lane_right_num represents the number of real lanes where the target vehicle will be located in the first time in the future and the predicted lanes. The real lane where the target vehicle will be located in the first time in the future can be obtained from the intersection label. valid_exit_lane_num represents the number of valid data collected.

In one implementation, if the road scene to which the target vehicle belongs is a non-intersection scene, in the non-intersection scene, the vehicle has two behaviors during driving: changing lanes or not changing lanes.

The accuracy of vehicle lane changing is evaluated by the following formula:

Among them, Acc _cutin represents the accuracy of the target vehicle changing lanes in the first time, cutin_right_num represents that the target vehicle actually changes lanes in the first time in the future, and the vehicle lane change is predicted by the first model, and valid_cutin_num represents the number of valid data collected.

The accuracy of the vehicle not changing lanes is evaluated by the following formula:

Acc _keep indicates the accuracy of the target vehicle not changing lanes in the first time, that is, the accuracy of lane change false alarms. keep_right_num indicates the number of real lanes where the target vehicle is located in the first time in the future that are consistent with the predicted lanes. The real lane where the target vehicle is located in the first time in the future can be obtained from the non-intersection label. valid_keep_num indicates the number of valid data collected.

On the basis of the embodiments corresponding to FIG. 1a to FIG. 10 , in order to better implement the above-mentioned solution of the embodiment of the present application, the following also provides related devices for implementing the above-mentioned solution. Specifically, refer to FIG. 11 , which is a structural schematic diagram of a vehicle position acquisition device provided in the embodiment of the present application. The vehicle position acquisition device 1100 may include:

The acquisition module 1101 is used to acquire first information and second information, where the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle.

The position prediction module 1102 is used to input the first information and the second information into the first model to obtain prediction information generated by the first model, where the prediction information includes the predicted position information of vehicles around the vehicle within the first time.

Among them, the specific description of the acquisition module 1101 and the position prediction module 1102 can refer to the description of step 401 to step 402 in the above embodiment, and will not be repeated here.

In one implementation, the prediction information includes predicted trajectory information of vehicles around the vehicle within a first time and third information, where the third information indicates the lanes in which the vehicles around the vehicle are located within the first time.

In one implementation, the third information includes a correlation between a target vehicle around the vehicle and at least one lane around the vehicle within a first time, the target vehicle is a vehicle around the vehicle, and the vehicle position acquisition device 1100 further includes:

In one implementation, the first model is constructed based on the attention mechanism, and the position prediction module 1102 is specifically used to:

In one implementation, the location prediction module 1102 is specifically used for:

In a possible implementation manner of the third aspect, the location prediction module 1102 is specifically configured to:

It should be noted that the information interaction, execution process, etc. between the modules/units in the vehicle position acquisition device 1100 are based on the same concept as the various method embodiments corresponding to Figure 4 in the present application. The specific contents can be found in the description of the method embodiments shown in the previous part of the present application, and will not be repeated here.

The present application embodiment further provides a model training device. Please refer to FIG. 12 , which is a schematic diagram of a structure of the model training device provided in the present application embodiment. The model training device 1200 may include:

The acquisition module 1201 is used to acquire first information and second information, where the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle.

The position prediction module 1202 is used to input the first information and the second information into the first model to obtain prediction information generated by the first model, where the prediction information includes the predicted position information of vehicles around the vehicle within the first time.

The model training module 1203 is used to train the first model according to the loss function, where the loss function indicates the similarity between the predicted information and the correct information, and the correct information includes the correct position information of the vehicles around the vehicle within the first time.

Among them, the specific description of the acquisition module 1201, the position prediction module 1202 and the model training module 1203 can refer to the description of steps 1001 to 1003 in the above embodiment, and will not be repeated here.

In one implementation, the prediction information includes predicted trajectory information of vehicles around the vehicle within the first time and third information, where the third information indicates the lanes in which the vehicles around the vehicle are located within the first time.

In one implementation, the third information includes the relationship between the target vehicle around the vehicle and at least one lane around the vehicle within the first time. The target vehicle is a vehicle around the vehicle, and the model training device 1200 also includes:

In this implementation, since the third information output by the first model includes the correlation between the target vehicle around the own vehicle and at least one lane around the own vehicle within the first time, it is not convenient to obtain the correct value of the attention score during the actual model training process. Therefore, based on the third information output by the first model, the lane with the highest correlation with the target vehicle within the first time among the at least one lane around the own vehicle can be used as the lane where the target vehicle is located within the first time, and the predicted lane information of the target lane is compared with the actual lane information to train the model.

It can be understood that, in one implementation, during the actual operation, after obtaining the prediction information through the position prediction module 1202, the lane in which the target vehicle is located in the first time can be determined by the lane determination module first, and then the first model can be trained through the model training module 1203; or, in one implementation, during the actual operation, if the correlation between the target vehicle around the vehicle and at least one lane around the vehicle can be correctly measured within the first time, the first model can be trained directly according to the error between the correlations, and there is no need to execute the lane determination module. Corresponding to the actual application scenario, whether the training device needs to use the lane determination module can be set according to actual needs and is not limited here.

In one implementation, the first model is constructed based on the attention mechanism, and the position prediction module 1202 is specifically used to:

In one implementation, the location prediction module 1202 is specifically used for:

It should be noted that the information interaction, execution process, etc. between the modules/units in the model training device 1200 are based on the same concept as the various method embodiments corresponding to Figure 10 in the present application. The specific contents can be found in the description of the method embodiments shown in the previous part of the present application and will not be repeated here.

Next, an execution device provided in an embodiment of the present application is introduced. Please refer to Figure 13. Figure 13 is a structural schematic diagram of an execution device provided in an embodiment of the present application. The execution device 1300 can be specifically manifested as a vehicle, a mobile robot, a monitoring data processing device or other equipment, etc., which is not limited here. Specifically, the execution device 1300 includes: a receiver 1301, a transmitter 1302, a processor 1303 and a memory 1304 (wherein the number of processors 1303 in the execution device 1300 can be one or more, and one processor is taken as an example in Figure 13), wherein the processor 1303 may include an application processor 13031 and a communication processor 13032. In some embodiments of the present application, the receiver 1301, the transmitter 1302, the processor 1303 and the memory 1304 may be connected via a bus or other means.

The memory 1304 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1303. A portion of the memory 1304 may also include a non-volatile random access memory (NVRAM). The memory 1304 stores processor and operation instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, wherein: The operation instructions may include various operation instructions for implementing various operations.

The processor 1303 controls the operation of the execution device. In specific applications, the various components of the execution device are coupled together through a bus system, wherein the bus system includes not only a data bus but also a power bus, a control bus, and a status signal bus, etc. However, for the sake of clarity, various buses are referred to as bus systems in the figure.

The method disclosed in the above embodiment of the present application can be applied to the processor 1303, or implemented by the processor 1303. The processor 1303 can be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method can be completed by the hardware integrated logic circuit in the processor 1303 or the instruction in the form of software. The above processor 1303 can be a general processor, a digital signal processor (digital signal processing, DSP), a microprocessor or a microcontroller, and can further include an application specific integrated circuit (application specific integrated circuit, ASIC), a field programmable gate array (field-programmable gate array, FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components. The processor 1303 can implement or execute the various methods, steps and logic block diagrams disclosed in the embodiment of the present application. The general processor can be a microprocessor or the processor can also be any conventional processor, etc. The steps of the method disclosed in the embodiment of the present application can be directly embodied as a hardware decoding processor to be executed, or a combination of hardware and software modules in the decoding processor can be executed. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, or an electrically erasable programmable memory, a register, etc. The storage medium is located in the memory 1304, and the processor 1303 reads the information in the memory 1304 and completes the steps of the above method in combination with its hardware.

The receiver 1301 can be used to receive input digital or character information and generate signal input related to the relevant settings and function control of the execution device. The transmitter 1302 can be used to output digital or character information through the first interface; the transmitter 1302 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1302 can also include a display device such as a display screen.

In an embodiment of the present application, in one case, the application processor 13031 in the processor 1303 is used to execute the vehicle position acquisition method executed by the execution device in the embodiments corresponding to Figures 4 to 9. It should be noted that the specific manner in which the application processor 13031 executes the aforementioned steps is based on the same concept as the various method embodiments corresponding to Figures 4 to 9 in the present application, and the technical effects brought about are the same as the various method embodiments corresponding to Figures 4 to 9 in the present application. For specific contents, please refer to the description in the method embodiments shown in the aforementioned present application, which will not be repeated here.

The embodiment of the present application also provides a training device, please refer to Figure 14, which is a structural diagram of a training device provided by the embodiment of the present application. Specifically, the training device 1400 is implemented by one or more servers, and the training device 1400 may have relatively large differences due to different configurations or performances, and may include one or more central processing units (CPU) 1422 (for example, one or more processors) and a memory 1432, and one or more storage media 1430 (for example, one or more mass storage devices) storing application programs 1442 or data 1444. Among them, the memory 1432 and the storage medium 1430 can be short-term storage or permanent storage. The program stored in the storage medium 1430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations in the training device. Furthermore, the central processor 1422 can be configured to communicate with the storage medium 1430 to execute a series of instruction operations in the storage medium 1430 on the training device 1400.

The training device 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, one or more input and output interfaces 1458, and/or, one or more operating systems 1441, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.

In the embodiment of the present application, the central processor 1422 is used to execute the vehicle position acquisition method executed by the training device in the embodiment corresponding to Figure 10. It should be noted that the specific manner in which the central processor 1422 executes the aforementioned steps is based on the same concept as the various method embodiments corresponding to Figure 10 in the present application, and the technical effects brought about are the same as the various method embodiments corresponding to Figure 10 in the present application. For specific contents, please refer to the description in the method embodiments shown in the previous embodiment of the present application, and no further description will be given here.

Also provided in an embodiment of the present application is a computer program product, which, when executed on a computer, enables the computer to execute the steps executed by the execution device in the method described in the embodiments shown in Figures 4 to 9 above, or enables the computer to execute the steps executed by the training device in the method described in the embodiment shown in Figure 10 above.

A computer-readable storage medium is also provided in an embodiment of the present application, which stores a program for signal processing. When the computer-readable storage medium is run on a computer, the computer executes the steps executed by the execution device in the method described in the embodiments shown in Figures 4 to 9 above, or the computer executes the steps executed by the training device in the method described in the embodiment shown in Figure 10 above.

The vehicle position acquisition device, model training device, execution device and training device provided in the embodiments of the present application can be specifically a chip, and the chip includes: a processing unit and a communication unit, the processing unit can be, for example, a processor, and the communication unit can be, for example, an input/output interface, a pin or a circuit, etc. The processing unit can execute the computer execution instructions stored in the storage unit, so that the chip executes the vehicle position acquisition method described in the embodiments shown in Figures 4 to 9 above, or so that the chip executes the model training method described in the embodiment shown in Figure 10 above. Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, etc., and the storage unit can also be a storage unit located outside the chip in the wireless access device end, such as a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, a random access memory (RAM), etc.

Specifically, please refer to FIG. 15 , which is a schematic diagram of a structure of a chip provided in an embodiment of the present application, wherein the chip may be a neural network processor NPU 150, which is mounted on the host CPU (Host CPU) as a coprocessor and is assigned tasks by the Host CPU. The core part of the NPU is the operation circuit 1503, which is controlled by the controller 1504 to extract matrix data from the memory and perform multiplication operations.

In some implementations, the operation circuit 1503 includes multiple processing units (Process Engine, PE) inside. In some implementations, the operation circuit 1503 is a two-dimensional systolic array. The operation circuit 1503 can also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 1503 is a general-purpose matrix processor.

For example, assume there is an input matrix A, a weight matrix B, and an output matrix C. The operation circuit takes the corresponding data of matrix B from the weight memory 1502 and caches it on each PE in the operation circuit. The operation circuit takes the matrix A data from the input memory 1501 and performs matrix operation with matrix B, and the partial result or final result of the matrix is stored in the accumulator 1508.

Unified memory 1506 is used to store input data and output data. Weight data is directly transferred to weight memory 1502 through Direct Memory Access Controller (DMAC) 1505. Input data is also transferred to unified memory 1506 through DMAC.

BIU stands for Bus Interface Unit, that is, the bus interface unit 1510, which is used for the interaction between AXI bus and DMAC and instruction fetch buffer (IFB) 1509.

The bus interface unit 1510 (Bus Interface Unit, BIU for short) is used for the instruction fetch memory 1509 to obtain instructions from the external memory, and is also used for the storage unit access controller 1505 to obtain the original data of the input matrix A or the weight matrix B from the external memory.

DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1506 or to transfer weight data to the weight memory 1502 or to transfer input data to the input memory 1501.

The vector calculation unit 1507 includes multiple operation processing units, which further process the output of the operation circuit when necessary, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc. It is mainly used for non-convolutional/fully connected layer network calculations in neural networks, such as Batch Normalization, pixel-level summation, upsampling of feature planes, etc.

In some implementations, the vector calculation unit 1507 can store the processed output vector to the unified memory 1506. For example, the vector calculation unit 1507 can apply a linear function and/or a nonlinear function to the output of the operation circuit 1503, such as linear interpolation of the feature plane extracted by the convolution layer, and then, for example, a vector of accumulated values to generate an activation value. In some implementations, the vector calculation unit 1507 generates a normalized value, a pixel-level summed value, or both. In some implementations, the processed output vector can be used as an activation input to the operation circuit 1503, for example, for use in a subsequent layer in a neural network.

An instruction fetch buffer 1509 connected to the controller 1504 is used to store instructions used by the controller 1504;

Unified memory 1506, input memory 1501, weight memory 1502 and instruction fetch memory 1509 are all on-chip memories. External memories are private to the NPU hardware architecture.

Among them, the operations of each layer in the target model shown in Figures 4 to 10 can be performed by the operation circuit 1503 or the vector calculation unit 1507.

The processor mentioned in any of the above places may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program of the above-mentioned first aspect method.

It should also be noted that the device embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected to implement the present invention according to actual needs. In addition, in the drawings of the device embodiments provided by the present application, the connection relationship between modules indicates that there is a communication connection between them, which can be specifically implemented as one or more communication buses or signal lines.

Through the description of the above implementation mode, the technicians in the field can clearly understand that the present application can be implemented by means of software plus necessary general hardware, and of course, it can also be implemented by special hardware including special integrated circuits, special CPUs, special memories, special components, etc. In general, all functions completed by computer programs can be easily implemented by corresponding hardware, and the specific hardware structure used to implement the same function can also be various, such as analog circuits, digital circuits or special circuits. However, for the present application, software program implementation is a better implementation mode in more cases. Based on such an understanding, the technical solution of the present application is essentially or the part that contributes to the prior art can be embodied in the form of a software product, which is stored in a readable storage medium, such as a computer floppy disk, a U disk, a mobile hard disk, a ROM, a RAM, a disk or an optical disk, etc., including a number of instructions to enable a computer device (which can be a personal computer, a training device, or a network device, etc.) to execute the methods described in each embodiment of the present application.

In the above embodiments, all or part of the embodiments may be implemented by software, hardware, firmware or any combination thereof. When implemented by software, all or part of the embodiments may be implemented in the form of a computer program product.

The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the process or function described in the embodiment of the present application is generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website site, a computer, a training device, or a data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) mode to another website site, computer, training device, or data center. The computer-readable storage medium may be any available medium that a computer can store or a data storage device such as a training device, a data center, etc. that includes one or more available media integrations. The available medium may be a magnetic medium, (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)), etc.

Claims

A method for obtaining a vehicle position, characterized in that the method comprises:

Acquire first information and second information, wherein the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle;

The first information and the second information are input into a first model to obtain prediction information generated by the first model, wherein the prediction information includes predicted position information of vehicles around the vehicle within a first time.
The method according to claim 1 is characterized in that the prediction information includes predicted trajectory information of vehicles around the vehicle within the first time and third information, and the third information indicates the lanes in which the vehicles around the vehicle are located within the first time.
The method according to claim 2, characterized in that the third information includes a correlation between a target vehicle around the vehicle and at least one lane around the vehicle within the first time, the target vehicle being a vehicle around the vehicle, and the method further comprising:

A first lane is determined as a lane where the target vehicle is located during the first time, wherein the first lane is a lane with the highest correlation with the target vehicle during the first time among at least one lane around the vehicle.
The method according to claim 2 or 3 is characterized in that the first model is constructed based on an attention mechanism, and the inputting the first information and the second information into the first model to obtain the prediction information generated by the first model comprises: inputting the first information and the second information into the first model, and generating fourth information based on the attention mechanism, wherein the fourth information comprises the degree of association between a target vehicle around the ego vehicle and a first lane set within the first time, the target vehicle being a vehicle around the ego vehicle, and the first lane set comprising all lanes around the ego vehicle included in the second information;

Acquire the category of the road scene to which the target vehicle belongs, where the category of the road scene includes an intersection scene and a non-intersection scene;

Selecting a second lane set from the first lane set according to the category of the road scene to which the target vehicle belongs, the second lane set including lanes where vehicles around the ego vehicle are located during the first time;

Acquire fifth information from the fourth information, and generate the third information according to the fifth information, wherein the fifth information includes a correlation between the target vehicle and the second lane set within the first time;

The predicted trajectory information of vehicles around the own vehicle within the first time is generated based on the second information and the fourth information.
The method according to claim 4, characterized in that the obtaining the fifth information from the fourth information and generating the third information according to the fifth information comprises:

Acquire the fifth information from the fourth information, and perform a normalization operation on the fifth information to obtain normalized fifth information;

The normalized fifth information is input into a multi-layer perceptron to obtain the third information.
The method according to claim 4 or 5, characterized in that generating the fourth information according to the first information and the second information comprises:

Performing vectorization processing and linear mapping on the first information and the second information respectively to obtain a first linear matrix and a second linear matrix;

A normalization operation is performed on a matrix product of the first linear matrix and the second linear matrix to obtain the fourth information.
The method according to claim 6, characterized in that the step of generating the predicted trajectory information of vehicles around the vehicle within the first time period based on the second information and the fourth information comprises:

Performing a matrix multiplication operation on the second linear matrix and the fourth information to obtain sixth information;

The sixth information is input into a multi-layer perceptron to obtain the predicted trajectory information of vehicles around the vehicle within the first time.
A model training method, characterized in that the method comprises:

Acquire first information and second information, wherein the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle;

Inputting the first information and the second information into a first model to obtain prediction information generated by the first model, the prediction information including predicted position information of vehicles around the vehicle within a first time;

The first model is trained according to a loss function, wherein the loss function indicates a similarity between the predicted information and correct information, wherein the correct information includes correct position information of vehicles around the self-vehicle within the first time.
The method according to claim 8 is characterized in that the prediction information includes predicted trajectory information of vehicles around the vehicle within the first time and third information, and the third information indicates the lane in which the vehicles around the vehicle are located within the first time.
The method according to claim 9, characterized in that the third information includes a correlation between a target vehicle around the ego vehicle and at least one lane around the ego vehicle within the first time, the target vehicle being a vehicle around the ego vehicle, and the method further comprising:

A first lane is determined as a lane where the target vehicle is located during the first time, wherein the first lane is a lane with the highest correlation with the target vehicle during the first time among at least one lane around the vehicle.
The method according to claim 9 or 10, characterized in that the first model is constructed based on an attention mechanism, and the inputting the first information and the second information into the first model to obtain the prediction information generated by the first model comprises:

Inputting the first information and the second information into the first model, generating fourth information based on the attention mechanism, wherein the fourth information includes a correlation between a target vehicle around the ego vehicle and a first lane set within the first time, wherein the target vehicle is a vehicle around the ego vehicle, and the first lane set includes all lanes around the ego vehicle included in the second information;

Acquire the category of the road scene to which the target vehicle belongs, where the category of the road scene includes an intersection scene and a non-intersection scene;

Selecting a second lane set from the first lane set according to the category of the road scene to which the target vehicle belongs, the second lane set including lanes where vehicles around the ego vehicle are located during the first time;

Acquire fifth information from the fourth information, and generate the third information according to the fifth information, wherein the fifth information includes a correlation between the target vehicle and the second lane set within the first time;

The predicted trajectory information of vehicles around the own vehicle within the first time is generated based on the second information and the fourth information.
A vehicle position acquisition device, characterized by comprising:

An acquisition module, configured to acquire first information and second information, wherein the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle;

The position prediction module is used to input the first information and the second information into a first model to obtain prediction information generated by the first model, wherein the prediction information includes predicted position information of vehicles around the vehicle within a first time.
The device according to claim 12 is characterized in that the prediction information includes predicted trajectory information of vehicles around the vehicle within the first time and third information, and the third information indicates the lane in which the vehicles around the vehicle are located within the first time.
The device according to claim 13, characterized in that the third information includes a correlation between a target vehicle around the ego vehicle and at least one lane around the ego vehicle during the first time, the target vehicle being a vehicle around the ego vehicle, and the device further comprising:

A lane determination module is used to determine a first lane as a lane where the target vehicle is located during the first time, wherein the first lane is a lane with the highest correlation with the target vehicle during the first time among at least one lane around the vehicle.
The device according to claim 13 or 14, characterized in that the first model is constructed based on an attention mechanism, and the position prediction module is specifically used to:

Inputting the first information and the second information into the first model, generating fourth information based on the attention mechanism, wherein the fourth information includes a correlation between a target vehicle around the ego vehicle and a first lane set within the first time, wherein the target vehicle is a vehicle around the ego vehicle, and the first lane set includes all lanes around the ego vehicle included in the second information;

Acquire the category of the road scene to which the target vehicle belongs, where the category of the road scene includes an intersection scene and a non-intersection scene;

Selecting a second lane set from the first lane set according to the category of the road scene to which the target vehicle belongs, the second lane set including lanes where vehicles around the ego vehicle are located during the first time;

Acquire fifth information from the fourth information, and generate the third information according to the fifth information, wherein the fifth information includes a correlation between the target vehicle and the second lane set within the first time;

Generate the predicted trajectory information of the vehicles around the vehicle within the first time according to the second information and the fourth information interest.
The device according to claim 15, characterized in that the position prediction module is specifically used to:

Acquire the fifth information from the fourth information, and perform a normalization operation on the fifth information to obtain normalized fifth information;

The normalized fifth information is input into a multi-layer perceptron to obtain the third information.
The device according to claim 15 or 16, characterized in that the position prediction module is specifically used to:

Performing vectorization processing and linear mapping on the first information and the second information respectively to obtain a first linear matrix and a second linear matrix;

A normalization operation is performed on a matrix product of the first linear matrix and the second linear matrix to obtain the fourth information.
The device according to claim 17, characterized in that the location prediction module is specifically used to:

Performing a matrix multiplication operation on the second linear matrix and the fourth information to obtain sixth information;

The sixth information is input into a multi-layer perceptron to obtain the predicted trajectory information of vehicles around the vehicle within the first time.
A model training device, characterized in that it comprises:

An acquisition module, configured to acquire first information and second information, wherein the first information includes information about vehicles around the vehicle, and the second information includes information about lanes around the vehicle;

a position prediction module, configured to input the first information and the second information into a first model to obtain prediction information generated by the first model, wherein the prediction information includes predicted position information of vehicles around the vehicle within a first time;

A model training module is used to train the first model according to a loss function, wherein the loss function indicates the similarity between the predicted information and the correct information, and the correct information includes the correct position information of the vehicles around the vehicle within the first time.
The device according to claim 19 is characterized in that the prediction information includes predicted trajectory information of vehicles around the self-vehicle within the first time and third information, and the third information indicates the lane in which the vehicles around the self-vehicle are located within the first time.
The device according to claim 20, characterized in that the third information includes a correlation between a target vehicle around the ego vehicle and at least one lane around the ego vehicle during the first time, the target vehicle being a vehicle around the ego vehicle, and the device further comprising:

A lane determination module is used to determine a first lane as a lane where the target vehicle is located during the first time, wherein the first lane is a lane with the highest correlation with the target vehicle during the first time among at least one lane around the vehicle.
The device according to claim 20 or 21, characterized in that the first model is constructed based on an attention mechanism, and the position prediction module is specifically used to:

Inputting the first information and the second information into the first model, generating fourth information based on the attention mechanism, wherein the fourth information includes a correlation between a target vehicle around the ego vehicle and a first lane set within the first time, wherein the target vehicle is a vehicle around the ego vehicle, and the first lane set includes all lanes around the ego vehicle included in the second information;

Acquire the category of the road scene to which the target vehicle belongs, where the category of the road scene includes an intersection scene and a non-intersection scene;

Selecting a second lane set from the first lane set according to the category of the road scene to which the target vehicle belongs, the second lane set including lanes where vehicles around the ego vehicle are located during the first time;

Acquire fifth information from the fourth information, and generate the third information according to the fifth information, wherein the fifth information includes a correlation between the target vehicle and the second lane set within the first time;

The predicted trajectory information of vehicles around the own vehicle within the first time is generated based on the second information and the fourth information.
An execution device, comprising a processor and a memory, wherein the processor is coupled to the memory.

The memory is used to store programs;

The processor is configured to execute the program in the memory so that the execution device executes the method according to any one of claims 1 to 7.
An autonomous driving vehicle, characterized in that it includes a processor, the processor is coupled to a memory, the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the method described in any one of claims 1 to 7 is implemented.
A training device, comprising a processor and a memory, wherein the processor is coupled to the memory.

The memory is used to store programs;

The processor is used to execute the program in the memory so that the training device performs the method according to any one of claims 8 to 11.
A computer-readable storage medium comprises instructions, which, when executed on a computer, causes the computer to execute the method according to any one of claims 1 to 7, or the method according to any one of claims 8 to 11.
A circuit system, characterized in that the circuit system comprises a processing circuit, and the processing circuit is configured to execute the method as claimed in any one of claims 1 to 7, or to execute the method as claimed in any one of claims 8 to 11.