CN117994754A - Vehicle position acquisition method, model training method and related equipment - Google Patents

Vehicle position acquisition method, model training method and related equipment

Info

Publication number
CN117994754A
Authority
CN
China
Prior art keywords
information
vehicle
around
lane
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211350093.1A
Other languages
Chinese (zh)
Inventor
李姗
邓乃铭
邢国成
朱丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202211350093.1A
Priority to PCT/CN2023/104695 (published as WO2024093321A1)
Publication of CN117994754A
Legal status: Pending

Classifications

    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/588 Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G06V10/764 Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Traffic Control Systems (AREA)

Abstract

The application discloses a vehicle position acquisition method, a model training method and related equipment in the field of artificial intelligence. The method comprises the following steps: acquiring first information and second information, wherein the first information comprises information of vehicles around the own vehicle and the second information comprises information of lanes around the own vehicle; and inputting the first information and the second information into a first model to obtain prediction information generated by the first model, wherein the prediction information comprises predicted position information of the vehicles around the own vehicle within a first time. By incorporating the lane information around the own vehicle, the predicted positions of the surrounding vehicles are associated with the lanes, which improves the accuracy of the prediction result.

Description

Vehicle position acquisition method, model training method and related equipment
Technical Field
The application relates to the field of artificial intelligence, in particular to a vehicle position acquisition method, a model training method and related equipment.
Background
An autonomous vehicle travelling on a road needs to take the travel tracks of surrounding vehicles into account. When the driving intention of a surrounding vehicle changes, the autonomous vehicle needs to respond accordingly to avoid a collision with that vehicle. Accurately predicting the driving intentions of surrounding vehicles is therefore important for an autonomous vehicle.
At present, methods such as Kalman filtering are mainly used to predict the position of a vehicle, but such methods rely only on the historical track of the vehicle and do not consider lane information in the environment.
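As an illustration only, the kind of history-only baseline referred to above can be sketched as a constant-velocity Kalman prediction step; the state layout [x, y, vx, vy], sampling interval and noise value are assumptions made for the example and are not part of the application:

    import numpy as np

    # Constant-velocity Kalman prediction over state [x, y, vx, vy] (illustrative values).
    dt = 0.1                                      # sampling interval in seconds (assumed)
    F = np.array([[1, 0, dt, 0],                  # state transition: position += velocity * dt
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], dtype=float)

    def predict(state, cov, q=0.1):
        """One prediction step; only the vehicle's own history enters, no lane information."""
        state = F @ state
        cov = F @ cov @ F.T + q * np.eye(4)
        return state, cov

The sketch makes the limitation visible: nothing in the update refers to the road structure around the vehicle.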
Disclosure of Invention
The application provides a vehicle position acquisition method, a model training method and related equipment, which can predict the positions of vehicles around a vehicle.
In a first aspect, the present application provides a method for obtaining a position of a vehicle, which can be used in the field of artificial intelligence. The method comprises the following steps:
First, acquiring first information and second information, wherein the first information comprises information of vehicles around a self-vehicle, and the second information comprises information of lanes around the self-vehicle; and then, inputting the first information and the second information into the first model to obtain prediction information generated by the first model, wherein the prediction information comprises the predicted position information of the vehicles around the vehicle in the first time.
In a real scene, the own vehicle and the surrounding vehicles form an interdependent whole, and the behaviour of each influences the decisions of the others; previous work often relies only on a vehicle's own historical track to predict its future track, so the prediction result is clearly inaccurate. In the application, by incorporating the lane information around the own vehicle, the predicted position information of the surrounding vehicles is associated with the lanes, which further improves the accuracy of the prediction result, provides a basis for the decision-making and planning of the autonomous vehicle, and also improves the riding experience in the autonomous vehicle.
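Purely to make the data flow of the first aspect concrete, it can be read as the interface sketched below; the function name and tensor layouts are illustrative assumptions and do not appear in the application:

    import torch

    def acquire_vehicle_positions(first_model: torch.nn.Module,
                                  first_info: torch.Tensor,
                                  second_info: torch.Tensor) -> torch.Tensor:
        """first_info:  features of the vehicles around the own vehicle,
                        e.g. [num_vehicles, history_steps * feature_dim] (assumed layout).
        second_info: features of the lanes around the own vehicle,
                     e.g. [num_lanes, num_points * feature_dim] (assumed layout).
        Returns the prediction information generated by the first model, here taken to be
        the predicted positions of the surrounding vehicles within the first time."""
        return first_model(first_info, second_info)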
In a possible implementation manner of the first aspect, the prediction information includes predicted track information of the vehicles around the own vehicle within the first time and third information, and the third information indicates the lane in which the vehicles around the own vehicle are located within the first time.
In the possible implementation manner, on the basis of considering the lane information around the own vehicle, the future running intention of the vehicle is bound with the lane by outputting the information of the lane where the vehicle around the own vehicle is located, so that the relationship between the vehicle around the own vehicle and the lane is effectively utilized, and the prediction accuracy of the vehicle position is improved.
In a possible implementation manner of the first aspect, the third information includes a degree of association between a target vehicle around the host vehicle and at least one lane around the host vehicle in a first time, and the method further includes:
And determining the first lane as the lane in which the target vehicle is located within the first time, wherein the first lane is the lane, among the at least one lane around the own vehicle, that has the highest degree of association with the target vehicle within the first time.
In the possible implementation manner, the future running position of the vehicle is bound with the lanes, the probability that the target vehicle runs in each lane in the future is given by outputting the association degree of the target vehicle and the lanes around the own vehicle on the basis of acquiring the lane information around the own vehicle, and the lane with the highest association degree is determined as the lane where the target vehicle is located in the first time, so that the accuracy of the prediction result is improved.
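A minimal sketch of this selection rule, assuming the association degrees arrive as one score per candidate lane (the tensor layout and lane identifiers are placeholders):

    import torch

    def select_first_lane(association: torch.Tensor, lane_ids: list):
        """Pick the lane with the highest degree of association with the target vehicle
        within the first time (hypothetical layout: one score per candidate lane)."""
        best = torch.argmax(association).item()
        return lane_ids[best]

    # Example: three candidate lanes around the own vehicle.
    scores = torch.tensor([0.12, 0.71, 0.17])
    print(select_first_lane(scores, ["lane_1", "lane_2", "lane_3"]))  # -> "lane_2"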
In a possible implementation manner of the first aspect, the first model is constructed based on an attention mechanism, the first information and the second information are input into the first model, so as to obtain prediction information generated by the first model, and the method includes: inputting the first information and the second information into a first model, and generating fourth information based on an attention mechanism, wherein the fourth information comprises the association degree of a target vehicle around a vehicle and a first lane set in the first time, the target vehicle is one vehicle around the vehicle, and the first lane set comprises all lanes around the vehicle included in the second information;
Acquiring the category of a road scene to which a target vehicle belongs, wherein the category of the road scene comprises an intersection scene and a non-intersection scene;
Selecting a second lane set from the first lane set according to the category of the road scene to which the target vehicle belongs, wherein the second lane set comprises the lanes in which the vehicles around the own vehicle are located within the first time;
obtaining fifth information from the fourth information, and generating third information according to the fifth information, wherein the fifth information comprises the degree of association between the target vehicle and the second lane set within the first time;
and generating predicted track information of vehicles around the vehicle in the first time according to the second information and the fourth information.
In this possible implementation manner, the information of the vehicles around the own vehicle and the information of the lanes are input into the first model, and the degree of association between the target vehicle and the first lane set within the first time is obtained based on the attention mechanism. On the one hand, binding the future driving intention of the vehicle to lanes makes the predicted intention more stable; on the other hand, the lanes to be retained are screened according to the road scene to which the target vehicle belongs, which further improves the prediction accuracy.
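A sketch of the scene-dependent screening step follows. The concrete retention rule (keep the lanes whose scene label matches the target vehicle's road scene) and the per-lane scene labels are assumptions made for illustration; the application only states that the second lane set is selected according to the scene category:

    import torch

    def select_second_lane_set(fourth_info: torch.Tensor,
                               lane_scene_labels: list,
                               target_scene: str):
        """fourth_info: [num_lanes] association degrees from the attention step.
        lane_scene_labels: per-lane label, e.g. "intersection" or "non-intersection" (assumed).
        Returns the indices of the retained lanes and the fifth information
        (the association degrees restricted to the second lane set)."""
        keep = torch.tensor([label == target_scene for label in lane_scene_labels])
        indices = torch.nonzero(keep).squeeze(-1)
        fifth_info = fourth_info[indices]
        return indices, fifth_info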
In a possible implementation manner of the first aspect, the obtaining fifth information from the fourth information, and generating third information according to the fifth information, includes:
Obtaining fifth information from the fourth information, and carrying out normalization operation on the fifth information to obtain normalized fifth information;
and inputting the normalized fifth information into a multi-layer perceptron to obtain third information.
In this possible implementation manner, the fourth information includes the degree of association between the target vehicle around the own vehicle and the first lane set within the first time, that is, an attention score of the target vehicle with respect to each lane in the first lane set. According to the road scene to which the target vehicle belongs, the second lane set for the target vehicle can be determined, and the corresponding fifth information is selected from the fourth information according to the second lane set, so that lane prediction can be performed specifically for the road scene to which the target vehicle belongs, further improving the accuracy of prediction. In addition, a normalization operation is carried out on the elements contained in the fifth information so that they sum to 1; the result is then input into the multi-layer perceptron, which outputs the degree of association between the target vehicle and the lanes around the own vehicle within the first time.
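One way to realize the normalization and the multi-layer perceptron described here is sketched below; the fixed number of candidate lanes and the layer widths are assumptions chosen for the example:

    import torch
    import torch.nn as nn

    class LaneAssociationHead(nn.Module):
        """Normalize the fifth information so its elements sum to 1, then map it through
        an MLP to the association degrees that form the third information."""
        def __init__(self, num_candidate_lanes: int, hidden: int = 64):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(num_candidate_lanes, hidden),
                nn.ReLU(),
                nn.Linear(hidden, num_candidate_lanes),
            )

        def forward(self, fifth_info: torch.Tensor) -> torch.Tensor:
            normalized = fifth_info / fifth_info.sum(dim=-1, keepdim=True)  # elements sum to 1
            return self.mlp(normalized)                                     # third information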
In a possible implementation manner of the first aspect, generating fourth information according to the first information and the second information includes:
Vectorization processing and linear mapping are respectively carried out on the first information and the second information, so that a first linear matrix and a second linear matrix are obtained;
and performing normalization operation on the matrix product of the first linear matrix and the second linear matrix to obtain fourth information.
In this possible implementation manner, the first information and the second information can be fused based on the attention mechanism to obtain the attention score of the target vehicle relative to each lane.
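A compact sketch of this fusion step: both inputs are vectorized and linearly mapped, and the matrix product of the two linear matrices is normalized with a softmax. The feature dimensions and module names are placeholders, not taken from the application:

    import torch
    import torch.nn as nn

    class VehicleLaneAttention(nn.Module):
        """Attention scores between vehicle features (first information) and
        lane features (second information); feature sizes are illustrative."""
        def __init__(self, vehicle_dim: int, lane_dim: int, d_model: int = 128):
            super().__init__()
            self.q_proj = nn.Linear(vehicle_dim, d_model)   # yields the first linear matrix
            self.k_proj = nn.Linear(lane_dim, d_model)      # yields the second linear matrix

        def forward(self, vehicle_feat: torch.Tensor, lane_feat: torch.Tensor):
            q = self.q_proj(vehicle_feat)                   # [num_vehicles, d_model]
            k = self.k_proj(lane_feat)                      # [num_lanes, d_model]
            scores = q @ k.transpose(-1, -2)                # matrix product of the two linear matrices
            fourth_info = torch.softmax(scores, dim=-1)     # normalization over lanes
            return fourth_info, k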
In a possible implementation manner of the first aspect, generating predicted track information of vehicles around the vehicle at the first time according to the second information and the fourth information includes:
Performing matrix multiplication operation on the second linear matrix and the fourth information to obtain sixth information;
And inputting the sixth information into the multi-layer perceptron to obtain the predicted track information of the vehicles around the vehicle in the first time.
In this possible implementation manner, the second linear matrix and the fourth information can be fused based on the attention mechanism, and the obtained sixth information is input into the multi-layer perceptron, so that the predicted track information of the vehicles around the own vehicle within the first time is obtained under the action of the multi-layer perceptron.
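A sketch of this decoding step: the fourth information is multiplied with the second linear matrix to obtain the sixth information, which an MLP maps to one (x, y) position per future step. The prediction horizon and layer widths are assumptions made for illustration:

    import torch
    import torch.nn as nn

    class TrajectoryDecoder(nn.Module):
        """Fuse the attention weights with the lane-side linear matrix and decode
        future positions (horizon length and widths are illustrative)."""
        def __init__(self, d_model: int = 128, future_steps: int = 30, hidden: int = 256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(d_model, hidden),
                nn.ReLU(),
                nn.Linear(hidden, future_steps * 2),         # (x, y) per future step
            )
            self.future_steps = future_steps

        def forward(self, fourth_info: torch.Tensor, lane_linear: torch.Tensor):
            sixth_info = fourth_info @ lane_linear            # [num_vehicles, d_model]
            traj = self.mlp(sixth_info)
            return traj.view(-1, self.future_steps, 2)        # predicted track information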
In a second aspect, the present application provides a method for training a model, which can be used in the field of artificial intelligence. The method comprises the following steps:
First, acquiring first information and second information, wherein the first information comprises information of vehicles around a self-vehicle, and the second information comprises information of lanes around the self-vehicle; then, inputting the first information and the second information into a first model to obtain prediction information generated by the first model, wherein the prediction information comprises the predicted position information of vehicles around the vehicle in a first time; finally, the first model is trained according to a loss function, the loss function indicating a similarity between the predicted information and correct information, the correct information including correct position information of vehicles around the host vehicle within a first time.
In the application, when the first model is trained, the training samples used comprise the complete information of the vehicles around the own vehicle and the complete information of the lanes around the own vehicle, so that the position information output by the first model is more accurate. It will be appreciated that the first model may be used to perform the steps of the aforementioned first aspect or alternative embodiments of the first aspect.
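A minimal training step under the loss described in the second aspect is sketched below; mean squared error between predicted and correct positions is used here as an assumed concrete instance of the "similarity between the prediction information and the correct information", which the application does not specify further:

    import torch
    import torch.nn as nn

    def train_step(first_model: nn.Module, optimizer, first_info, second_info, correct_positions):
        """One optimization step; MSE stands in for the similarity-based loss function."""
        prediction = first_model(first_info, second_info)     # predicted positions within the first time
        loss = nn.functional.mse_loss(prediction, correct_positions)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()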
In one possible implementation manner of the second aspect, the predicted information includes predicted track information of vehicles around the own vehicle within the first time and third information indicating a lane in which the vehicles around the own vehicle are located within the first time.
In a possible implementation manner of the second aspect, the third information includes a degree of association between a target vehicle around the own vehicle and at least one lane around the own vehicle in the first time, and the method further includes:
And determining the first lane as the lane in which the target vehicle is located within the first time, wherein the first lane is the lane, among the at least one lane around the own vehicle, that has the highest degree of association with the target vehicle within the first time.
In a possible implementation manner of the second aspect, the first model is constructed based on an attention mechanism, the first information and the second information are input into the first model, so as to obtain prediction information generated by the first model, and the method includes:
Inputting the first information and the second information into a first model, and generating fourth information based on an attention mechanism, wherein the fourth information comprises the association degree of a target vehicle around a vehicle and a first lane set in the first time, the target vehicle is one vehicle around the vehicle, and the first lane set comprises all lanes around the vehicle included in the second information;
Acquiring the category of a road scene to which a target vehicle belongs, wherein the category of the road scene comprises an intersection scene and a non-intersection scene;
Selecting a second lane set from the first lane set according to the category of the road scene to which the target vehicle belongs, wherein the second lane set comprises the lanes in which the vehicles around the own vehicle are located within the first time;
obtaining fifth information from the fourth information, and generating third information according to the fifth information, wherein the fifth information comprises the degree of association between the target vehicle and the second lane set within the first time;
and generating predicted track information of vehicles around the vehicle in the first time according to the second information and the fourth information.
In a possible implementation manner of the second aspect, the obtaining fifth information from the fourth information, and generating third information according to the fifth information, includes:
Obtaining fifth information from the fourth information, and carrying out normalization operation on the fifth information to obtain normalized fifth information;
and inputting the normalized fifth information into a multi-layer perceptron to obtain third information.
In a possible implementation manner of the second aspect, generating fourth information according to the first information and the second information includes:
Vectorization processing and linear mapping are respectively carried out on the first information and the second information, so that a first linear matrix and a second linear matrix are obtained;
and performing normalization operation on the matrix product of the first linear matrix and the second linear matrix to obtain fourth information.
In a possible implementation manner of the second aspect, generating predicted track information of vehicles around the vehicle in the first time according to the second information and the fourth information includes:
Performing matrix multiplication operation on the second linear matrix and the fourth information to obtain sixth information;
And inputting the sixth information into the multi-layer perceptron to obtain the predicted track information of the vehicles around the vehicle in the first time.
For the specific meanings of the terms used in the second aspect of the embodiment of the present application and in the various possible implementations of the second aspect, reference may be made to the descriptions in the various possible implementations of the first aspect, which are not repeated here.
In a third aspect, the present application provides a position acquisition device for a vehicle, which can be used in the field of artificial intelligence. The device comprises an acquisition module and a position prediction module. The acquisition module is used for acquiring first information and second information, wherein the first information comprises information of vehicles around the own vehicle and the second information comprises information of lanes around the own vehicle; the position prediction module is used for inputting the first information and the second information into the first model to obtain prediction information generated by the first model, wherein the prediction information comprises the predicted position information of the vehicles around the own vehicle within the first time.
In a possible implementation manner of the third aspect, the predicted information includes predicted track information of vehicles around the own vehicle within the first time and third information indicating a lane in which the vehicles around the own vehicle are located within the first time.
In a possible implementation manner of the third aspect, the third information includes a degree of association between a target vehicle around the host vehicle and at least one lane around the host vehicle in the first time, and the apparatus further includes:
The lane determining module is used for determining the first lane as the lane in which the target vehicle is located within the first time, wherein the first lane is the lane, among the at least one lane around the own vehicle, that has the highest degree of association with the target vehicle within the first time.
In a possible implementation manner of the third aspect, the first model is constructed based on an attention mechanism, and the position prediction module is specifically configured to:
Inputting the first information and the second information into a first model, and generating fourth information based on an attention mechanism, wherein the fourth information comprises the association degree of a target vehicle around a vehicle and a first lane set in the first time, the target vehicle is one vehicle around the vehicle, and the first lane set comprises all lanes around the vehicle included in the second information;
Acquiring the category of a road scene to which a target vehicle belongs, wherein the category of the road scene comprises an intersection scene and a non-intersection scene;
Selecting a second lane set from the first lane set according to the category of the road scene to which the target vehicle belongs, wherein the second lane set comprises the lanes in which the vehicles around the own vehicle are located within the first time;
obtaining fifth information from the fourth information, and generating third information according to the fifth information, wherein the fifth information comprises the degree of association between the target vehicle and the second lane set within the first time;
and generating predicted track information of vehicles around the vehicle in the first time according to the second information and the fourth information.
In a possible implementation manner of the third aspect, the location prediction module is specifically configured to:
Obtaining fifth information from the fourth information, and carrying out normalization operation on the fifth information to obtain normalized fifth information;
and inputting the normalized fifth information into a multi-layer perceptron to obtain third information.
In a possible implementation manner of the third aspect, the location prediction module is specifically configured to:
Vectorization processing and linear mapping are respectively carried out on the first information and the second information, so that a first linear matrix and a second linear matrix are obtained;
and performing normalization operation on the matrix product of the first linear matrix and the second linear matrix to obtain fourth information.
In a possible implementation manner of the third aspect, the location prediction module is specifically configured to:
Performing matrix multiplication operation on the second linear matrix and the fourth information to obtain sixth information;
And inputting the sixth information into the multi-layer perceptron to obtain the predicted track information of the vehicles around the vehicle in the first time.
In the third aspect of the present application, each module included in the position obtaining device of the vehicle may be further configured to implement steps in various possible implementations of the first aspect, and for specific implementations of some steps in the third aspect and various possible implementations of the third aspect of the present application, and beneficial effects caused by each possible implementation, reference may be made to descriptions in various possible implementations of the first aspect, which are not described in detail herein.
In a fourth aspect, the present application provides a training apparatus for models, which may be used in the field of artificial intelligence. The device comprises an acquisition module, a position prediction module and a model training module. The acquisition module is used for acquiring first information and second information, wherein the first information comprises information of vehicles around the own vehicle and the second information comprises information of lanes around the own vehicle; the position prediction module is used for inputting the first information and the second information into the first model to obtain prediction information generated by the first model, wherein the prediction information comprises the predicted position information of the vehicles around the own vehicle within the first time; the model training module is used for training the first model according to a loss function, wherein the loss function indicates the similarity between the prediction information and correct information, and the correct information comprises correct position information of the vehicles around the own vehicle within the first time.
In a possible implementation manner of the fourth aspect, the predicted information includes predicted track information of the vehicles around the own vehicle within the first time and third information, and the third information indicates the lane in which the vehicles around the own vehicle are located within the first time.
In a possible implementation manner of the fourth aspect, the third information includes a degree of association between a target vehicle around the host vehicle and at least one lane around the host vehicle in the first time, and the apparatus further includes:
The lane determining module is used for determining the first lane as the lane in which the target vehicle is located within the first time, wherein the first lane is the lane, among the at least one lane around the own vehicle, that has the highest degree of association with the target vehicle within the first time.
In a possible implementation manner of the fourth aspect, the first model is constructed based on an attention mechanism, and the position prediction module is specifically configured to:
Inputting the first information and the second information into a first model, and generating fourth information based on an attention mechanism, wherein the fourth information comprises the association degree of a target vehicle around a vehicle and a first lane set in the first time, the target vehicle is one vehicle around the vehicle, and the first lane set comprises all lanes around the vehicle included in the second information;
Acquiring the category of a road scene to which a target vehicle belongs, wherein the category of the road scene comprises an intersection scene and a non-intersection scene;
Selecting a second lane set from the first lane set according to the category of the road scene to which the target vehicle belongs, wherein the second lane set comprises the lanes in which the vehicles around the own vehicle are located within the first time;
obtaining fifth information from the fourth information, and generating third information according to the fifth information, wherein the fifth information comprises the degree of association between the target vehicle and the second lane set within the first time;
and generating predicted track information of vehicles around the vehicle in the first time according to the second information and the fourth information.
In a possible implementation manner of the fourth aspect, the location prediction module is specifically configured to:
Obtaining fifth information from the fourth information, and carrying out normalization operation on the fifth information to obtain normalized fifth information;
and inputting the normalized fifth information into a multi-layer perceptron to obtain third information.
In a possible implementation manner of the fourth aspect, the location prediction module is specifically configured to:
Vectorization processing and linear mapping are respectively carried out on the first information and the second information, so that a first linear matrix and a second linear matrix are obtained;
and performing normalization operation on the matrix product of the first linear matrix and the second linear matrix to obtain fourth information.
In a possible implementation manner of the fourth aspect, the location prediction module is specifically configured to:
Performing matrix multiplication operation on the second linear matrix and the fourth information to obtain sixth information;
And inputting the sixth information into the multi-layer perceptron to obtain the predicted track information of the vehicles around the vehicle in the first time.
In the fourth aspect of the present application, each module included in the training device of the model may be further used to implement steps in various possible implementations of the second aspect, and for specific implementations of some steps in the fourth aspect of the embodiments of the present application and various possible implementations of the fourth aspect, and beneficial effects brought by each possible implementation, reference may be made to descriptions in various possible implementations of the second aspect, which are not described herein in detail.
In a fifth aspect, an embodiment of the present application provides an execution device, which may include a processor, where the processor is coupled to a memory, and the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, implement the method for acquiring a position of a vehicle according to the first aspect. For the steps executed by the execution device in each possible implementation manner of the first aspect executed by the processor, reference may be made to the above first aspect, which is not described herein.
In a sixth aspect, an embodiment of the present application provides an autonomous vehicle, which may include a processor, the processor being coupled to a memory, the memory storing program instructions which, when executed by the processor, implement the method for acquiring a position of the vehicle according to the first aspect. For the steps executed by the processor in each possible implementation manner of the first aspect, reference may be made to the first aspect, which is not described herein.
In a seventh aspect, an embodiment of the present application provides a training apparatus, which may include a processor, and a memory coupled to the processor, where the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, implement the training method of the model according to the second aspect. For the steps performed by the training device in each possible implementation manner of the second aspect executed by the processor, reference may be made to the second aspect specifically, which is not described herein.
In an eighth aspect, embodiments of the present application provide a computer readable storage medium having a computer program stored therein, which when run on a computer causes the computer to perform the method according to the first aspect or any of the possible implementations of the first aspect or cause the computer to perform the method according to the second aspect or any of the possible implementations of the second aspect.
In a ninth aspect, embodiments of the present application provide a circuit system, the circuit system comprising a processing circuit configured to perform the method of the first aspect or any of the possible implementations of the first aspect, or the processing circuit is configured to perform the method of the second aspect or any of the possible implementations of the second aspect.
In a tenth aspect, embodiments of the present application provide a computer program product which, when run on a computer, causes the computer to perform the method of the first aspect or any of the possible implementations of the first aspect, or causes the computer to perform the method of the second aspect or any of the possible implementations of the second aspect.
In an eleventh aspect, the present application provides a chip system comprising a processor and a memory, the memory being for storing a computer program, the processor being for invoking and running the computer program stored in the memory to perform the method as described above in the first aspect or any of the possible implementations of the first aspect, or to cause a computer to perform the method of the second aspect or any of the possible implementations of the second aspect. The chip system can be composed of chips, and can also comprise chips and other discrete devices.
Drawings
FIG. 1a is a schematic diagram of a structure of an artificial intelligence main body frame;
FIG. 1b is a schematic view of a road condition;
fig. 1c is a schematic structural diagram of an autopilot device with autopilot function according to an embodiment of the present application;
FIG. 2a is a schematic diagram of a system architecture according to the present application;
FIG. 2b is a schematic flow chart of a method for obtaining a position of a vehicle according to the present application;
FIG. 3 is a schematic diagram of a multi-layer perceptron according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of another method for obtaining a position of a vehicle according to the present application;
FIG. 5 is a schematic structural diagram of a first model according to an embodiment of the present application;
FIG. 6 is a schematic diagram of another structure of a first model according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a first embedded module according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a second embedded module according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a first decoder module according to an embodiment of the present application;
FIG. 10 is a schematic flow chart of a training method of a model according to an embodiment of the present application;
fig. 11 is a schematic structural view of a position acquisition device for a vehicle according to an embodiment of the present application;
FIG. 12 is a schematic structural view of a training device for a model according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of an execution device according to an embodiment of the present application;
FIG. 14 is a schematic view of a training apparatus according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application. As a person skilled in the art can know, with the development of technology and the appearance of new scenes, the technical scheme provided by the embodiment of the application is applicable to similar technical problems.
The terms "first", "second" and the like in the description, in the claims and in the above-described figures are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments described herein may be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprises", "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules expressly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus.
The term "and/or" appearing in the present application may be an association relationship describing an associated object, and indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist together, and B exists alone. In the present application, the character "/" generally indicates that the front and rear related objects are an or relationship.
It should also be noted that, in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may in fact be executed substantially concurrently or the reverse order may sometimes be executed, depending upon the functionality/acts involved.
In the embodiments of the present application, unless otherwise indicated, "at least one" means one or more, and "a plurality" means two or more. It is to be understood that in the present application, terms such as "when …" and "if" are used to indicate that the device performs the corresponding processing under some objective condition; they do not limit the timing, do not require a judging action when the device is implemented, and do not imply other limitations. In addition, the word "exemplary" means "serving as an example, embodiment, or illustration". Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The method for acquiring the position of the vehicle provided by the application can be applied to artificial intelligence (AI) scenarios. AI is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, and the like.
Embodiments of the present application are described below with reference to the accompanying drawings. As one of ordinary skill in the art can know, with the development of technology and the appearance of new scenes, the technical scheme provided by the embodiment of the application is also applicable to similar technical problems.
First, the overall workflow of the artificial intelligence system will be described. Referring to FIG. 1a, FIG. 1a shows a schematic structural diagram of an artificial intelligence main framework, which is described below from the two dimensions of the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to data processing, for example the general processes of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a "data-information-knowledge-wisdom" refinement process. The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technology implementations) to the industrial ecology of the system.
(1) Infrastructure:
The infrastructure provides computing capability support for the artificial intelligence system, realizes communication with the outside world, and is supported by the base platform. It communicates with the outside through sensors; computing power is provided by smart chips, i.e. hardware acceleration chips such as a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field programmable gate array (FPGA); the base platform comprises relevant platform guarantees and support such as a distributed computing framework and a network, and may comprise cloud storage and computing, interconnection and interworking networks, and the like. For example, the sensors communicate with the outside to obtain data, and the data are provided to the smart chips in the distributed computing system provided by the base platform for computation.
(2) Data
The data of the upper layer of the infrastructure is used to represent the data source in the field of artificial intelligence. The data relate to graphics, images, voice, video and text, and also relate to internet of things data of traditional equipment, wherein the data comprise service data of an existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
Wherein machine learning and deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning modes in a computer or an intelligent system, and carrying out machine thinking and problem solving by using formal information according to a reasoning control strategy, and typical functions are searching and matching.
Decision making refers to the process of making decisions after intelligent information is inferred, and generally provides functions of classification, sequencing, prediction and the like.
(4) General capability
After the data is processed as mentioned above, some general-purpose capabilities can be formed based on the result of the data processing, such as algorithms or a general-purpose system, for example, translation, text analysis, computer vision processing (e.g., image recognition, object detection, etc.), voice recognition, etc.
(5) Intelligent product and industry application
Intelligent products and industry applications refer to the products and applications of the artificial intelligence system in various fields; they encapsulate the overall artificial intelligence solution and realize practical deployment through intelligent information decision-making. The application fields mainly include intelligent manufacturing, intelligent transportation, smart home, intelligent medical care, intelligent security, automatic driving, intelligent terminals, and the like.
The method and the device can be applied to the field of automatic driving, and in particular can realize the prediction of the driving intentions and driving trajectories of other vehicles in the field of automatic driving.
The driving intention may refer to a driving strategy to be performed by the vehicle in the future, and specifically, the driving intention of the vehicle may be estimated according to information such as road condition information and driving state of the vehicle. The prediction of the track of the vehicle refers to predicting the position of the vehicle at each time point in the future.
In the field of automatic driving, accurately and reliably estimating the driving intentions of surrounding vehicles in real time and predicting their future driving tracks makes it possible to anticipate the traffic situation in front of the own vehicle and to construct the traffic situation around it. This helps to judge the importance of other surrounding vehicle targets and to screen key interacting targets, so that the own vehicle can plan its path in advance and pass through complex scenes safely. It should be understood that the above-described surrounding vehicles may also be referred to as associated vehicles located around the own vehicle in embodiments of the present application.
In the related art, a driving intention is defined as a directional intention such as going straight, turning left or turning right; for example, the driving intentions of a vehicle in an intersection scene may include going straight, turning left, turning right, and the like. However, this way of defining driving intentions has limited capability to represent complex scenes, and directional intentions cannot cover all driving intentions at some complex intersections or in other complex lane scenes. For example, referring to FIG. 1b, FIG. 1b is a schematic diagram of a road condition, where lane 1 and lane 2 are left-hand lanes, lane 3 and lane 4 are straight lanes, and lane 5 is an S-shaped lane.
The vehicle position acquisition method provided by the embodiment of the application can be applied to an automatic driving prediction system, and the prediction system can predict the driving intention and the predicted track of the other vehicle based on the road condition information, the historical driving route of the vehicle and other information.
In an embodiment of the present application, the prediction system may include a hardware circuit (such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller) or a combination of these hardware circuits. For example, the prediction system may be a hardware system having the function of executing instructions, such as a CPU or a DSP, or a hardware system not having the function of executing instructions, such as an ASIC or an FPGA, or a combination of a hardware system not having the function of executing instructions and a hardware system having the function of executing instructions.
Specifically, the prediction system may be a hardware system with an instruction execution function, and the position acquisition of the vehicle provided by the embodiment of the present application may be a software code stored in a memory, and the prediction system may acquire the software code from the memory and execute the acquired software code to implement the position acquisition of the vehicle provided by the embodiment of the present application.
It should be understood that the prediction system may be a combination of a hardware system that does not have an instruction execution function and a hardware system that has an instruction execution function, and that part of the steps of position acquisition of the vehicle provided in the embodiment of the present application may also be implemented by a hardware system that does not have an instruction execution function in the prediction system, which is not limited herein.
In the embodiment of the application, the prediction system can be deployed on a vehicle or a cloud side server, and then the prediction process of realizing the driving intention and the predicted track of the other vehicle by the prediction system is described by taking the deployment of the prediction system on the vehicle as an example and combining with a software and hardware module on the vehicle.
The vehicle in the embodiment of the application, for example, the target vehicle in the embodiment of the application, the related vehicles around the target vehicle, and the like may refer to an internal combustion engine vehicle having an engine as a power source, a hybrid vehicle having an engine and an electric motor as power sources, an electric vehicle having an electric motor as a power source, and the like.
In an embodiment of the present application, a vehicle may include an autopilot device 100 having an autopilot function.
Referring to fig. 1c, fig. 1c is a functional block diagram of an autopilot apparatus 100 with autopilot functionality provided by an embodiment of the present application. In one embodiment, the autopilot 100 may include various subsystems such as a travel system 102, a sensor system 104, a control system 106, one or more peripherals 108, as well as a power supply 110, a computer system 112, and a user interface 116. Alternatively, the autopilot 100 may include more or fewer subsystems, and each subsystem may include multiple elements. In addition, each of the subsystems and elements of the autopilot 100 may be interconnected by wires or wirelessly.
The travel system 102 may include components that provide powered movement of the autopilot 100. In one embodiment, the travel system 102 may include an engine 118, an energy source 119, a transmission 120, and wheels/tires 121. The engine 118 may be an internal combustion engine, an electric motor, an air compression engine, or other type of engine combination, such as a hybrid engine of a gasoline engine and an electric motor, or a hybrid engine of an internal combustion engine and an air compression engine. Engine 118 converts energy source 119 into mechanical energy.
Examples of energy sources 119 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas-based fuels, ethanol, solar panels, batteries, and other sources of electricity. The energy source 119 may also provide energy to other systems of the autopilot 100.
The transmission 120 may transmit mechanical power from the engine 118 to the wheels 121. The transmission 120 may include a gearbox, a differential, and a drive shaft. In one embodiment, the transmission 120 may also include other devices, such as a clutch. Wherein the drive shaft may comprise one or more axles that may be coupled to one or more wheels 121.
The sensor system 104 may include several sensors that sense information about the environment surrounding the autopilot 100. For example, the sensor system 104 may include a positioning system 122 (which may be a global positioning system (global positioning system, GPS) system, as well as a Beidou system or other positioning system), an inertial measurement unit (inertial measurement unit, IMU) 124, a radar 126, a laser rangefinder 128, and a camera 130. The sensor system 104 may also include sensors (e.g., in-vehicle air quality monitors, fuel gauges, oil temperature gauges, etc.) that monitor the internal systems of the autopilot 100. Sensor data from one or more of these sensors may be used to detect objects and their corresponding characteristics (location, shape, direction, speed, etc.). Such detection and identification is a critical function of the safe operation of the autonomous automatic driving apparatus 100.
The positioning system 122 may be used to estimate the geographic location of the autopilot 100. The IMU 124 is used to sense changes in the position and orientation of the autopilot 100 based on inertial acceleration. In one embodiment, the IMU 124 may be a combination of an accelerometer and a gyroscope.
The radar 126 may utilize radio signals to sense objects within the surrounding environment of the autopilot 100. In some embodiments, in addition to sensing an object, the radar 126 may be used to sense the speed and/or heading of the object.
The radar 126 may include an electromagnetic wave transmitting portion and a receiving portion. According to the principle of radio wave emission, the radar 126 may be implemented as a pulse radar system or a continuous wave radar system. Within the continuous wave radar category, the radar 126 may be implemented as a frequency modulated continuous wave (FMCW) system or a frequency shift keying (FSK) system according to the signal waveform.
The radar 126 may detect an object based on a time of flight (TOF) method or a phase-shift method with an electromagnetic wave as a medium, and may detect the position of, the distance to, and the relative speed of the detected object. In order to detect objects located in front of, behind, or to the side of the vehicle, the radar 126 may be disposed at an appropriate location on the exterior of the vehicle. Similarly, the lidar 126 may detect an object based on a TOF or phase-shift scheme using laser light as a medium, and may detect the position of, the distance to, and the relative speed of the detected object.
Alternatively, the lidar 126 may be disposed at an appropriate location on the exterior of the vehicle in order to detect objects located in front of, behind, or to the side of the vehicle.
The laser rangefinder 128 may utilize a laser to sense objects in the environment in which the autopilot device 100 is located. In some embodiments, laser rangefinder 128 may include one or more laser sources, a laser scanner, and one or more detectors, among other system components.
The camera 130 may be used to capture a plurality of images of the surroundings of the autopilot 100. The camera 130 may be a still camera or a video camera.
Alternatively, to acquire an external image of the vehicle, the camera 130 may be located at an appropriate position outside the vehicle. For example, in order to acquire an image of the front of the vehicle, the camera 130 may be disposed in the interior of the vehicle so as to be close to the front windshield. Or the camera 130 may be disposed around the front bumper or radiator grille. For example, in order to acquire an image of the rear of the vehicle, the camera 130 may be disposed in the vehicle interior in close proximity to the rear window glass. Or the camera 130 may be disposed around the rear bumper, trunk or tailgate. For example, in order to acquire an image of the side of the vehicle, the camera 130 may be disposed in the interior of the vehicle so as to be close to at least one of the side windows. Or the camera 130 may be disposed at a side mirror, a fender, or a door periphery.
In an embodiment of the present application, the road condition information, the historical driving route of the related vehicles around the target vehicle, and the like of the target vehicle may be obtained based on one or more sensors in the sensor system 104.
The control system 106 is configured to control the operation of the autopilot 100 and its components. The control system 106 may include various elements including a steering system 132, a throttle 134, a brake unit 136, a sensor fusion algorithm 138, a computer vision system 140, a route control system 142, and an obstacle avoidance system 144.
The steering system 132 is operable to adjust the direction of travel of the autopilot 100. For example, in one embodiment it may be a steering wheel system.
The throttle 134 is used to control the operating speed of the engine 118 and thus the speed of the autopilot 100.
The brake unit 136 is used to control the reduction of the speed of the automatic driving device 100. The brake unit 136 may use friction to slow the wheel 121. In other embodiments, the braking unit 136 may convert the kinetic energy of the wheels 121 into electric current. The brake unit 136 may take other forms to slow the rotational speed of the wheels 121 to control the speed of the autopilot 100.
The computer vision system 140 may be operable to process and analyze images captured by the camera 130 to identify objects and/or features in the environment surrounding the autopilot device 100. The objects and/or features may include traffic signals, road boundaries, and obstacles. The computer vision system 140 may use object recognition algorithms, structure from motion (SFM) algorithms, video tracking, and other computer vision techniques. In some embodiments, the computer vision system 140 may be used to map an environment, track objects, estimate the speed of objects, and so forth.
The route control system 142 is used to determine a travel route of the autopilot 100. In some embodiments, the route control system 142 may combine data from the sensor fusion algorithm 138, the positioning system 122, and one or more predetermined maps to determine a travel route for the autopilot 100.
The obstacle avoidance system 144 is used to identify, evaluate, and avoid or otherwise traverse potential obstacles in the environment of the autopilot 100.
Of course, in one example, control system 106 may additionally or alternatively include components other than those shown and described. Or some of the components shown above may be eliminated.
Autopilot 100 interacts with external sensors, other autopilots, other computer systems, or users through peripherals 108. Peripheral devices 108 may include a wireless communication system 146, a vehicle computer 148, a microphone 150, and/or a speaker 152.
In some embodiments, the peripheral device 108 provides a means for a user of the autopilot 100 to interact with the user interface 116. For example, the vehicle computer 148 may provide information to a user of the autopilot 100. The user interface 116 is also operable with the vehicle computer 148 to receive user input. The vehicle computer 148 may be operated by a touch screen. In other cases, the peripheral device 108 may provide a means for the autopilot 100 to communicate with other devices located within the vehicle. For example, microphone 150 may receive audio (e.g., voice commands or other audio inputs) from a user of autopilot device 100. Similarly, speaker 152 may output audio to a user of autopilot 100.
The wireless communication system 146 may communicate wirelessly with one or more devices directly or via a communication network. For example, the wireless communication system 146 may use 3G cellular communication, such as code division multiple access (CDMA), EV-DO, or global system for mobile communications (GSM)/general packet radio service (GPRS), or 4G cellular communication, such as long term evolution (LTE), or 5G cellular communication. The wireless communication system 146 may communicate with a wireless local area network (WLAN) using WiFi. In some embodiments, the wireless communication system 146 may communicate directly with a device using an infrared link, Bluetooth, or ZigBee. Other wireless protocols may also be used, such as various vehicle communication systems; for example, the wireless communication system 146 may include one or more dedicated short range communications (DSRC) devices, which may support public and/or private data communications between autopilot devices and/or roadside stations.
In one implementation, the information such as road condition information and historical driving track in the embodiment of the present application may be received by the vehicle from other vehicles or cloud-side servers through the wireless communication system 146.
When the prediction system is located at a server on the cloud side, the vehicle may receive driving intention information for the target vehicle, etc., transmitted from the server through the wireless communication system 146.
The power source 110 may provide power to various components of the autopilot 100. In one embodiment, the power source 110 may be a rechargeable lithium ion or lead acid battery. One or more battery packs of such batteries may be configured as a power source to provide power to the various components of the autopilot 100. In some embodiments, the power source 110 and the energy source 119 may be implemented together, such as in some all-electric vehicles.
Some or all of the functions of the autopilot 100 are controlled by a computer system 112. The computer system 112 may include at least one processor 113, the processor 113 executing instructions 115 stored in a non-transitory computer-readable medium such as memory 114. The computer system 112 may also be a plurality of computing devices that control individual components or subsystems of the autopilot 100 in a distributed manner.
The processor 113 may be any conventional processor, such as a commercially available central processing unit (CPU). Alternatively, the processor may be a dedicated device such as an application-specific integrated circuit (ASIC) or another hardware-based processor. Although FIG. 1c functionally illustrates the processor, the memory, and other elements of the computer system 112 in the same block, it will be understood by those of ordinary skill in the art that the processor, computer, or memory may in fact comprise a plurality of processors, computers, or memories that may or may not be housed within the same physical enclosure. For example, the memory may be a hard disk drive or other storage medium located in a housing different from that of the computer system 112. Thus, references to a processor or computer will be understood to include references to a collection of processors, computers, or memories that may or may not operate in parallel. Rather than using a single processor to perform the steps described herein, some components, such as the steering component and the deceleration component, may each have their own processor that performs only computations related to that component's specific function.
In various aspects described herein, the processor may be located remotely from and in wireless communication with the autopilot. In other aspects, some of the processes described herein are performed on a processor disposed within the autopilot and others are performed by a remote processor, including taking the necessary steps to perform a single maneuver.
In some embodiments, the memory 114 may contain instructions 115 (e.g., program logic) that the instructions 115 may be executed by the processor 113 to perform various functions of the autopilot 100, including those described above. The memory 114 may also contain additional instructions, including instructions to send data to, receive data from, interact with, and/or control one or more of the travel system 102, the sensor system 104, the control system 106, and the peripherals 108.
In addition to instructions 115, memory 114 may also store data such as road maps, route information, position, direction, speed of autopilot, and other such autopilot data, as well as other information. Such information may be used by the autopilot device 100 and the computer system 112 during operation of the autopilot device 100 in autonomous, semi-autonomous, and/or manual modes.
The method for acquiring the position of the vehicle provided by the embodiment of the application may be implemented as software code stored in the memory 114; the processor 113 may acquire the software code from the memory and execute it to implement the method. After the intent of the target vehicle is obtained, it may be communicated to the control system 106, and the control system 106 may determine the driving strategy of the vehicle based on that intent.
A user interface 116 for providing information to or receiving information from a user of the autopilot 100. Optionally, the user interface 116 may include one or more input/output devices within the set of peripheral devices 108, such as a wireless communication system 146, a vehicle computer 148, a microphone 150, and a speaker 152.
The computer system 112 may control the functions of the autopilot 100 based on inputs received from various subsystems (e.g., the travel system 102, the sensor system 104, and the control system 106) and from the user interface 116. For example, the computer system 112 may utilize inputs from the control system 106 to control the steering unit 132 to avoid obstacles detected by the sensor system 104 and the obstacle avoidance system 144. In some embodiments, the computer system 112 is operable to provide control over many aspects of the autopilot 100 and its subsystems.
Alternatively, one or more of these components may be mounted separately from or associated with the autopilot 100. For example, the memory 114 may exist partially or completely separate from the autopilot 100. The above components may be communicatively coupled together in a wired and/or wireless manner.
Alternatively, the above components are only an example, and in practical applications, components in the above modules may be added or deleted according to actual needs, and fig. 1c should not be construed as limiting the embodiments of the present application.
Referring to fig. 2a, an embodiment of the present application provides a system architecture 200a. The system architecture includes a database 230a and a client device 240a. The data collection device 260a is configured to collect data and store it in the database 230a, and the training module 202a generates the target model/rule 201a based on the data maintained in the database 230a. How the training module 202a obtains the target model/rule 201a based on the data will be described in more detail below; the target model/rule 201a is the first model mentioned in the following embodiments of the present application.
The calculation module 211a may include the training module 202a, and the target model/rule obtained by the training module 202a may be applied to different systems or devices. In fig. 2a, the execution device 210a is configured with a transceiver 212a, which may be a wireless transceiver, an optical transceiver, a wired interface (e.g., an I/O interface), or the like, for interacting with external devices. A "user" may input data to the transceiver 212a through the client device 240a; for example, the client device 240a may send a target task to the execution device 210a, requesting the execution device to train a neural network, and may send a database for training to the execution device 210a.
The execution device 210a may call data, code, etc. in the data storage system 250a, or may store data, instructions, etc. in the data storage system 250 a.
The calculation module 211a processes the input data using the target model/rule 201 a. Specifically, the calculation module 211a is configured to: first, acquiring first information and second information, wherein the first information comprises information of vehicles around a self-vehicle, and the second information comprises information of lanes around the self-vehicle; and then, inputting the first information and the second information into the first model to obtain prediction information generated by the first model, wherein the prediction information comprises the predicted position information of the vehicles around the vehicle in the first time.
Finally, the transceiver 212a returns the output of the neural network to the client device 240a. For example, the user may input, through the client device 240a, a text to be converted into a sign language action; the neural network outputs the sign language action or a parameter representing the sign language action, which is fed back to the client device 240a.
Further, the training module 202a may obtain corresponding target models/rules 201a based on different data for different tasks to provide better results to the user.
In the case shown in fig. 2a, the data input into the execution device 210a may be determined from the user's input data; for example, the user may operate in an interface provided by the transceiver 212a. In another case, the client device 240a may automatically input data to the transceiver 212a and obtain the result; if the automatic input of data requires the user's authorization, the user may set the corresponding permission in the client device 240a. The user may view the results output by the execution device 210a at the client device 240a, where the presentation may take the form of a display, sound, action, or the like. The client device 240a may also act as a data collection terminal to store the collected data associated with the target task in the database 230a.
The training or updating process referred to in the present application may be performed by the training module 202a. It will be appreciated that training a neural network is essentially learning how the spatial transformation is controlled, and more specifically learning the weight matrix. The objective of training the neural network is to make its output as close as possible to the expected value, so the weight vector of each layer can be updated by comparing the predicted value of the current network with the expected value and adjusting according to the difference between them (of course, the weight vectors are usually initialized before the first update, that is, parameters are pre-configured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the values of the weights in the weight matrix are adjusted to decrease the predicted value, and the adjustment continues until the value output by the neural network approaches or equals the desired value. Specifically, the difference between the predicted value and the expected value may be measured by a loss function (loss function) or an objective function (objective function). Taking the loss function as an example, a higher output value (loss) of the loss function indicates a greater difference, and training the neural network can be understood as a process of reducing this loss as much as possible.
As shown in fig. 2a, a target model/rule 201a is obtained by training according to a training module 202a, and the target model/rule 201a may be the first model in the present application in the embodiment of the present application.
Wherein database 230a may be used to store a sample set for training during the training phase. The execution device 210a generates a target model/rule 201a for processing the sample and iteratively trains the target model/rule 201a with a set of samples in a database to obtain a mature target model/rule 201a, the target model/rule 201a being embodied as a neural network. The neural network obtained by executing the device 210a may be applied to different systems or devices.
In the inference phase, the execution device 210a may call data, code, etc. in the data storage system 250a, or may store data, instructions, etc. in the data storage system 250 a. The data storage system 250a may be disposed in the execution device 210a, or the data storage system 250a may be an external memory with respect to the execution device 210 a. The calculation module 211a may process the samples acquired by the execution device 210a through the neural network to obtain a prediction result, where a specific expression form of the prediction result is related to a function of the neural network.
It should be noted that fig. 2a is only an exemplary schematic diagram of a system architecture according to an embodiment of the present application, and the positional relationship between devices, apparatuses, modules, etc. shown in the drawings is not limited in any way. For example, in FIG. 2a, data storage system 250a is external memory to execution device 210a, and in other scenarios, data storage system 250a may be located within execution device 210 a.
The target model/rule 201a obtained by training with the training module 202a may be applied to different systems or devices, such as a mobile phone, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, a vehicle-mounted terminal, etc., and may also be applied to a server, a cloud device, etc.
Specifically, in one possible implementation manner, please refer to fig. 2b, fig. 2b is a flowchart of a method for obtaining a position of a vehicle according to an embodiment of the present application. The method may be performed by an execution device 210a as shown in fig. 2 a. The method specifically comprises the following steps: 201b, acquiring first information and second information, wherein the first information comprises information of vehicles around the self-vehicle, and the second information comprises information of lanes around the self-vehicle; 202b, inputting the first information and the second information into the first model to obtain prediction information generated by the first model, wherein the prediction information comprises the predicted position information of vehicles around the vehicle in the first time.
The application architecture of the embodiment of the present application is described above in conjunction with the description, and the method for acquiring the position of the vehicle provided by the embodiment of the present application is described in detail below.
First, in order to better understand the aspects of the embodiments of the present application, related terms and concepts that may be related to the embodiments of the present application are described below.
(1) Neural network
A neural network may be composed of neural units and can be understood, in particular, as a neural network having an input layer, a hidden layer, and an output layer. In general, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. A neural network with many hidden layers is called a deep neural network (DNN). The operation of each layer in the neural network can be described by the mathematical expression y = a(W·x + b). From a physical viewpoint, the operation of each layer can be understood as a conversion of an input space (a set of input vectors) into an output space (i.e., the row space of a matrix into its column space) through five operations: 1. dimension increasing/decreasing; 2. zooming in/out; 3. rotation; 4. translation; 5. "bending". Operations 1, 2, and 3 are completed by W·x, operation 4 is completed by +b, and operation 5 is implemented by a(). The term "space" is used here because the object being classified is not a single thing but a class of things, and the space refers to the collection of all individuals of such things. W is the weight matrix of a layer of the neural network, and each value in the matrix represents the weight of one neuron of that layer. The matrix W determines the spatial transformation of the input space into the output space described above; that is, the W of each layer of the neural network controls how the space is transformed. The purpose of training the neural network is to finally obtain the weight matrices of all layers of the trained neural network. The training process of the neural network is thus essentially learning how to control the spatial transformation, and more specifically, learning the weight matrices.
(2) Convolutional neural network
A convolutional neural network (CNN) is a deep neural network with a convolutional structure. The convolutional neural network comprises a feature extractor consisting of a convolutional layer and a sub-sampling layer. The feature extractor can be seen as a filter, and the convolution process can be seen as convolving a trainable filter with an input image or a convolution feature plane (feature map). The convolutional layer refers to a neuron layer in the convolutional neural network that performs convolution processing on an input signal. In a convolutional layer of a convolutional neural network, a neuron may be connected to only some of the neurons in the adjacent layer. A convolutional layer typically contains a number of feature planes, and each feature plane may be composed of a number of neural units arranged in a rectangular pattern. Neural units of the same feature plane share weights, and the shared weights are the convolution kernel. Weight sharing can be understood to mean that the way image information is extracted is independent of location. The underlying principle is that the statistics of one part of an image are the same as those of other parts, which means that image information learned in one part can also be used in another part, and the same learned image information can be used for all locations on the image. In the same convolutional layer, a plurality of convolution kernels may be used to extract different image information; in general, the greater the number of convolution kernels, the richer the image information reflected by the convolution operation.
The convolution kernel can be initialized in the form of a matrix of random size, and reasonable weights can be obtained through learning during the training of the convolutional neural network. In addition, a direct benefit of weight sharing is reducing the connections between the layers of the convolutional neural network while reducing the risk of overfitting.
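As an illustrative sketch only (PyTorch is assumed as the framework; it is not specified in this application), the following shows how a single convolutional layer with shared kernel weights is applied across all spatial locations of an input image:

    import torch
    import torch.nn as nn

    # One convolutional layer: 3 input channels, 16 output channels, 3x3 kernels.
    # The same kernels (shared weights) slide over every spatial location, so the
    # parameter count is independent of the image size.
    conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

    image = torch.randn(1, 3, 224, 224)   # (batch, channels, height, width)
    feature_map = conv(image)             # -> shape (1, 16, 224, 224)

    # Parameters: 16 kernels x (3 x 3 x 3) weights + 16 biases = 448
    print(sum(p.numel() for p in conv.parameters()))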
(3) Deep neural network
A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network having many hidden layers; there is no particular metric for "many" here. Dividing a DNN by the locations of its different layers, the layers inside the DNN can be divided into three categories: the input layer, the hidden layers, and the output layer. Typically the first layer is the input layer, the last layer is the output layer, and all intermediate layers are hidden layers. The layers are fully connected; that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer. Although a DNN appears complex, the work of each layer is not complex: it is simply the linear relational expression y = α(W·x + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called the coefficients), and α() is the activation function. Each layer simply performs this operation on the input vector x to obtain the output vector y. Because a DNN has many layers, the number of coefficients W and offset vectors b is also large. These parameters are defined in the DNN as follows. Taking the coefficient W as an example: assume that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_24, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4.
In summary: the coefficient from the kth neuron of the (L-1)th layer to the jth neuron of the Lth layer is defined as W^L_jk.
It should be noted that the input layer has no W parameters. In deep neural networks, more hidden layers make the network better able to characterize complex situations in the real world. In theory, a model with more parameters has higher complexity and greater "capacity", which means that it can accomplish more complex learning tasks. Training the deep neural network is the process of learning the weight matrices, and its final objective is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of the many layers).
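A minimal sketch of the per-layer operation y = α(W·x + b) described above (NumPy is assumed; the layer sizes are hypothetical):

    import numpy as np

    def relu(x):
        return np.maximum(0, x)      # the activation function alpha()

    def dnn_forward(x, weights, biases):
        # Each layer computes y = alpha(W @ x + b) and feeds y to the next layer.
        for W, b in zip(weights, biases):
            x = relu(W @ x + b)
        return x

    rng = np.random.default_rng(0)
    # Hypothetical 3-layer DNN: 8 -> 32 -> 32 -> 2
    weights = [rng.standard_normal((32, 8)),
               rng.standard_normal((32, 32)),
               rng.standard_normal((2, 32))]
    biases = [np.zeros(32), np.zeros(32), np.zeros(2)]
    y = dnn_forward(rng.standard_normal(8), weights, biases)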
(4) Loss function (loss function)
In the process of training a neural network, because the output of the neural network is expected to be as close as possible to the value that is actually desired, the weight matrix of each layer can be updated according to the difference between the predicted value of the current network and the actually desired target value (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the neural network). For example, if the predicted value of the network is too high, the weight matrix is adjusted so that the predicted value becomes lower, and the adjustment continues until the neural network can predict the actually desired target value. It is therefore necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the loss function or the objective function, which is an important equation for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the neural network becomes the process of reducing this loss as much as possible. For example, in a classification task, the loss function is used to characterize the gap between the predicted class and the true class, and the cross entropy loss function (cross entropy loss) is a loss function commonly used in classification tasks.
In the training process of the neural network, an error back propagation (BP) algorithm can be adopted to correct the parameters in the initial neural network model, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, the input signal is propagated forward until an error loss is produced at the output, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by the error loss, and aims to obtain the parameters of the optimal neural network model, such as the weight matrix.
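The following is a hedged sketch of a single training iteration using a cross entropy loss and error back propagation (PyTorch is assumed; the model, optimizer, and data are placeholders rather than the first model of this application):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 3))
    loss_fn = nn.CrossEntropyLoss()        # measures the gap between prediction and target
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    features = torch.randn(16, 8)          # a hypothetical labeled batch
    targets = torch.randint(0, 3, (16,))

    logits = model(features)               # forward propagation
    loss = loss_fn(logits, targets)        # higher loss means a larger difference
    optimizer.zero_grad()
    loss.backward()                        # back propagate the error loss
    optimizer.step()                       # update the weight matrices to reduce the loss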
(5) Transformer structure
The transformer structure is a feature extraction network (classified as a convolutional neural network in this application) that includes an encoder and a decoder.
An encoder: feature learning, such as the features of pixels, is performed under a global receptive field by means of self-attention.
A decoder: features of the desired module, such as features of the output box, are learned by self-attention and cross-attention.
(6) Attention mechanism (attention mechanism)
The attention mechanism mimics the internal process of biological observation behavior, i.e., a mechanism that aligns internal experience with external sensation to increase the observation fineness of a partial region, enabling rapid screening of high-value information from a large amount of information with limited attention resources. The attention mechanism can quickly extract important features of sparse data and is thus widely used for natural language processing tasks, particularly machine translation. The self-attention mechanism is an improvement of the attention mechanism, which reduces the reliance on external information and is better at capturing the internal dependencies of data or features. The essential idea of the attention mechanism can be expressed by the following formula: Attention(Query, Source) = Σ_i Similarity(Query, Key_i) × Value_i.
The self-attention mechanism provides an efficient modeling way to capture global context information through Q, K and V. Assume the input is a query Q, and the context is stored in key-value pairs (K, V); the attention mechanism is then in essence a mapping from the query to a sequence of key-value pairs. Attention essentially assigns a weight coefficient to each element in the sequence, which can also be understood as soft addressing. If each element in the sequence is stored as a (K, V) pair, attention completes the addressing by computing the similarity between Q and K. The computed similarity between Q and K reflects the importance, i.e., the weight, of the corresponding V value to be extracted, and the final feature value is then obtained through weighted summation.
The attention calculation is mainly divided into three steps. The first step is to calculate the similarity between the query and each key to obtain the weights; common similarity functions include the dot product, concatenation, a perceptron, and the like. The second step is typically to normalize these weights using a softmax function (on the one hand, normalization yields a probability distribution in which all weight coefficients sum to 1; on the other hand, the characteristics of the softmax function highlight the weights of important elements). Finally, the weights and the corresponding values are weighted and summed to obtain the final feature value.
In addition, attention includes self-attention and cross-attention. Self-attention can be understood as a special case of attention in which the inputs Q, K and V come from the same source, whereas in cross-attention the inputs Q, K and V come from different sources. Attention integrates the queried features as updated values of the current features, using the degree of similarity (e.g., the inner product) between features as the weight. Self-attention is attention computed on the feature map itself.
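A minimal sketch of the three-step attention calculation described above, using the dot product as the similarity function (PyTorch is assumed). With identical Q, K, and V inputs it is self-attention; with Q from one source and K, V from another it is cross-attention:

    import torch

    def attention(Q, K, V):
        # Step 1: similarity between each query and each key (scaled dot product).
        scores = Q @ K.transpose(-2, -1) / K.shape[-1] ** 0.5
        # Step 2: normalize the weights with softmax so they sum to 1 per query.
        weights = torch.softmax(scores, dim=-1)
        # Step 3: weighted sum of the values gives the output features.
        return weights @ V, weights

    x = torch.randn(2, 10, 64)             # (batch, sequence, feature)
    self_out, _ = attention(x, x, x)       # self-attention: Q, K, V from the same input
    ctx = torch.randn(2, 20, 64)
    cross_out, w = attention(x, ctx, ctx)  # cross-attention: K, V from another source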
(7) Multi-Layer Perceptron (Multi-Layer Perceptron, MLP)
The multi-layer perceptron (MLP) is a feedforward artificial neural network model. An MLP is an artificial neural network (ANN) based on a fully-connected (FC) forward structure, which contains from tens to thousands of artificial neurons (hereinafter referred to as neurons). The MLP organizes the neurons into a multi-layer structure in which the layers are fully connected, forming an ANN of weighted connection layers joined layer by layer; the basic structure is shown in figure 3. Each fully-connected layer of the MLP that contains computation is numbered from 1, the total number of layers is L, the input layer is numbered 0, and the fully-connected layers of the MLP are divided into two major types, odd-numbered layers and even-numbered layers. Generally, an MLP comprises an input layer (which does not actually contain operations), one or more hidden layers, and an output layer.
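A hedged sketch of a small MLP with an input layer, two hidden layers, and an output layer (PyTorch is assumed; all layer sizes are hypothetical):

    import torch.nn as nn

    mlp = nn.Sequential(
        nn.Linear(8, 64),    # fully-connected layer 1 (input dimension 8 is hypothetical)
        nn.ReLU(),
        nn.Linear(64, 64),   # fully-connected hidden layer
        nn.ReLU(),
        nn.Linear(64, 2),    # output layer
    )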
(8) Features, labels and samples
Features refer to input variables, i.e., the x variables in simple linear regression. A simple machine learning task may use a single feature, while a more complex machine learning task may use millions of features.
Labels are the y variables in simple linear regression, and a label can carry a variety of meanings. In some embodiments of the application, a label may refer to the classification category of the input data. By labeling each different category of the input data, the label indicates to the computing device the specific information represented by the data. Thus, labeling data tells the computing device what the various features of the input variable describe (i.e., y), which may be referred to as the label or target (i.e., target value).
A sample refers to a specific example of data. A sample x represents an object and is usually represented by a feature vector x = (x1, x2, …, xd) ∈ R^d, where d represents the dimension (i.e., the number of features) of the sample x. Samples are divided into labeled samples and unlabeled samples: a labeled sample contains both features and a label, whereas an unlabeled sample contains features but no label. The task of machine learning is often to learn a potential pattern in an input d-dimensional training sample set (which may simply be referred to as a training set).
(9) Back propagation algorithm
In the training process of the neural network, an error back propagation (BP) algorithm can be adopted to correct the parameters in the initial neural network model, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, the input signal is propagated forward until an error loss is produced at the output, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by the error loss, and aims to obtain the parameters of the optimal neural network model, such as the weight matrix.
(10) Backbone network (Backbone)
A backbone network is the network structure used for feature extraction from the input information in detectors, segmentors, classifiers, and the like. In general, in addition to the backbone network, the neural network may include other functional networks, such as a region proposal network (RPN) or a feature pyramid network (FPN), for further processing the features extracted by the backbone network, for example classifying the recognized features or performing semantic segmentation on the features.
(11) Matrix multiplication operation (MatMul)
Matrix multiplication is a binary operation that yields a third matrix, the product of the first two, also commonly referred to as the matrix product, from two matrices. The matrix may be used to represent a linear mapping and the matrix product may be used to represent a composite of the linear mapping.
(12) Normalization function
The normalization (softmax) function is also called the normalized exponential function and is a generalization of the logistic function. The softmax function can transform a K-dimensional vector Z containing arbitrary real numbers into another K-dimensional vector σ(Z), such that each element of the transformed vector σ(Z) lies in the range (0, 1) and the sum of all elements is 1. The softmax function may be calculated as shown in equation one:
σ(Z)_j = e^(Z_j) / Σ_{k=1}^{K} e^(Z_k), for j = 1, …, K (equation one)
where σ(Z)_j represents the value of the jth element of the vector transformed by the softmax function, Z_j represents the value of the jth element of the vector Z, Z_k represents the value of the kth element of the vector Z, and Σ represents summation.
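A short sketch of the same calculation in code (NumPy is assumed):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())    # subtracting the maximum improves numerical stability
        return e / e.sum()

    sigma = softmax(np.array([2.0, 1.0, 0.1]))
    # every element of sigma lies in (0, 1) and the elements sum to 1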
(13) Embedded layer (embedding layer)
The embedding layer may be referred to as an input embedding layer. The current input may be a text input, for example a text segment or a sentence. The text may be Chinese text, English text, or text in another language. After the embedding layer acquires the current input, it can embed each word in the current input to obtain the feature vector of each word. In some embodiments, the embedding layer includes an input embedding layer and a position encoding (positional encoding) layer. In the input embedding layer, word embedding processing can be performed on each word in the current input to obtain the word embedding vector of each word. The position encoding layer can obtain the position of each word in the current input and then generate a position vector for the position of each word. In some examples, the position of each word may be the absolute position of each word in the current input. When the word embedding vector and the position vector of each word in the current input are obtained, the position vector of each word can be combined with the corresponding word embedding vector to obtain the feature vector of each word, thereby obtaining a plurality of feature vectors corresponding to the current input. The plurality of feature vectors may be represented as embedded vectors having a preset dimension. The number of the feature vectors may be set to M and the preset dimension to H, so that the plurality of feature vectors may be represented as M×H embedded vectors.
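A hedged sketch of the input embedding plus position encoding described above (PyTorch is assumed; the vocabulary size, M, and H are hypothetical):

    import torch
    import torch.nn as nn

    vocab_size, M, H = 1000, 8, 256                  # hypothetical sizes
    word_embed = nn.Embedding(vocab_size, H)         # word embedding vectors
    pos_embed = nn.Embedding(M, H)                   # one position vector per absolute position

    token_ids = torch.randint(0, vocab_size, (1, M)) # the current input of M words
    positions = torch.arange(M).unsqueeze(0)         # absolute positions 0..M-1

    # combine each word embedding vector with its position vector -> M x H embedded vectors
    features = word_embed(token_ids) + pos_embed(positions)   # shape (1, M, H)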
The following describes a method for acquiring the position of a vehicle according to an embodiment of the present application. The method may be performed by a position acquisition device of the vehicle or by a component of that device (e.g., a processor, a chip, or a chip system). The position acquisition device of the vehicle may be a cloud device, a vehicle, or a terminal device (for example, a vehicle-mounted terminal, an aircraft terminal, etc.). Of course, the method may also be performed by a system composed of the cloud device and the vehicle. Alternatively, the method may be processed by a CPU in the position acquisition device of the vehicle, or jointly by the CPU and a GPU; alternatively, no GPU is used and another processor suitable for neural network computation is used instead, which is not limited by the present application. The method can be applied to intelligent driving scenarios.
In combination with the above description, the specific implementation flows of the inference stage and the training stage of the vehicle position acquisition method provided by the embodiment of the present application are described below.
1. Inference phase
In the embodiment of the present application, the inference stage describes how the execution device 210a processes the collected information data to generate a prediction result by using the target model/rule 201a. Specifically, please refer to fig. 4, which is another flow chart of the method for obtaining the position of the vehicle provided in the embodiment of the present application. Fig. 4 illustrates the embodiment of the present application applied to the autonomous driving field, and the method may include steps 401 to 403.
401. The execution device acquires the first information and the second information.
In the embodiment of the application, the first information comprises information of vehicles around the own vehicle, the second information comprises information of lanes around the own vehicle, and the execution equipment acquires the information of the vehicles and the lanes around the own vehicle. Specifically, in some application scenarios, the execution device may be a vehicle, and the vehicle may directly collect vehicle and lane information around the vehicle through the collection device, such as an image capturing device, a radar device, and the like. The executing device may also adopt a mode of receiving information sent by other external devices, or a mode of selecting information from a database, etc., which is not limited herein.
In one implementation, in order to accurately predict whether other surrounding vehicles affect the driving safety of the vehicle, whether the driving decision of the vehicle will be affected, and how to control the driving strategy of the vehicle based on the surrounding vehicles, it is necessary to determine the driving intention of at least one associated vehicle located around the vehicle. The target vehicle in the embodiment of the application is any one of at least one associated vehicle positioned around the own vehicle.
It should be understood that the above-mentioned "associated vehicles" may be understood as vehicles within a certain preset range from the own vehicle in terms of distance, that is, determining which vehicles have an association relationship with the own vehicle based on the distance, and using those vehicles having an association relationship as the associated vehicles of the own vehicle; in addition, "associated vehicles" can also be understood as vehicles which influence the driving state decision of the vehicle in the future, i.e. which vehicles have an associated relationship with the vehicle based on whether the driving strategy of the vehicle will be influenced in the future, and these vehicles having an associated relationship are regarded as associated vehicles of the vehicle.
In one implementation, the processor of the host vehicle may control the sensors associated with the host vehicle to acquire vehicle information and lane information for surrounding vehicles based on software code associated with step 401 in memory 114, and determine which vehicles are associated vehicles, i.e., which vehicles require intent prediction, based on the acquired information.
Or the above-described process of determining the target vehicle may be determined by other vehicles or a server on the cloud side, which is not limited herein.
In order to clearly predict the future driving intention of the target vehicle, vehicle information, lane information and the like of the target vehicle need to be acquired, and the information can be used as a basis for predicting the driving intention of the target vehicle. The position of the target vehicle may be an absolute position of the target vehicle in the map, or may be a relative position with the own vehicle, and the absolute position of the target vehicle may be determined based on the absolute position of the own vehicle and the relative position between the target vehicle and the own vehicle.
Taking an associated vehicle as an example of a target vehicle, in the embodiment of the present application, driving state information of the target vehicle may be obtained, where the driving state information may include a position of the target vehicle, specifically, the position of the target vehicle may be sensed by a sensor carried by an own vehicle, or the position of the target vehicle may be obtained through interaction with other vehicles and a cloud side server.
Taking the associated vehicle as the target vehicle as an example, in one implementation, the position of the target vehicle may be obtained in real time, or the position of the target vehicle may be obtained once every a period of time.
In one possible implementation, after the first information and the second information are acquired, the first information and the second information need to be preprocessed and labeled, and the data after the preprocessing and the labeling can be used as input data of the first model. The preprocessing includes basic data processing operations such as outlier processing of data, and the like, which are not described herein.
Illustratively, the main steps of acquiring and processing the first information and the second information are as follows:
(1) First information and second information are collected.
The first information includes information of vehicles around the own vehicle. The information of a vehicle includes 8 pieces of characteristic data: the abscissa, the ordinate, the type, the length, the width, the height, the current speed, and the direction of the speed (i.e., the direction in which the vehicle advances). A specific acquisition mode may be as follows: vector data of one frame are obtained every 0.2 s, so that ten frames of vector data are obtained over the past 2 s; adding the vector data of the current frame gives 11 frames of vector data in total, where the vector data collected in each frame includes the 8 features. Assuming that data for only 64 vehicles are kept, the vehicles around the own vehicle are sorted by their distance to the own vehicle, and the data of the more distant objects are deleted. The data collected for the vehicles can be seen in table 1 below.
The second information includes information of lanes around the own vehicle. The information of a lane includes 8 pieces of characteristic data. 20 road points are taken from each lane around the own vehicle, and each road point corresponds to 8 pieces of characteristic data: the abscissa and the ordinate of the road point, the type of the lane (e.g., non-motor-vehicle lane or motor-vehicle lane), whether the lane allows going straight, whether the lane allows turning left, whether the lane allows turning right, whether the lane allows a U-turn, and the number of the lane. A specific acquisition mode may be as follows: the characteristics of all lanes within 200 meters around the own vehicle are collected, 20 road points are taken from each lane, and 8 features are taken for each road point. Assuming that data for only 256 lanes are kept, the lanes around the own vehicle are sorted by their distance to the own vehicle, and the data of the more distant objects are deleted. In one implementation, the road points may be selected on a lane by farthest point sampling. The data collected for the lanes can be seen in table 1 below.
TABLE 1
It is to be understood that, in actual operation, the characteristic data included in the information of the vehicles around the own vehicle and in the information of the lanes may be set according to actual demand, which is not limited herein. In addition, in the present embodiment only the positions of the vehicles around the own vehicle are predicted, so only the information of the vehicles around the own vehicle is collected; in real life, the surroundings of the own vehicle also include obstacles such as non-motor vehicles and pedestrians, and on the basis of collected information about such non-motor vehicles and pedestrians, their positions can also be predicted using the position acquisition method provided by the embodiment of the present application.
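Under the shapes described above (64 vehicles with 11 frames of 8 features each, 256 lanes with 20 road points of 8 features each, and a batch size of 16), the model inputs could be assembled as in the following illustrative sketch; the tensor names and the framework (PyTorch) are assumptions, not part of this application:

    import torch

    batch_size = 16
    first_information = torch.zeros(batch_size, 64, 11, 8)    # vehicles: 11 frames x 8 features
    second_information = torch.zeros(batch_size, 256, 20, 8)  # lanes: 20 road points x 8 features

    # In practice each slot would be filled from collected sensor data, with vehicles and
    # lanes sorted by distance to the own vehicle and the farthest entries discarded.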
(2) Labeling process
In the embodiment of the application, the actual trajectory driven by the vehicle within the future first time is obtained, and the trajectory data are labeled for use in the subsequent training stage. The categories of labels include:
Track label: represents the actual driving track of the target vehicle within the first time. For example, the position of the target vehicle along its real track over the future 3 s is acquired every 0.2 s, giving 15 points in total; each position includes x and y coordinates, so data of shape (15, 2) are output (see the sketch after this list).
Intersection label: indicates the exit lane selected when the target vehicle leaves the intersection.
Non-intersection label: indicates the non-exit lane in which the target vehicle is located within the first time.
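For the track label, an illustrative sketch of how the (15, 2) array could be assembled from a recorded future trajectory is shown below; the function name and sampling logic are assumptions consistent with the description above:

    import numpy as np

    def make_track_label(future_xy, dt=0.2, horizon=3.0):
        # future_xy: recorded (x, y) positions of the target vehicle, one every dt seconds
        n = round(horizon / dt)            # 3.0 s / 0.2 s = 15 points
        return np.asarray(future_xy[:n])   # -> array of shape (15, 2)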
402. The execution device inputs the first information and the second information into the first model to obtain prediction information generated by the first model, where the prediction information includes predicted position information of the vehicles around the own vehicle within the first time.
In the embodiment of the application, after the execution device acquires the information of the vehicles and the lanes around the own vehicle, the information can be input into the first model, so that the predicted position information of any vehicle around the own vehicle can be predicted and obtained through the first model according to the information of any vehicle around the own vehicle and the information of the lanes.
In one implementation, the first model includes an attention-based encoder and a decoder. Referring to fig. 5, fig. 5 is a schematic structural diagram of the first model according to an embodiment of the application. In fig. 5, the execution device inputs the acquired first information and second information into the first model, and the prediction information is output based on the encoder-decoder structure of the attention mechanism.
The following describes a specific implementation procedure of the embodiment of the present application:
First, the specific content of the prediction information will be described.
In one implementation, the predicted location information includes predicted trajectory information of vehicles around the host vehicle at a first time and third information indicating a lane in which the vehicles around the host vehicle are located at the first time.
In this implementation, the prediction information obtained by the execution device from the first model may include two aspects: the first is the track information of the vehicles around the own vehicle within the future first time, and the second is the third information, which indicates the lane in which the vehicles around the own vehicle are located within the future first time. In one implementation, the third information includes a degree of association between a target vehicle around the own vehicle and at least one lane around the own vehicle within the first time, where the target vehicle is one of the vehicles around the own vehicle.
It can be understood that the execution device can simultaneously collect the information of a plurality of vehicles around the own vehicle and output prediction information for the plurality of vehicles; when the prediction information of a certain target vehicle needs to be obtained, the execution device can obtain it directly from the prediction information of the plurality of vehicles. The association degree in the third information specifically refers to the attention score between the target vehicle and the lanes around the own vehicle within the first time; the attention score of a vehicle around the own vehicle with respect to each lane around the own vehicle can be obtained by inputting the first information and the second information into the first model and performing the relevant operations of the attention mechanism.
Next, the specific process by which the first model generates the prediction information from the first information and the second information will be described.
Referring to fig. 6, fig. 6 is a schematic diagram of another structure of the first model according to the embodiment of the application.
In one implementation, an encoder in a first model includes an embedding module and an attention module, and a decoder includes a first decoder module and a second decoder module. The modules are described in detail below.
1. Encoder with a plurality of sensors
The encoder includes an embedding module and an attention module.
(1) Embedded module
The embedding module comprises a first embedding module and a second embedding module.
In the embodiment of the application, the execution device inputs the first information and the second information into the embedding module, and three different matrices, namely the matrix Q, the matrix K and the matrix V, are obtained after embedding the input sequence.
In one implementation, the first information is processed by the first embedding module, where the first embedding module includes three sub-modules, that is, a first sub-module, a second sub-module, and a third sub-module.
In this implementation, referring specifically to fig. 7, fig. 7 is a schematic structural diagram of the first embedding module according to an embodiment of the present application. The first sub-module specifically comprises a two-dimensional convolution layer Conv2d1, a two-dimensional batch normalization layer BatchNorm2d, an activation function layer ReLU, and a two-dimensional convolution layer Conv2d2. The second sub-module and the third sub-module have the same composition and each comprise a one-dimensional convolution layer Conv1d1, a one-dimensional batch normalization layer BatchNorm1d, an activation function layer ReLU, and a one-dimensional convolution layer Conv1d2. The convolution kernel size kernel_size of the convolution layer Conv2d is (1, 1), and the step size stride and the zero padding take default values. ReLU(x) is a nonlinear activation function, specifically defined as ReLU(x) = max(0, x), where x represents the input variable of the function.
Illustratively, in connection with what is exemplified in table 1 above, the processing of the first embedding module is as follows:
Vehicle data of shape (16, 64, 11, 8), position data of shape (16, 64, 2), and time data of shape (16, 11) are obtained from the first information; the vehicle data, the position data, and the time data are input into the first sub-module, the second sub-module, and the third sub-module, respectively, which output data of shapes (16, 64, 11, 256), (16, 64, 1, 256), and (16, 1, 11, 256). The three outputs are then fused by the embedding method to give data of shape (16, 64, 11, 256), i.e., the matrix Q in the form of data of shape (16, 64, 11, 256).
Here 16 denotes the batch size, 64 denotes the number of vehicles around the own vehicle (i.e., the data amount in table 1), 11 denotes the number of time instants at which data are collected, and 2 denotes the position of the vehicle (i.e., the abscissa and ordinate shown in table 1). In plain terms, the information of the 64 vehicles around the own vehicle constitutes the first information; taking any one vehicle as an example, the data of that vehicle are acquired 11 times at 11 time instants, and each acquisition includes a plurality of feature data (corresponding to the 8 features in table 1).
It should be understood that the feature data corresponding to the vehicles around the vehicle may be set according to the actual requirement or test, and is only illustrative and not limiting herein.
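A non-authoritative PyTorch-style sketch of the first sub-module structure described above is given below; the intermediate channel width is an assumption chosen only to match the reported output width of 256:

    import torch
    import torch.nn as nn

    class FirstSubModule(nn.Module):
        def __init__(self, in_channels=8, hidden=64, out_channels=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_channels, hidden, kernel_size=(1, 1)),   # Conv2d1
                nn.BatchNorm2d(hidden),                               # BatchNorm2d
                nn.ReLU(),                                            # activation layer
                nn.Conv2d(hidden, out_channels, kernel_size=(1, 1)),  # Conv2d2
            )

        def forward(self, x):
            # x: (batch, 64 vehicles, 11 frames, 8 features) -> channels-first for Conv2d
            x = x.permute(0, 3, 1, 2)        # (16, 8, 64, 11)
            x = self.net(x)                  # (16, 256, 64, 11)
            return x.permute(0, 2, 3, 1)     # (16, 64, 11, 256)

    q = FirstSubModule()(torch.zeros(16, 64, 11, 8))   # matrix Q of shape (16, 64, 11, 256)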
In one implementation, the embedding module processes the second information through a second embedding module, the second embedding module including a fourth sub-module.
In this implementation, referring specifically to fig. 8, fig. 8 is a schematic structural diagram of the second embedding module according to an embodiment of the present application. The fourth sub-module specifically comprises a two-dimensional convolution layer Conv2d1, a two-dimensional batch normalization layer BatchNorm2d, an activation function layer ReLU, and a two-dimensional convolution layer Conv2d2. The convolution kernel size kernel_size of the convolution layer Conv2d is (1, 1), and the step size stride and the zero padding take default values.
Illustratively, the processing of the second embedding module is described in connection with what is illustrated in Table 1 above. The second information is in the form of data of shape (16, 256, 20, 8); after it is input into the fourth sub-module, data of shape (16, 256, 11, 256) is output as the matrix K and likewise as the matrix V. As can be seen from table 1 above, 16 denotes the batch size, 256 denotes the number of lanes, 20 denotes the 20 waypoint data on each lane, and 8 denotes the 8 feature data corresponding to each waypoint. In plain terms, the information collected from the 256 lanes around the own vehicle is the second information; taking any one lane as an example, 20 waypoints are collected on that lane, and each waypoint includes 8 feature data.
It will be appreciated that the data of the lanes around the vehicle may be set according to actual needs or tests, and are merely illustrative and not limiting.
It should be noted that embedding is a mapping method; the convolution and ReLU activation function approach adopted in the embedding module is only one implementation, and other general embedding methods may be adopted to perform embedding processing on the first information and the second information in actual operation.
(2) Attention module
The calculation of the attention module mainly comprises three steps: first, similarity calculation is performed between the query and each key to obtain weights, where common similarity functions include dot product, concatenation, perceptron and the like; second, the weights are normalized using a softmax function; third, the weights and the corresponding key values are weighted and summed to obtain the final feature value.
It will be appreciated that the main roles of the normalization function include: on the one hand, normalization yields a probability distribution in which all weight coefficients sum to 1, converting the scores into values distributed between 0 and 1, and the obtained result is the correlation of each lane with the current vehicle; on the other hand, the intrinsic mechanism of softmax highlights the weights of the important elements. In addition, normalization helps stabilize the gradient during training.
In one implementation, the first information and the second information are input into the first model, fourth information is generated based on an attention mechanism, the fourth information comprises the association degree of a target vehicle around the own vehicle and a first lane set in a first time, the target vehicle is one vehicle around the own vehicle, and the first lane set comprises all lanes around the own vehicle included in the second information.
In connection with the contents of fig. 6 to 8, in this implementation, the matrix Q, the matrix K and the matrix V are taken as inputs of the attention module. Optionally, the matrix Q on the one hand and the matrices K and V on the other hand are mapped linearly to obtain a first linear matrix and a second linear matrix, respectively. Then, the first linear matrix is used as the matrix Q, the second linear matrix is used as the matrix K and the matrix V, and these are taken as inputs of the attention module. The correlation between the input vectors is calculated through the matrix Q and the matrix K, namely the degree of association between a target vehicle around the own vehicle and the first lane set within the first time, also called the attention scores, and the fourth information is output. Then, optionally, by performing a matrix multiplication operation on the second linear matrix and the fourth information, sixth information is obtained, the sixth information including predicted track information of the target vehicle around the own vehicle.
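The following is a minimal sketch of this attention computation, assuming scaled dot-product similarity; the shared linear mapping for K and V and the flattened (batch, vehicle, lane, feature) shapes are illustrative assumptions not fixed by the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LaneAttention(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        self.w_q = nn.Linear(d_model, d_model)    # linear mapping of matrix Q -> first linear matrix
        self.w_kv = nn.Linear(d_model, d_model)   # shared linear mapping of K/V -> second linear matrix
        self.scale = d_model ** 0.5

    def forward(self, q, k):
        # q: vehicle features (batch, num_vehicles, d_model)
        # k: lane features    (batch, num_lanes, d_model)
        q_lin = self.w_q(q)
        kv_lin = self.w_kv(k)
        # Step 1: similarity of every vehicle query with every lane key.
        scores = q_lin @ kv_lin.transpose(-2, -1) / self.scale        # (batch, vehicles, lanes)
        # Step 2: softmax normalization -> "fourth information" (attention score per lane).
        fourth_info = F.softmax(scores, dim=-1)
        # Step 3: weighted sum of the lane values -> "sixth information" (per-vehicle trajectory features).
        sixth_info = fourth_info @ kv_lin                             # (batch, vehicles, d_model)
        return fourth_info, sixth_info

attn = LaneAttention()
fourth_info, sixth_info = attn(torch.randn(16, 64, 256), torch.randn(16, 256, 256))
```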
It will be appreciated that in embodiments of the present application, the main structure of the first model is an encoder-decoder structure based on an attention mechanism. In actual operation, in order to improve the accuracy of the prediction result and the safety of the execution device, prediction may also be performed based on a multi-head attention mechanism as a redundancy scheme for the first model. In addition, it will be apparent to those skilled in the art that other neural networks may be used in place of the first model, provided that the function of the first model is achieved; the specific structure and composition of the first model is merely illustrative and not limiting.
2. Decoder
The decoder includes a first decoder module and a second decoder module.
(1) First decoder module
The first decoder module is mainly used for processing the predicted lane information around the own vehicle.
In one implementation, the primary working procedure of the first decoder module is:
Acquiring the category of a road scene to which a target vehicle belongs, wherein the category of the road scene comprises an intersection scene and a non-intersection scene;
Selecting a second vehicle road set from the first vehicle road set according to the category of the road scene to which the target vehicle belongs, wherein the second vehicle road set comprises lanes of vehicles around the vehicle in a first time;
And obtaining fifth information from the fourth information, and generating third information according to the fifth information, wherein the fifth information comprises the association degree of the target vehicle and the second lane set in the first time, and the third information comprises the association degree of the target vehicle around the own vehicle and at least one lane around the own vehicle in the first time.
In this implementation manner, after the fourth information, namely the attention score of the target vehicle relative to each lane around the own vehicle, is obtained, part of the data is screened out according to the scene in which the target vehicle is located, so as to improve the accuracy of the prediction result. Referring to fig. 9, fig. 9 is a schematic structural diagram of a first decoder module according to an embodiment of the application. Assuming that the road scene to which the target vehicle belongs is an intersection scene, this indicates that the vehicle may travel on an intersection lane within the first time in the future, and therefore the lanes in the non-intersection scene can be excluded. Thus, a second lane set may be selected from the first lane set by setting the attention score of the target vehicle relative to all non-intersection lanes to a minimum value. After the second lane set is screened out, fifth information, namely the attention score of the target vehicle relative to each of the screened lanes, is obtained from the fourth information.
It will be appreciated that the category of the road scene to which the target vehicle belongs may be obtained directly through a map, or may be obtained through other manners, which is not limited herein.
In one implementation, the specific process of generating the third information according to the fifth information may include:
And carrying out normalization operation on the fifth information to obtain normalized fifth information, and then inputting the normalized fifth information into the multi-layer perceptron to obtain third information.
In this implementation, after the fifth information is screened from the fourth information according to the scene requirement, a normalization operation is performed on the fifth information, so that each attention score in the normalized fifth information ranges between (0, 1) and the sum of all elements is 1. Then, the normalized fifth information is input into the multi-layer perceptron, which outputs the degree of association of the target vehicle with the lanes around the own vehicle within the first time, giving the intended lanes of the target vehicle and the probabilities of it driving in them in the future.
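A minimal sketch of the first decoder module follows, assuming that the scene-based screening is implemented as a mask on the attention scores, that the normalization is a softmax, and that the multi-layer perceptron is a two-layer MLP; these details are not fixed by the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FirstDecoder(nn.Module):
    def __init__(self, num_lanes=256, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_lanes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_lanes),
        )

    def forward(self, fourth_info, lane_is_intersection, scene_is_intersection):
        # fourth_info: (batch, vehicles, lanes) attention scores
        # lane_is_intersection: (lanes,) bool, True for intersection lanes
        # scene_is_intersection: bool, category of the road scene of the target vehicle
        keep = lane_is_intersection if scene_is_intersection else ~lane_is_intersection
        # Lanes outside the scene get the minimum score, i.e. they are screened out.
        fifth_info = fourth_info.masked_fill(~keep, float('-inf'))
        fifth_norm = F.softmax(fifth_info, dim=-1)   # normalized fifth information, sums to 1
        third_info = self.mlp(fifth_norm)            # association degree for each lane
        return third_info

decoder = FirstDecoder()
scores = torch.rand(16, 64, 256)
lane_is_intersection = torch.rand(256) > 0.5
third_info = decoder(scores, lane_is_intersection, scene_is_intersection=True)
```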
(2) Second decoder module
The second decoder module is mainly used for processing the predicted track information of the target vehicles around the own vehicle.
In one implementation, the main operation of the second decoder module is:
And inputting the sixth information into the multi-layer perceptron to obtain the predicted track information of the vehicles around the self-vehicle in the first time.
In this implementation, the second decoder module includes an MLP; by inputting the sixth information into the second decoder module, the track of the target vehicle within the first time in the future is output. For example, assuming that a travel track for the next 3 s is output with one point every 0.2 s, the coordinates of 15 points in total can be output.
It will be appreciated that the second decoder module in the embodiments of the present application may obtain the predicted track of the vehicle by using a variety of methods, which are only one example, and are not limited thereto.
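A minimal sketch of the second decoder module follows; the hidden-layer size is an illustrative assumption, and the 15 output points correspond to the 3 s / 0.2 s example above.

```python
import torch
import torch.nn as nn

class SecondDecoder(nn.Module):
    def __init__(self, d_model=256, num_points=15, hidden=128):
        super().__init__()
        self.num_points = num_points
        self.mlp = nn.Sequential(
            nn.Linear(d_model, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_points * 2),   # one (x, y) pair per point
        )

    def forward(self, sixth_info):               # (batch, vehicles, d_model)
        out = self.mlp(sixth_info)
        return out.view(*sixth_info.shape[:-1], self.num_points, 2)

traj = SecondDecoder()(torch.randn(16, 64, 256))   # (16, 64, 15, 2) predicted coordinates
```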
403, the execution device determines the first lane as the lane in which the target vehicle is located within the first time, where the first lane is the lane, among the at least one lane around the host vehicle, with the highest degree of association with the target vehicle within the first time.
In the embodiment of the application, after the first information and the second information are input into the first model, the predicted position information of the vehicles around the own vehicle within the first time and the third information are output by the first model, wherein the third information includes the degree of association of the target vehicle around the own vehicle with at least one lane around the own vehicle within the first time, namely the attention score of the target vehicle for each lane, which can also be understood as the probability that the target vehicle will drive in each lane in the future. On this basis, in one implementation manner, the execution device may select, from the attention scores corresponding to the plurality of lanes, the lane with the highest attention score as the predicted lane in which the target vehicle is located within the first time.
It is understood that step 403 is an optional step. In step 402, the prediction information generated by the first model already includes the third information, which indicates the lane in which the vehicles around the own vehicle are located within the first time; how to further determine that lane from the third information can be extended to various embodiments. In one implementation manner, the execution device uses the lane with the highest degree of association as the lane in which a vehicle around the own vehicle is located within the first time; in an actual application process, the attention score may also be used as one of several prediction bases, and the position prediction may be performed in combination with other features or methods, which is not limited herein.
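A minimal sketch of this selection step follows, assuming the third information is arranged as per-lane association degrees; the argmax over the lane dimension picks the first lane for each surrounding vehicle.

```python
import torch

third_info = torch.rand(16, 64, 256)      # (batch, vehicles, lanes) association degrees
first_lane = third_info.argmax(dim=-1)    # (batch, vehicles) index of the most likely lane
lane_prob = torch.softmax(third_info, dim=-1).max(dim=-1).values  # optional: its probability
```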
In the embodiment of the application, different numbers of lanes can be selected as considered environmental features according to the actual condition of the target vehicle, and the attention mechanism mode is adopted to combine the track of the target vehicle with the surrounding lane features to form the attention features, so that the network can pay attention to the features of the surrounding lanes during learning, the prediction result of the track of the vehicle is more in accordance with the actual driving rule, and the accuracy of the prediction result is improved.
2. Training phase
In the embodiment of the present application, the training phase describes how the training device 220 generates the mature neural network by using the image data set in the database 230a, specifically, referring to fig. 10, fig. 10 is a schematic flow chart of a training method of a model provided in the embodiment of the present application, and the training method of a model provided in the embodiment of the present application may include:
1001, the training apparatus acquires first information including information of vehicles around the own vehicle and second information including information of lanes around the own vehicle.
In the embodiment of the application, the training equipment acquires the data set of the first information and the second information, divides the data set into a training set, a verification set and a test set, trains a model by the training set, adjusts parameters by the verification set and evaluates performance by the test set. The data dividing ratio of the training set, the verification set and the test set can be set according to actual requirements, and is not limited herein.
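A minimal sketch of such a data-set division follows; the 8:1:1 ratio and the tensor shapes are illustrative assumptions, since the text leaves the ratio to the actual requirement.

```python
import torch
from torch.utils.data import TensorDataset, random_split

dataset = TensorDataset(torch.randn(1000, 64, 11, 8),   # first information (vehicles around the own vehicle)
                        torch.randn(1000, 256, 20, 8),  # second information (lanes around the own vehicle)
                        torch.randn(1000, 64, 15, 2))   # correct future positions (labels)
n_train = int(0.8 * len(dataset))
n_val = int(0.1 * len(dataset))
n_test = len(dataset) - n_train - n_val
train_set, val_set, test_set = random_split(dataset, [n_train, n_val, n_test])
```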
In the embodiment of the present application, the specific implementation manner of the training device to execute step 1001 may refer to the description of the specific implementation manner of step 401 in the corresponding embodiment of fig. 4, which is not repeated herein.
It will be appreciated that in training the first model, the training samples used include complete information of the vehicle around the own vehicle and complete information of the lane around the own vehicle, so that the position information output by the first model is also more accurate.
1002, the training device inputs the first information and the second information into the first model to obtain prediction information generated by the first model, where the prediction information includes predicted position information of vehicles around the own vehicle within a first time.
In the embodiment of the application, the training equipment inputs the acquired first information and second information into the first model to obtain the prediction information generated by the first model. In one implementation, the predicted information includes predicted trajectory information of vehicles around the host vehicle over a first time and third information including a degree of association of a target vehicle around the host vehicle with at least one lane around the host vehicle over the first time. Referring to the above embodiment of the present application, the first model includes an encoder and a decoder, and the decoder includes a first decoder module and a second decoder module, wherein after the training device inputs the first information and the second information into the encoder of the first model, third information is outputted through the first decoder module, and predicted track information of vehicles around the own vehicle in a first time is outputted through the second decoder module.
In the embodiment of the present application, for the specific implementation manner in which the training device executes step 1002, reference may be made to the description of the specific implementation manner of step 402 in the corresponding embodiment of fig. 4, which is not repeated herein.
1003, the training device trains the first model according to a loss function, where the loss function indicates a degree of similarity between the prediction information and correct information, and the correct information includes correct position information of vehicles around the own vehicle within the first time.
In the embodiment of the application, training data is pre-configured on the training equipment, and the training data comprises expected results corresponding to information of vehicles around the own vehicle and lane information. After obtaining the prediction result corresponding to the information of the vehicles around the own vehicle and the lane information, the training device can calculate the function value of the target loss function according to the prediction result and the expected result, and update the parameter value of the model to be trained according to the function value of the target loss function and the back propagation algorithm so as to complete one training of the model to be trained.
Wherein a "model to be trained" may also be understood as a "target model to be trained". The meaning of "the expected result corresponding to the information of the vehicle around the own vehicle and the lane information" is similar to the meaning of "the predicted result corresponding to the information of the vehicle around the own vehicle and the lane information", except that "the predicted result corresponding to the information of the vehicle around the own vehicle and the lane information" is the predicted result generated by the model to be trained, "the expected result corresponding to the information of the vehicle around the own vehicle and the lane information" is the correct result corresponding to the information of the vehicle around the own vehicle and the lane information. As an example, where the model to be processed is used to perform a target detection task, for example, the prediction result is used to indicate a desired position of at least one object in the target environment, and the desired result is used to indicate a desired position (may also be referred to as a correct position) of at least one object in the target environment, it should be understood that this is merely for convenience of understanding the present solution, and is not intended to be exhaustive of the meaning of the desired result in various application scenarios.
The training device may repeatedly execute steps 1001 to 1003 for a plurality of times, so as to implement iterative training of the model to be trained, until a preset condition is met, to obtain a trained model to be trained, where the preset condition may be a convergence condition for reaching the target loss function, or the number of iterations of steps 1001 to 1003 reaches a preset number of times.
In the embodiment of the application, not only the specific implementation mode of the reasoning process of the model is provided, but also the specific implementation mode of the training process of the model is provided, and the application scene of the scheme is expanded.
The complete process of training the first model according to the loss function by the training device is described in detail below.
1. The training device collects the data set to obtain the required original data set and its corresponding class labels, and divides it into a training set, a verification set and a test set according to a preset ratio, which are respectively used for training, verifying and evaluating the subsequent model.
2. The training device builds a first model based on the attention mechanism.
3. The training device inputs the data of the training set into the first model, trains the first model by adopting the first loss function and the second loss function, updates the first model through a back propagation algorithm, and selects the optimal first model by using the data of the verification set.
(1) Determining a loss function
In one implementation, the penalty function employed by the training device includes a first penalty function and a second penalty function.
In this implementation, the prediction information output by the first model includes the predicted track information of the vehicles around the own vehicle within the first time and the degree of association between the target vehicle around the own vehicle and at least one lane around the own vehicle within the first time. Therefore, the loss value between the predicted track information output by the first model and the correct track information of the vehicles around the own vehicle within the first time is calculated through the first loss function, and the loss value between the predicted degree of association of the target vehicle with at least one lane around the own vehicle within the first time and the correct information is calculated through the second loss function.
Illustratively, the formula for the first loss function is specifically:
Where l_n represents the loss value between the predicted coordinate and the real coordinate corresponding to the nth sample of the target vehicle, x_n represents the vector data of the predicted coordinate corresponding to the nth sample of the target vehicle, y_n represents the vector data of the real coordinate corresponding to the nth sample of the target vehicle, and beta represents an error threshold. For example, the coordinates of the target vehicle in the next 3 s are acquired every 0.2 s, so that the position coordinates of 15 points can be obtained, and each sampling point corresponds to one sample.
The formula of the second loss function is specifically:
l_n = -w_{y_n} · x_{n,y_n} (2)
Wherein l_n is the loss value corresponding to the nth sample, x_n is the predicted lane in which the nth sample is located within the first time, y_n represents the lane in which the nth sample is actually located within the first time, the samples represent vehicles around the own vehicle, and w represents the weights.
It can be understood that, because the third information output by the first model includes the association degree of the target vehicle around the own vehicle and at least one lane around the own vehicle in the first time, the true value of the attention score is inconvenient to obtain in the process of actually performing model training, and therefore, on the basis of the third information output by the first model, one lane with the highest association degree with the target vehicle in the first time in the at least one lane around the own vehicle is taken as the lane where the target vehicle is located in the first time, and the model is trained by comparing the predicted lane information of the target lane with the actual lane information.
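A hedged sketch of the two loss terms follows. The first loss formula is not reproduced in the text; based on the error threshold beta it is assumed here to be a Smooth-L1-style loss, and the second loss is implemented as a weighted negative log-likelihood over lanes matching l_n = -w_{y_n} · x_{n,y_n}; both choices, as well as all tensor shapes, are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

smooth_l1 = nn.SmoothL1Loss(beta=1.0)        # beta = error threshold (assumed value)

pred_traj = torch.randn(16, 64, 15, 2)       # predicted coordinates of surrounding vehicles
true_traj = torch.randn(16, 64, 15, 2)       # labeled coordinates
first_loss = smooth_l1(pred_traj, true_traj)

lane_scores = torch.randn(16 * 64, 256)      # per-sample lane scores from the model
true_lane = torch.randint(0, 256, (16 * 64,))
lane_weights = torch.ones(256)               # per-lane weights w
second_loss = F.nll_loss(F.log_softmax(lane_scores, dim=-1), true_lane, weight=lane_weights)
```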
(2) Training device trains the first model according to the loss function
After determining the loss function, the training device trains the first model using the training set and verifies on the verification set, saving the best performing network model parameters on the verification set.
In one implementation, the specific process of training the first model by the training device according to the loss function is:
(1) Based on a back propagation algorithm, training the first model by adopting a first loss function, and storing the model with the minimum first loss value after training.
(2) Training the model obtained in the step (1) by adopting a second loss function based on a back propagation algorithm, and storing the model with the minimum second loss value after training is completed.
It will be appreciated that during the training of the first model, the training device may update the parameters of the first model via an error back-propagation algorithm. In brief, during the training of the first model, the training device uses the error back-propagation algorithm to correct the magnitude of the parameters in the initial first model, so that the error loss becomes smaller and smaller. Specifically, the input signal is forward-propagated until an error loss is produced at the output, and the parameters in the initial first model are updated by back-propagating the error loss information, so that the error loss converges. The back-propagation algorithm is a back-propagation movement dominated by the error loss, aiming to obtain the parameters of the optimal neural network model, such as the weight matrices.
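A hedged sketch of this two-stage training procedure follows; the optimizer, learning rate, epoch count and the checkpointing on the training loss rather than a separate validation pass are illustrative assumptions.

```python
import copy
import torch

def train_stage(model, loader, loss_fn, epochs=10, lr=1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_loss, best_state = float('inf'), None
    for _ in range(epochs):
        for first_info, second_info, target in loader:
            optimizer.zero_grad()
            prediction = model(first_info, second_info)
            loss = loss_fn(prediction, target)
            loss.backward()                     # back-propagate the error loss
            optimizer.step()
            if loss.item() < best_loss:         # keep the parameters with the minimum loss
                best_loss = loss.item()
                best_state = copy.deepcopy(model.state_dict())
    if best_state is not None:
        model.load_state_dict(best_state)
    return model

# model = train_stage(model, train_loader, first_loss_fn)   # stage (1): trajectory loss
# model = train_stage(model, train_loader, second_loss_fn)  # stage (2): lane loss
```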
4. The training equipment uses the data of the test set to test the prediction performance of the first model to obtain the final model identification accuracy, and when the model identification accuracy reaches a set threshold, the data to be predicted is input into the first model for identification; otherwise, returning to the third step until the model identification accuracy reaches the set threshold.
It can be understood that, since the prediction information output by the first model includes two parts, namely the predicted trajectory information of the target vehicle and the attention score of the target vehicle with respect to the lanes, the accuracy evaluation is performed separately for each of the two parts.
In one implementation, the formula for estimating the accuracy of the predicted track information is specifically:
FDE (Final Displacement Error) measures the displacement error (Euclidean distance, in meters) over a future period, and MR (Miss Rate) represents the proportion of inaccurate predicted track information output by the first model, that is, the proportion of sampling points for which the distance between the predicted track information output by the first model and the actually measured track information exceeds the fault-tolerance distance. N represents the batch size (corresponding to 256 in Table 1), n represents the number of points per track, and x and y are the horizontal and vertical coordinates of the points respectively; the true values of the abscissa and the ordinate corresponding to the j-th point of track n (namely, the collected values in the track label) are taken from the track label, x_nj and y_nj respectively represent the predicted values of the abscissa and the ordinate corresponding to the j-th point of track n, dist_threshold refers to the fault-tolerance distance and can be set to 1.5 meters, and valid_num represents the number of valid data acquired.
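Since the FDE and MR formulas themselves are rendered as images in the original, the following sketch only follows the variable descriptions above; in particular, averaging the per-point Euclidean error for FDE is an assumption.

```python
import torch

def fde_and_mr(pred, truth, dist_threshold=1.5):
    # pred, truth: (N, n, 2) predicted / labeled coordinates of n points per track
    dist = torch.linalg.norm(pred - truth, dim=-1)       # Euclidean distance in meters
    fde = dist.mean().item()                             # displacement error over the future period
    mr = (dist > dist_threshold).float().mean().item()   # share of points beyond the tolerance
    return fde, mr

fde, mr = fde_and_mr(torch.randn(256, 15, 2), torch.randn(256, 15, 2))
```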
In one implementation manner, if the road scene to which the target vehicle belongs is an intersection scene, the accuracy evaluation of the intersection lane intention is performed by the following formula:
Where Acc_exit represents the accuracy of the exit lane (the lane from which the target vehicle leaves the intersection scene is called the exit lane), exit_lane_right_num represents the number of samples for which the predicted lane is consistent with the real lane in which the target vehicle is located within the first time in the future, and valid_exit_lane_num represents the number of valid data collected, the real lane in which the target vehicle is located within the first time in the future being obtained from the intersection label.
In one implementation manner, if the road scene to which the target vehicle belongs is a non-intersection scene, the vehicle exhibits one of two behaviors during driving: lane change or no lane change.
The accuracy of the lane change of the vehicle is evaluated by the following formula:
Where Acc_cutin represents the accuracy of lane changing of the target vehicle within the first time, cutin_right_num represents the number of samples for which the target vehicle actually changes lanes within the first time in the future and the first model also predicts a lane change, and valid_cutin_num represents the number of valid data collected.
The accuracy of the vehicle keeping its lane (no lane change) is evaluated by the following formula:
Where Acc_keep represents the accuracy for the target vehicle not changing lanes within the first time, that is, the accuracy with respect to lane-change false alarms, keep_right_num represents the number of samples for which the predicted lane is consistent with the real lane in which the target vehicle is located within the first time in the future (that real lane can be obtained from the non-intersection label), and valid_keep_num represents the number of valid data collected.
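The accuracy formulas are likewise rendered as images; the sketch below assumes each metric is simply the ratio of correctly predicted samples to valid samples, as the variable descriptions suggest, and the numbers are placeholders.

```python
def accuracy(right_num: int, valid_num: int) -> float:
    # Ratio of samples whose predicted lane/behavior matches the label to all valid samples.
    return right_num / valid_num if valid_num else 0.0

acc_exit = accuracy(right_num=180, valid_num=200)    # exit-lane accuracy (intersection scene)
acc_cutin = accuracy(right_num=45, valid_num=60)     # lane-change accuracy (non-intersection scene)
acc_keep = accuracy(right_num=130, valid_num=140)    # lane-keep accuracy (non-intersection scene)
```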
In order to better implement the above-described scheme of the embodiment of the present application on the basis of the embodiments corresponding to fig. 1a to 10, the following provides a related device for implementing the above-described scheme. Referring specifically to fig. 11, fig. 11 is a schematic structural diagram of a vehicle position obtaining device according to an embodiment of the present application, a vehicle position obtaining device 1100 may include:
the obtaining module 1101 is configured to obtain first information and second information, where the first information includes information of vehicles around the host vehicle, and the second information includes information of lanes around the host vehicle.
The position prediction module 1102 is configured to input the first information and the second information into the first model, and obtain prediction information generated by the first model, where the prediction information includes predicted position information of vehicles around the vehicle in a first time.
For the specific description of the obtaining module 1101 and the position predicting module 1102, reference may be made to the descriptions of the steps 401 to 402 in the above embodiments, which are not repeated here.
In one implementation, the predicted information includes predicted trajectory information of vehicles around the host vehicle at a first time and third information indicating a lane in which the vehicles around the host vehicle are located at the first time.
In one implementation, the third information includes a degree of association between a target vehicle around the host vehicle and at least one lane around the host vehicle in a first time, and the position acquisition device 1100 of the vehicle further includes:
The lane determining module is used for determining a first lane as a lane where the target vehicle is located in a first time, wherein the first lane is one lane with highest association degree with the target vehicle in the first time from at least one lane around the vehicle.
In one implementation, the first model is built based on an attention mechanism, and the location prediction module 1102 is specifically configured to:
Inputting the first information and the second information into a first model, and generating fourth information based on an attention mechanism, wherein the fourth information comprises the association degree of a target vehicle around a vehicle and a first lane set in the first time, the target vehicle is one vehicle around the vehicle, and the first lane set comprises all lanes around the vehicle included in the second information;
Acquiring the category of a road scene to which a target vehicle belongs, wherein the category of the road scene comprises an intersection scene and a non-intersection scene;
Selecting a second vehicle road set from the first vehicle road set according to the category of the road scene to which the target vehicle belongs, wherein the second vehicle road set comprises lanes of vehicles around the vehicle in a first time;
obtaining fifth information from the fourth information, and generating third information according to the fifth information, wherein the fifth information comprises the association degree of the target vehicle and the second vehicle road set in the first time;
and generating predicted track information of vehicles around the vehicle in the first time according to the second information and the fourth information.
In one implementation, the location prediction module 1102 is specifically configured to:
Obtaining fifth information from the fourth information, and carrying out normalization operation on the fifth information to obtain normalized fifth information;
and inputting the normalized fifth information into a multi-layer perceptron to obtain third information.
In one implementation, the location prediction module 1102 is specifically configured to:
Vectorization processing and linear mapping are respectively carried out on the first information and the second information, so that a first linear matrix and a second linear matrix are obtained;
and performing normalization operation on the matrix product of the first linear matrix and the second linear matrix to obtain fourth information.
In one implementation, the location prediction module 1102 is specifically configured to:
Performing matrix multiplication operation on the second linear matrix and the fourth information to obtain sixth information;
And inputting the sixth information into the multi-layer perceptron to obtain the predicted track information of the vehicles around the vehicle in the first time.
It should be noted that, in the position obtaining device 1100 of the vehicle, the contents of information interaction and execution process between the modules/units are based on the same concept, and specific contents may be referred to the description of the foregoing method embodiment of the present application, which is not repeated herein.
The embodiment of the present application further provides a training device for a model, referring to fig. 12, fig. 12 is a schematic structural diagram of the training device for a model provided by the embodiment of the present application, and a training device 1200 for a model may include:
An acquisition module 1201 is configured to acquire first information including information of vehicles around the own vehicle and second information including information of lanes around the own vehicle.
The position prediction module 1202 is configured to input the first information and the second information into the first model, and obtain prediction information generated by the first model, where the prediction information includes predicted position information of vehicles around the vehicle in a first time.
The model training module 1203 is configured to train the first model according to a loss function, where the loss function indicates a similarity between the prediction information and correct information, and the correct information includes correct position information of vehicles around the own vehicle in a first time.
For a specific description of the acquisition module 1201, the position prediction module 1202 and the model training module 1203, reference may be made to the descriptions of steps 1001 to 1003 in the above embodiment, and the description is not repeated here.
In one implementation, the predicted information includes predicted trajectory information of vehicles around the host vehicle within the first time and third information indicating a lane in which the vehicles around the host vehicle are located within the first time.
In one implementation, the third information includes a degree of association between a target vehicle around the host vehicle and at least one lane around the host vehicle in the first time, and the training apparatus 1200 of the model further includes:
The lane determining module is used for determining a first lane as a lane where the target vehicle is located in a first time, wherein the first lane is one lane with highest association degree with the target vehicle in the first time from at least one lane around the vehicle.
In this implementation manner, since the third information output by the first model includes the association degree of the target vehicle around the own vehicle and at least one lane around the own vehicle in the first time, the correct value of the attention score is inconvenient to obtain in the process of actually performing model training, and therefore, on the basis of the third information output by the first model, one lane with the highest association degree with the target vehicle in the first time in the at least one lane around the own vehicle is taken as the lane where the target vehicle is located in the first time, and the model is trained by comparing the predicted lane information of the target lane with the actual lane information.
It may be appreciated that in one implementation, in the actual operation process, after the prediction information is obtained by the position prediction module 1202, the lane where the target vehicle is located in the first time may be determined by the lane determining module first, and then the first model is trained by the model training module 1203; in still another implementation manner, in the actual operation process, if the correlation between the target vehicle around the own vehicle and at least one lane around the own vehicle in the first time can be measured correctly, the first model can be trained directly according to the error between the correlations, and the lane determining module is not required to be executed. In the practical application scene, whether the training equipment needs to adopt the lane determining module or not can be set according to the practical requirements, and the training equipment is not limited herein.
In one implementation, the first model is built based on the attention mechanism, and the location prediction module 1202 is specifically configured to:
Inputting the first information and the second information into a first model, and generating fourth information based on an attention mechanism, wherein the fourth information comprises the association degree of a target vehicle around a vehicle and a first lane set in the first time, the target vehicle is one vehicle around the vehicle, and the first lane set comprises all lanes around the vehicle included in the second information;
Acquiring the category of a road scene to which a target vehicle belongs, wherein the category of the road scene comprises an intersection scene and a non-intersection scene;
Selecting a second vehicle road set from the first vehicle road set according to the category of the road scene to which the target vehicle belongs, wherein the second vehicle road set comprises lanes of vehicles around the vehicle in a first time;
obtaining fifth information from the fourth information, and generating third information according to the fifth information, wherein the fifth information comprises the association degree of the target vehicle and the second vehicle road set in the first time;
and generating predicted track information of vehicles around the vehicle in the first time according to the second information and the fourth information.
In one implementation, the location prediction module 1202 is specifically configured to:
Obtaining fifth information from the fourth information, and carrying out normalization operation on the fifth information to obtain normalized fifth information;
and inputting the normalized fifth information into a multi-layer perceptron to obtain third information.
In one implementation, the location prediction module 1202 is specifically configured to:
Vectorization processing and linear mapping are respectively carried out on the first information and the second information, so that a first linear matrix and a second linear matrix are obtained;
and performing normalization operation on the matrix product of the first linear matrix and the second linear matrix to obtain fourth information.
In one implementation, the location prediction module 1202 is specifically configured to:
Performing matrix multiplication operation on the second linear matrix and the fourth information to obtain sixth information;
And inputting the sixth information into the multi-layer perceptron to obtain the predicted track information of the vehicles around the vehicle in the first time.
It should be noted that, in the training apparatus 1200 of the model, the content of information interaction and execution process between each module/unit is based on the same concept, and specific content may be referred to the description of the foregoing method embodiment of the present application, which is not repeated herein.
Referring to fig. 13, fig. 13 is a schematic structural diagram of an execution device according to an embodiment of the present application, and the execution device 1300 may be embodied as a vehicle, a mobile robot, a monitoring data processing device, or other devices, which is not limited herein. Specifically, the execution apparatus 1300 includes: receiver 1301, transmitter 1302, processor 1303 and memory 1304 (where the number of processors 1303 in executing device 1300 may be one or more, as exemplified by one processor in fig. 13), where processor 1303 may include an application processor 13031 and a communication processor 13032. In some embodiments of the application, the receiver 1301, transmitter 1302, processor 1303, and memory 1304 may be connected by a bus or other means.
Memory 1304 may include read-only memory and random access memory and provides instructions and data to processor 1303. A portion of the memory 1304 may also include non-volatile random access memory (NVRAM). The memory 1304 stores operating instructions executable by the processor, executable modules or data structures, or a subset thereof, or an extended set thereof, where the operating instructions may include various operating instructions for performing various operations.
The processor 1303 controls operations of the execution device. In particular applications, the various components of the execution device are coupled together by a bus system that may include, in addition to a data bus, a power bus, a control bus, a status signal bus, and the like. For clarity of illustration, however, the various buses are referred to in the figures as bus systems.
The method disclosed in the above embodiment of the present application may be applied to the processor 1303 or implemented by the processor 1303. The processor 1303 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the method described above may be performed by integrated logic circuitry in hardware or by instructions in the form of software in the processor 1303. The processor 1303 may be a general purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware components. The processor 1303 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 1304, and the processor 1303 reads information in the memory 1304 and performs the steps of the method in combination with its hardware.
The receiver 1301 may be used to receive input numeric or character information and to generate signal inputs related to performing relevant settings and function control of the device. The transmitter 1302 may be configured to output numeric or character information via a first interface; the transmitter 1302 may also be configured to send instructions to the disk group through the first interface to modify data in the disk group; the transmitter 1302 may also include a display device such as a display screen.
In an embodiment of the present application, in one case, the application processor 13031 in the processor 1303 is configured to execute the method for acquiring the position of the vehicle executed by the execution device in the corresponding embodiment of fig. 4 to 9. It should be noted that, the specific manner in which the application processor 13031 executes the foregoing steps is based on the same concept as that of the method embodiments corresponding to fig. 4 to 9 in the present application, so that the technical effects thereof are the same as those of the method embodiments corresponding to fig. 4 to 9 in the present application, and the specific content can be referred to the descriptions in the foregoing method embodiments of the present application, which are not repeated herein.
The embodiment of the application also provides a training device; referring to fig. 14, fig. 14 is a schematic structural diagram of the training device provided by the embodiment of the application. In particular, the training device 1400 is implemented by one or more servers, and the training device 1400 may vary considerably in configuration or performance; it may include one or more central processing units (CPUs) 1422 (e.g., one or more processors), memory 1432, and one or more storage media 1430 (e.g., one or more mass storage devices) storing applications 1442 or data 1444. The memory 1432 and the storage medium 1430 may be transitory or persistent storage. The program stored on the storage medium 1430 may include one or more modules (not shown), each of which may include a series of instruction operations for the training device. Still further, the central processor 1422 may be configured to communicate with the storage medium 1430 to perform, on the training device 1400, the series of instruction operations in the storage medium 1430.
The training device 1400 may also comprise one or more power supplies 1426, one or more wired or wireless network interfaces 1450, one or more input/output interfaces 1458, and/or one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In the embodiment of the present application, the central processor 1422 is configured to execute the training method of the model executed by the training device in the corresponding embodiment of fig. 10. It should be noted that the specific manner in which the central processor 1422 executes the foregoing steps is based on the same concept as the method embodiment corresponding to fig. 10 in the present application, so its technical effects are the same as those of the method embodiment corresponding to fig. 10; for specific content, reference may be made to the descriptions in the foregoing method embodiments of the present application, which are not repeated herein.
Embodiments of the present application also provide a computer program product which, when run on a computer, causes the computer to perform the steps performed by the apparatus in the method described in the embodiment of fig. 4 to 9 described above, or causes the computer to perform the steps performed by the training apparatus in the method described in the embodiment of fig. 10 described above.
In an embodiment of the present application, there is also provided a computer-readable storage medium having stored therein a program for performing signal processing, which when run on a computer causes the computer to perform the steps performed by the performing device in the method described in the embodiment shown in the foregoing fig. 4 to 9, or causes the computer to perform the steps performed by the training device in the method described in the embodiment shown in the foregoing fig. 10.
The position acquiring device, the training device, the executing device and the training device of the model of the vehicle provided by the embodiment of the application can be specifically a chip, wherein the chip comprises: a processing unit, which may be, for example, a processor, and a communication unit, which may be, for example, an input/output interface, pins or circuitry, etc. The processing unit may execute the computer-executable instructions stored in the storage unit to cause the chip to perform the position acquisition method of the vehicle described in the embodiment shown in fig. 4 to 9 described above, or to cause the chip to perform the training method of the model described in the embodiment shown in fig. 10 described above. Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, or the like, and the storage unit may also be a storage unit in the wireless access device side located outside the chip, such as a read-only memory (ROM) or other type of static storage device that may store static information and instructions, a random access memory (random access memory, RAM), or the like.
Specifically, referring to fig. 15, fig. 15 is a schematic structural diagram of a chip provided in an embodiment of the present application, where the chip may be represented as a neural network processor NPU 150, and the NPU 150 is mounted as a coprocessor on a main CPU (Host CPU), and the Host CPU distributes tasks. The core part of the NPU is an operation circuit 1503, and the controller 1504 controls the operation circuit 1503 to extract matrix data in the memory and perform multiplication.
In some implementations, the arithmetic circuit 1503 includes a plurality of processing units (PEs) inside. In some implementations, the operation circuit 1503 is a two-dimensional systolic array. The operation circuit 1503 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 1503 is a general-purpose matrix processor.
For example, assume that there is an input matrix a, a weight matrix B, and an output matrix C. The arithmetic circuit takes the data corresponding to matrix B from the weight memory 1502 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit takes matrix a data from the input memory 1501 and performs matrix operation with matrix B, and the obtained partial result or final result of the matrix is stored in an accumulator (accumulator) 1508.
Unified memory 1506 is used to store input data and output data. The weight data is carried directly to the weight memory 1502 through the memory cell access controller (Direct Memory Access Controller, DMAC) 1505. The input data is also carried into the unified memory 1506 through the DMAC.
The BIU (Bus Interface Unit) is the bus interface unit 1510, used for interaction of the AXI bus with the DMAC and the instruction fetch buffer (IFB) 1509.
The bus interface unit 1510 is configured for the instruction fetch buffer 1509 to fetch instructions from the external memory, and further configured for the storage unit access controller 1505 to fetch the raw data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1506 or to transfer weight data to the weight memory 1502 or to transfer input data to the input memory 1501.
The vector calculation unit 1507 includes a plurality of operation processing units, which perform further processing such as vector multiplication, vector addition, exponential operation, logarithmic operation, and magnitude comparison on the output of the operation circuit, if necessary. It is mainly used for non-convolution/fully connected layer computation in the neural network, such as batch normalization, pixel-level summation, and up-sampling of a feature plane.
In some implementations, the vector computation unit 1507 can store the vector of processed outputs to the unified memory 1506. For example, the vector calculation unit 1507 may apply a linear function and/or a nonlinear function to the output of the operation circuit 1503, for example, linearly interpolate the feature plane extracted by the convolution layer, and further, for example, accumulate a vector of values to generate an activation value. In some implementations, the vector calculation unit 1507 generates normalized values, pixel-level summed values, or both. In some implementations, the vector of processed outputs can be used as an activation input to the arithmetic circuit 1503, for example for use in subsequent layers in a neural network.
An instruction fetch memory (instruction fetch buffer) 1509 connected to the controller 1504 for storing instructions used by the controller 1504;
The unified memory 1506, the input memory 1501, the weight memory 1502 and the instruction fetch memory 1509 are all on-chip memories. The external memory is proprietary to the NPU hardware architecture.
Here, the operations of the respective layers in the object model shown in fig. 4 to 10 may be performed by the operation circuit 1503 or the vector calculation unit 1507.
The processor mentioned in any of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program of the method of the first aspect.
It should be further noted that the above-described apparatus embodiments are merely illustrative; the units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, and they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the application, the connection relationship between modules indicates that they have communication connections, which may be specifically implemented as one or more communication buses or signal lines.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus the necessary general-purpose hardware, or of course by means of dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, functions performed by computer programs can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function can vary, such as analog circuits, digital circuits, or dedicated circuits. However, for the present application, a software program implementation is the preferred embodiment in most cases. Based on such understanding, the technical solution of the present application, or the part contributing to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk or an optical disk of a computer, including several instructions for causing a computer device (which may be a personal computer, a training device, a network device, etc.) to perform the method according to the embodiments of the present application.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center via a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a training device or a data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), etc.

Claims (27)

1. A position acquisition method of a vehicle, characterized by comprising:
acquiring first information and second information, wherein the first information comprises information of vehicles around a self-vehicle, and the second information comprises information of lanes around the self-vehicle;
And inputting the first information and the second information into a first model to obtain prediction information generated by the first model, wherein the prediction information comprises the predicted position information of vehicles around the own vehicle in a first time.
2. The method of claim 1, wherein the predicted information includes predicted trajectory information of vehicles around the own vehicle within the first time and third information indicating a lane in which the vehicles around the own vehicle are located within the first time.
3. The method of claim 2, wherein the third information comprises a degree of association, within the first time, between a target vehicle around the own vehicle and at least one lane around the own vehicle, the target vehicle being one vehicle around the own vehicle, and the method further comprises:
determining a first lane as the lane in which the target vehicle is located within the first time, wherein the first lane is the lane, among the at least one lane around the own vehicle, that has the highest degree of association with the target vehicle within the first time.
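As an illustrative reading of the selection step recited in claim 3 (not part of the claim language), choosing the first lane reduces to taking the candidate lane with the highest degree of association. A minimal sketch, assuming the association degrees are available as a score vector over hypothetical lane identifiers:

import torch

def select_first_lane(association_scores: torch.Tensor, lane_ids: list) -> str:
    # association_scores: (num_candidate_lanes,) degree of association between the
    # target vehicle and each lane around the own vehicle within the first time
    best_idx = int(torch.argmax(association_scores))
    return lane_ids[best_idx]   # the "first lane" in which the target vehicle is located

# example with made-up scores for three candidate lanes
scores = torch.tensor([0.12, 0.71, 0.17])
print(select_first_lane(scores, ["lane_left", "lane_ego", "lane_right"]))   # -> lane_ego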
4. The method of claim 2 or 3, wherein the first model is constructed based on an attention mechanism, and the inputting the first information and the second information into the first model to obtain the prediction information generated by the first model comprises:
inputting the first information and the second information into the first model, and generating fourth information based on the attention mechanism, wherein the fourth information comprises a degree of association, within the first time, between a target vehicle around the own vehicle and a first lane set, the target vehicle is one vehicle around the own vehicle, and the first lane set comprises all lanes around the own vehicle that are included in the second information;
acquiring a category of a road scene to which the target vehicle belongs, wherein the category of the road scene comprises an intersection scene and a non-intersection scene;
selecting a second lane set from the first lane set according to the category of the road scene to which the target vehicle belongs, wherein the second lane set comprises lanes in which vehicles around the own vehicle are located within the first time;
obtaining fifth information from the fourth information, and generating the third information according to the fifth information, wherein the fifth information comprises a degree of association between the target vehicle and the second lane set within the first time;
and generating the predicted trajectory information of the vehicles around the own vehicle within the first time according to the second information and the fourth information.
5. The method of claim 4, wherein the obtaining fifth information from the fourth information and generating the third information according to the fifth information comprises:
acquiring the fifth information from the fourth information, and performing a normalization operation on the fifth information to obtain normalized fifth information;
and inputting the normalized fifth information into a multi-layer perceptron to obtain the third information.
6. The method of claim 4 or 5, wherein the generating fourth information according to the first information and the second information comprises:
performing vectorization processing and linear mapping on the first information and the second information respectively to obtain a first linear matrix and a second linear matrix;
and performing a normalization operation on the matrix product of the first linear matrix and the second linear matrix to obtain the fourth information.
7. The method of claim 6, wherein the generating the predicted trajectory information of the vehicles around the own vehicle within the first time according to the second information and the fourth information comprises:
performing a matrix multiplication operation on the second linear matrix and the fourth information to obtain sixth information;
and inputting the sixth information into a multi-layer perceptron to obtain the predicted trajectory information of the vehicles around the own vehicle within the first time.
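To make the chain formed by claims 4 to 7 easier to follow, the following is a minimal single-head cross-attention sketch between vehicle features (first information) and lane features (second information). The tensor shapes, layer sizes, the class name LaneAttentionPredictor, and the reduction of the intersection / non-intersection scene check to a boolean candidate mask are all assumptions made for illustration; this is not asserted to be the implementation of the first model.

import torch
import torch.nn as nn

class LaneAttentionPredictor(nn.Module):
    def __init__(self, vehicle_dim=32, lane_dim=32, hidden=64, horizon=30):
        super().__init__()
        self.vehicle_proj = nn.Linear(vehicle_dim, hidden)   # yields the "first linear matrix"
        self.lane_proj = nn.Linear(lane_dim, hidden)          # yields the "second linear matrix"
        self.lane_head = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))                  # MLP of claim 5
        self.traj_head = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(), nn.Linear(64, 2 * horizon))   # MLP of claim 7

    def forward(self, vehicle_feats, lane_feats, candidate_mask):
        # vehicle_feats: (num_vehicles, vehicle_dim); lane_feats: (num_lanes, lane_dim)
        # candidate_mask: (num_vehicles, num_lanes) boolean, True for lanes kept in the
        # second lane set after the intersection / non-intersection scene check (claim 4)
        q = self.vehicle_proj(vehicle_feats)                                # first linear matrix
        k = self.lane_proj(lane_feats)                                      # second linear matrix
        fourth = torch.softmax(q @ k.T, dim=-1)                             # normalized matrix product: "fourth information"
        fifth = fourth.masked_fill(~candidate_mask, 0.0)                    # association with the second lane set: "fifth information"
        fifth = fifth / fifth.sum(dim=-1, keepdim=True).clamp_min(1e-8)     # normalization of claim 5
        third = self.lane_head(fifth.unsqueeze(-1)).squeeze(-1)             # per-lane scores: "third information"
        sixth = fourth @ k                                                  # attention-weighted lane features: "sixth information"
        traj = self.traj_head(sixth).view(vehicle_feats.shape[0], -1, 2)    # predicted (x, y) trajectory
        return traj, third

In this sketch the candidate mask is simply one way of representing the scene-dependent selection of the second lane set; a practical model would typically also add attention scaling, multiple heads, and multi-modal trajectory outputs.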
8. A method of training a model, the method comprising:
acquiring first information and second information, wherein the first information comprises information of vehicles around an own vehicle, and the second information comprises information of lanes around the own vehicle;
inputting the first information and the second information into a first model to obtain prediction information generated by the first model, wherein the prediction information comprises predicted position information of the vehicles around the own vehicle within a first time;
and training the first model according to a loss function, wherein the loss function indicates a similarity between the prediction information and correct information, and the correct information comprises correct position information of the vehicles around the own vehicle within the first time.
9. The method of claim 8, wherein the prediction information comprises predicted trajectory information of the vehicles around the own vehicle within the first time and third information indicating a lane in which the vehicles around the own vehicle are located within the first time.
10. The method of claim 9, wherein the third information comprises a degree of association, within the first time, between a target vehicle around the own vehicle and at least one lane around the own vehicle, the target vehicle being one vehicle around the own vehicle, and the method further comprises:
determining a first lane as the lane in which the target vehicle is located within the first time, wherein the first lane is the lane, among the at least one lane around the own vehicle, that has the highest degree of association with the target vehicle within the first time.
11. The method of claim 9 or 10, wherein the first model is constructed based on an attention mechanism, and the inputting the first information and the second information into the first model to obtain the prediction information generated by the first model comprises:
inputting the first information and the second information into the first model, and generating fourth information based on the attention mechanism, wherein the fourth information comprises a degree of association, within the first time, between a target vehicle around the own vehicle and a first lane set, the target vehicle is one vehicle around the own vehicle, and the first lane set comprises all lanes around the own vehicle that are included in the second information;
acquiring a category of a road scene to which the target vehicle belongs, wherein the category of the road scene comprises an intersection scene and a non-intersection scene;
selecting a second lane set from the first lane set according to the category of the road scene to which the target vehicle belongs, wherein the second lane set comprises lanes in which vehicles around the own vehicle are located within the first time;
obtaining fifth information from the fourth information, and generating the third information according to the fifth information, wherein the fifth information comprises a degree of association between the target vehicle and the second lane set within the first time;
and generating the predicted trajectory information of the vehicles around the own vehicle within the first time according to the second information and the fourth information.
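Claim 8 only states that the loss function indicates the similarity between the prediction information and the correct information; the concrete loss terms below (smooth L1 for positions, binary cross-entropy for lane scores) and the reuse of the hypothetical LaneAttentionPredictor from the sketch after claim 7 are assumptions, shown only to make one training step concrete.

import torch
import torch.nn as nn

model = LaneAttentionPredictor()                      # hypothetical model from the earlier sketch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
trajectory_loss = nn.SmoothL1Loss()                   # similarity to the correct position information
lane_loss = nn.BCEWithLogitsLoss()                    # similarity to the correct lane occupancy

def train_step(vehicle_feats, lane_feats, candidate_mask, gt_traj, gt_lane):
    # gt_traj: (num_vehicles, horizon, 2) correct future positions within the first time
    # gt_lane: (num_vehicles, num_lanes) one-hot correct lane occupancy within the first time
    pred_traj, lane_scores = model(vehicle_feats, lane_feats, candidate_mask)
    loss = trajectory_loss(pred_traj, gt_traj) + lane_loss(lane_scores, gt_lane)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()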
12. A position acquisition apparatus of a vehicle, characterized by comprising:
an acquisition module, configured to acquire first information and second information, wherein the first information comprises information of vehicles around an own vehicle, and the second information comprises information of lanes around the own vehicle;
and a position prediction module, configured to input the first information and the second information into a first model to obtain prediction information generated by the first model, wherein the prediction information comprises predicted position information of the vehicles around the own vehicle within a first time.
13. The apparatus of claim 12, wherein the prediction information comprises predicted trajectory information of the vehicles around the own vehicle within the first time and third information indicating a lane in which the vehicles around the own vehicle are located within the first time.
14. The apparatus of claim 13, wherein the third information comprises a degree of association, within the first time, between a target vehicle around the own vehicle and at least one lane around the own vehicle, the target vehicle being one vehicle around the own vehicle, and the apparatus further comprises:
a lane determining module, configured to determine a first lane as the lane in which the target vehicle is located within the first time, wherein the first lane is the lane, among the at least one lane around the own vehicle, that has the highest degree of association with the target vehicle within the first time.
15. The apparatus of claim 13 or 14, wherein the first model is constructed based on an attention mechanism, and the position prediction module is specifically configured to:
input the first information and the second information into the first model, and generate fourth information based on the attention mechanism, wherein the fourth information comprises a degree of association, within the first time, between a target vehicle around the own vehicle and a first lane set, the target vehicle is one vehicle around the own vehicle, and the first lane set comprises all lanes around the own vehicle that are included in the second information;
acquire a category of a road scene to which the target vehicle belongs, wherein the category of the road scene comprises an intersection scene and a non-intersection scene;
select a second lane set from the first lane set according to the category of the road scene to which the target vehicle belongs, wherein the second lane set comprises lanes in which vehicles around the own vehicle are located within the first time;
obtain fifth information from the fourth information, and generate the third information according to the fifth information, wherein the fifth information comprises a degree of association between the target vehicle and the second lane set within the first time;
and generate the predicted trajectory information of the vehicles around the own vehicle within the first time according to the second information and the fourth information.
16. The apparatus of claim 15, wherein the position prediction module is specifically configured to:
acquire the fifth information from the fourth information, and perform a normalization operation on the fifth information to obtain normalized fifth information;
and input the normalized fifth information into a multi-layer perceptron to obtain the third information.
17. The apparatus of claim 15 or 16, wherein the position prediction module is specifically configured to:
perform vectorization processing and linear mapping on the first information and the second information respectively to obtain a first linear matrix and a second linear matrix;
and perform a normalization operation on the matrix product of the first linear matrix and the second linear matrix to obtain the fourth information.
18. The apparatus of claim 17, wherein the position prediction module is specifically configured to:
perform a matrix multiplication operation on the second linear matrix and the fourth information to obtain sixth information;
and input the sixth information into a multi-layer perceptron to obtain the predicted trajectory information of the vehicles around the own vehicle within the first time.
19. A training apparatus for a model, characterized by comprising:
an acquisition module, configured to acquire first information and second information, wherein the first information comprises information of vehicles around an own vehicle, and the second information comprises information of lanes around the own vehicle;
a position prediction module, configured to input the first information and the second information into a first model to obtain prediction information generated by the first model, wherein the prediction information comprises predicted position information of the vehicles around the own vehicle within a first time;
and a model training module, configured to train the first model according to a loss function, wherein the loss function indicates a similarity between the prediction information and correct information, and the correct information comprises correct position information of the vehicles around the own vehicle within the first time.
20. The apparatus of claim 19, wherein the prediction information comprises predicted trajectory information of the vehicles around the own vehicle within the first time and third information indicating a lane in which the vehicles around the own vehicle are located within the first time.
21. The apparatus of claim 20, wherein the third information comprises a degree of association, within the first time, between a target vehicle around the own vehicle and at least one lane around the own vehicle, the target vehicle being one vehicle around the own vehicle, and the apparatus further comprises:
a lane determining module, configured to determine a first lane as the lane in which the target vehicle is located within the first time, wherein the first lane is the lane, among the at least one lane around the own vehicle, that has the highest degree of association with the target vehicle within the first time.
22. The apparatus of claim 20 or 21, wherein the first model is constructed based on an attention mechanism, and the position prediction module is specifically configured to:
input the first information and the second information into the first model, and generate fourth information based on the attention mechanism, wherein the fourth information comprises a degree of association, within the first time, between a target vehicle around the own vehicle and a first lane set, the target vehicle is one vehicle around the own vehicle, and the first lane set comprises all lanes around the own vehicle that are included in the second information;
acquire a category of a road scene to which the target vehicle belongs, wherein the category of the road scene comprises an intersection scene and a non-intersection scene;
select a second lane set from the first lane set according to the category of the road scene to which the target vehicle belongs, wherein the second lane set comprises lanes in which vehicles around the own vehicle are located within the first time;
obtain fifth information from the fourth information, and generate the third information according to the fifth information, wherein the fifth information comprises a degree of association between the target vehicle and the second lane set within the first time;
and generate the predicted trajectory information of the vehicles around the own vehicle within the first time according to the second information and the fourth information.
23. An execution device, comprising a processor and a memory, wherein the processor is coupled to the memory;
the memory is configured to store a program;
and the processor is configured to execute the program in the memory, so that the execution device performs the method according to any one of claims 1 to 7.
24. An autonomous vehicle, comprising a processor coupled with a memory, wherein the memory stores program instructions that, when executed by the processor, implement the method of any one of claims 1 to 7.
25. A training device, comprising a processor and a memory, wherein the processor is coupled to the memory;
the memory is configured to store a program;
and the processor is configured to execute the program in the memory, so that the training device performs the method of any one of claims 8 to 11.
26. A computer-readable storage medium, comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 7 or the method of any one of claims 8 to 11.
27. Circuitry, characterized in that it comprises processing circuitry configured to perform the method of any one of claims 1 to 7 or to perform the method of any one of claims 8 to 11.
CN202211350093.1A 2022-10-31 2022-10-31 Vehicle position acquisition method, model training method and related equipment Pending CN117994754A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211350093.1A CN117994754A (en) 2022-10-31 2022-10-31 Vehicle position acquisition method, model training method and related equipment
PCT/CN2023/104695 WO2024093321A1 (en) 2022-10-31 2023-06-30 Vehicle position acquiring method, model training method, and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211350093.1A CN117994754A (en) 2022-10-31 2022-10-31 Vehicle position acquisition method, model training method and related equipment

Publications (1)

Publication Number Publication Date
CN117994754A (en) 2024-05-07

Family

ID=90898125

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211350093.1A Pending CN117994754A (en) 2022-10-31 2022-10-31 Vehicle position acquisition method, model training method and related equipment

Country Status (2)

Country Link
CN (1) CN117994754A (en)
WO (1) WO2024093321A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10268200B2 (en) * 2016-12-21 2019-04-23 Baidu Usa Llc Method and system to predict one or more trajectories of a vehicle based on context surrounding the vehicle
CN110040138B (en) * 2019-04-18 2020-10-27 北京智行者科技有限公司 Vehicle parallel auxiliary driving method and system
CN113771867B (en) * 2020-06-10 2023-03-03 华为技术有限公司 Method and device for predicting driving state and terminal equipment
CN114889610B (en) * 2022-05-20 2024-06-14 重庆长安汽车股份有限公司 Method and system for predicting lane change time of target vehicle based on cyclic neural network
CN115195718A (en) * 2022-07-01 2022-10-18 岚图汽车科技有限公司 Lane keeping auxiliary driving method and system and electronic equipment

Also Published As

Publication number Publication date
WO2024093321A1 (en) 2024-05-10

Similar Documents

Publication Publication Date Title
US20230127115A1 (en) Three-Dimensional Object Detection
CN113261035B (en) Trajectory prediction method and related equipment
US11480972B2 (en) Hybrid reinforcement learning for autonomous driving
US11768292B2 (en) Three-dimensional object detection
CN111860155B (en) Lane line detection method and related equipment
US20220011122A1 (en) Trajectory prediction method and device
CN111971574B (en) Deep learning based feature extraction for LIDAR localization of autonomous vehicles
CN111771135B (en) LIDAR positioning using RNN and LSTM for time smoothing in autonomous vehicles
US9286524B1 (en) Multi-task deep convolutional neural networks for efficient and robust traffic lane detection
US11960290B2 (en) Systems and methods for end-to-end trajectory prediction using radar, LIDAR, and maps
WO2023131065A1 (en) Image processing method, lane line detection method and related device
WO2021067056A1 (en) Perception system
CN111771141A (en) LIDAR positioning in autonomous vehicles using 3D CNN networks for solution inference
CN116348938A (en) Method and system for predicting dynamic object behavior
CN110390240A (en) Lane post-processing in automatic driving vehicle
CN115273002A (en) Image processing method, device, storage medium and computer program product
CN115214708A (en) Vehicle intention prediction method and related device thereof
CN114882457A (en) Model training method, lane line detection method and equipment
CN114332845A (en) 3D target detection method and device
CN116701586A (en) Data processing method and related device thereof
CN115082869B (en) Vehicle-road cooperative multi-target detection method and system for serving special vehicle
WO2024093321A1 (en) Vehicle position acquiring method, model training method, and related device
CN113066124A (en) Neural network training method and related equipment
CN114549610A (en) Point cloud data processing method and related device
Zhang et al. Accelerating autonomy: an integrated perception digital platform for next generation self-driving cars using faster R-CNN and DeepLabV3

Legal Events

Date Code Title Description
PB01 Publication