CN115035494A - Image processing method, image processing device, vehicle, storage medium and chip - Google Patents

Image processing method, image processing device, vehicle, storage medium and chip

Info

Publication number
CN115035494A
CN115035494A (application CN202210778647.1A)
Authority
CN
China
Prior art keywords
task
image
module
driving environment
decoding module
Prior art date
Legal status
Pending
Application number
CN202210778647.1A
Other languages
Chinese (zh)
Inventor
陈吕劼
Current Assignee
Xiaomi Automobile Technology Co Ltd
Original Assignee
Xiaomi Automobile Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Xiaomi Automobile Technology Co Ltd filed Critical Xiaomi Automobile Technology Co Ltd
Priority to CN202210778647.1A
Publication of CN115035494A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/7715: Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Traffic Control Systems (AREA)

Abstract

The disclosure relates to an image processing method, an image processing device, a vehicle, a storage medium and a chip. The image processing method extracts first target feature data corresponding to a current driving environment image through a multi-task perception model, where the multi-task perception model includes a plurality of task decoding modules and different task decoding modules are used to complete different perception tasks. The first target feature data and second target feature data corresponding to a historical driving environment image are input into each task decoding module to obtain a task processing result output by each task decoding module. Because each task processing result is obtained through a plurality of mutually independent task decoding modules, the multi-task processing speed can be effectively improved; and because different tasks can be processed in parallel, the task processing efficiency can be effectively improved and the mutual influence among different task processing results can be reduced, thereby effectively improving the accuracy of the task processing results.

Description

Image processing method, image processing device, vehicle, storage medium and chip
Technical Field
The present disclosure relates to the field of automatic driving, and in particular, to an image processing method and apparatus, a vehicle, a storage medium, and a chip.
Background
To ensure safe, efficient and reliable driving, an autonomous vehicle often needs to handle dozens of perception tasks at the same time. A multi-task perception model in the prior art generally extracts features through a backbone network based on a convolutional neural network, and then task-specific heads modules process the individual tasks respectively.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides an image processing method, an apparatus, a vehicle, a storage medium, and a chip.
According to a first aspect of an embodiment of the present disclosure, there is provided an image processing method applied to a vehicle, including:
acquiring a current running environment image of the vehicle;
inputting the current driving environment image into a preset multitask perception model, wherein the multitask perception model comprises a feature extraction module and a plurality of task decoding modules, different task decoding modules are used for completing different perception tasks, and first target feature data corresponding to the current driving environment image are extracted through the feature extraction module;
acquiring second target characteristic data corresponding to the historical driving environment image;
and inputting the first target characteristic data and the second target characteristic data into each task decoding module to obtain a task processing result output by each task decoding module.
Optionally, the feature extraction module comprises an image slicing sub-module, an encoder and a data storage sub-module, an output of the image slicing sub-module is coupled with an input of the encoder, an output of the encoder is coupled with an input of the data storage sub-module, an output of the data storage sub-module is coupled with each of the plurality of task decoding modules respectively,
the image dicing submodule is used for dividing the current driving environment image into a plurality of image areas and acquiring a feature vector corresponding to each image area;
the encoder comprises a self-attention layer, and is used for performing feature extraction on a plurality of feature vectors corresponding to the plurality of image areas through the self-attention layer to obtain first target feature data corresponding to the current driving environment image, and inputting the first target feature data into the data storage submodule;
the data storage submodule is used for storing the first target characteristic data corresponding to the current running environment image output by the encoder.
Optionally, the extracting, by the feature extraction module, first target feature data corresponding to the current driving environment image includes:
in response to the fact that the current driving environment image is received, the current driving environment image is divided into a plurality of image areas through the image dicing submodule, and a feature vector corresponding to each image area is obtained;
extracting features of a plurality of feature vectors corresponding to the plurality of image areas through the self-attention layer to obtain first target feature data corresponding to the current driving environment image, and inputting the first target feature data into the data storage submodule;
and storing the first target characteristic data output by the encoder through the data storage submodule.
Optionally, the acquiring second target feature data corresponding to the historical driving environment image includes:
acquiring identification information corresponding to a historical driving environment image;
and reading second target characteristic data corresponding to the historical driving environment image from the data storage submodule according to the identification information.
Optionally, the task decoding modules include one or more decoders, and the inputting the first target feature data and the second target feature data into each task decoding module to obtain the task processing result output by each task decoding module includes:
performing, by the one or more decoders, task processing on the received first target feature data and the second target feature data to obtain the task processing result.
Optionally, the plurality of task decoding modules comprises at least one of a position detection class task decoding module, an image segmentation class task decoding module and a category detection class task decoding module,
the position detection task decoding module is used for determining the position information of a first specified object in the current vehicle running environment according to the first target characteristic data and the second target characteristic data;
the class detection task decoding module is used for determining class information of a second designated object in the current vehicle running environment according to the first target characteristic data and the second target characteristic data;
and the image segmentation class task decoding module is used for determining a lane line position and/or a travelable area in a vehicle traveling environment according to the first target characteristic data and the second target characteristic data.
Optionally, the position detection task decoding module includes one or more of a traffic light position detection task decoding module, a vehicle position detection task decoding module, a pedestrian position detection task decoding module, an obstacle position detection task decoding module, a lamp post position detection task decoding module, and a traffic sign position detection task decoding module;
the class detection task decoding module comprises a weather class detection task decoding module or a driving road class detection task decoding module.
Optionally, the multitask perception model is trained by:
acquiring a plurality of groups of running environment image samples, wherein each group of running environment image samples comprises a plurality of running environment sample images and annotation data of a current sensing task, and different running environment image samples comprise annotation data of different sensing tasks;
training a preset initial model by taking the multiple groups of driving environment image samples as training data to obtain the multitask perception model; the preset initial model comprises an initial feature extraction module and a plurality of initial task decoding modules, wherein the initial feature extraction module comprises an image block cutting initial sub-module, an initial encoder and a data storage sub-module, and the initial task decoding module comprises one or more initial decoders.
According to a second aspect of the embodiments of the present disclosure, there is provided an image processing apparatus applied to a vehicle, including:
a first acquisition module configured to acquire a current running environment image of the vehicle;
the second acquisition module is configured to input the current driving environment image into a preset multitask perception model, the multitask perception model comprises a feature extraction module and a plurality of task decoding modules, different task decoding modules are used for completing different perception tasks, and first target feature data corresponding to the current driving environment image are extracted through the feature extraction module;
the third acquisition module is configured to acquire second target characteristic data corresponding to the historical driving environment image;
the determining module is configured to input the first target characteristic data and the second target characteristic data into each task decoding module so as to obtain a task processing result output by each task decoding module.
Optionally, the feature extraction module comprises an image slicing submodule, an encoder and a data storage submodule, an output of the image slicing submodule is coupled with an input of the encoder, an output of the encoder is coupled with an input of the data storage submodule, an output of the data storage submodule is coupled with each of the plurality of task decoding modules respectively,
the image dicing submodule is used for dividing the current driving environment image into a plurality of image areas and acquiring a feature vector corresponding to each image area;
the encoder comprises a self-attention layer, and is used for performing feature extraction on a plurality of feature vectors corresponding to the plurality of image areas through the self-attention layer to obtain first target feature data corresponding to the current driving environment image, and inputting the first target feature data into the data storage submodule;
the data storage submodule is used for storing the first target characteristic data corresponding to the current running environment image output by the encoder.
Optionally, the second obtaining module is configured to:
in response to the fact that the current driving environment image is received, the current driving environment image is divided into a plurality of image areas through the image dicing submodule, and a feature vector corresponding to each image area is obtained;
extracting features of the feature vectors corresponding to the image areas through the self-attention layer to obtain first target feature data corresponding to the current driving environment image, and inputting the first target feature data into the data storage submodule;
and storing the first target characteristic data output by the encoder through the data storage submodule.
Optionally, the third obtaining module is configured to:
acquiring identification information corresponding to a historical driving environment image;
and reading second target characteristic data corresponding to the historical driving environment image from the data storage submodule according to the identification information.
Optionally, the task decoding module comprises one or more decoders, and the determining module is configured to:
performing, by the one or more decoders, task processing on the received first target feature data and the second target feature data to obtain the task processing result.
Optionally, the plurality of task decoding modules comprises at least one of a position detection class task decoding module, an image segmentation class task decoding module and a category detection class task decoding module,
the position detection task decoding module is used for determining the position information of a first specified object in the current vehicle running environment according to the first target characteristic data and the second target characteristic data;
the class detection task decoding module is used for determining class information of a second specified object in the current vehicle running environment according to the first target characteristic data and the second target characteristic data;
the image segmentation class task decoding module is used for determining a lane line position and/or a travelable area in a vehicle traveling environment according to the first target characteristic data and the second target characteristic data.
Optionally, the position detection task decoding module includes one or more of a traffic light position detection task decoding module, a vehicle position detection task decoding module, a pedestrian position detection task decoding module, an obstacle position detection task decoding module, a lamp post position detection task decoding module, and a traffic sign position detection task decoding module;
the class detection task decoding module comprises a weather class detection task decoding module or a driving road class detection task decoding module.
Optionally, the apparatus further comprises a model training module configured to:
acquiring a plurality of groups of running environment image samples, wherein each group of running environment image samples comprises a plurality of frames of running environment sample images and annotation data of a current sensing task, and different running environment image samples comprise annotation data of different sensing tasks;
training a preset initial model by taking the multiple groups of driving environment image samples as training data to obtain the multitask perception model; the preset initial model comprises an initial feature extraction module and a plurality of initial task decoding modules, wherein the initial feature extraction module comprises an image block cutting initial sub-module, an initial encoder and a data storage sub-module, and the initial task decoding module comprises one or more initial decoders.
According to a third aspect of the embodiments of the present disclosure, there is provided a vehicle including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring a plurality of frames of driving environment images;
inputting each frame of driving environment image into a preset multitask perception model respectively, wherein the multitask perception model comprises a feature extraction module and a plurality of task decoding modules, and different task decoding modules are used for completing different perception tasks;
sequentially extracting target characteristic data corresponding to each frame of driving environment image through the characteristic extraction module to obtain a plurality of target characteristic data corresponding to the plurality of frames of driving environment images;
and inputting the target feature data into each task decoding module to obtain a task processing result output by each task decoding module.
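For illustration only, the following Python sketch shows how such a multi-frame flow could look; the `extractor` and `decoders` interfaces and the tensor layout are assumptions made for this example, not details taken from the patent.

```python
# Illustrative only: `extractor` is assumed to return per-frame target feature
# data of shape (B, N, D); each task decoder consumes the concatenation of the
# features of all frames.
import torch

def process_frames(extractor, decoders, frames):
    all_feats = [extractor(frame) for frame in frames]       # one feature tensor per frame
    memory = torch.cat(all_feats, dim=1)                     # target feature data of all frames
    return {name: decoder(memory) for name, decoder in decoders.items()}
```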
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method of the first aspect above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a chip comprising a processor and an interface; the processor is for reading instructions to perform the method of the first aspect above.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
The second target feature data of the historical driving environment image and the first target feature data of the current driving environment image can be shared, and each task processing result is obtained through a plurality of mutually independent task decoding modules, which effectively improves the multi-task processing speed. Because the different tasks are mutually independent and can be processed in parallel, the task processing efficiency is effectively improved, the mutual influence among different task processing results is reduced, and the accuracy of the task processing results is also effectively improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow chart illustrating a method of image processing in accordance with an exemplary embodiment of the present disclosure;
FIG. 2 is a block diagram of a multi-task aware model according to an exemplary embodiment of the present disclosure;
FIG. 3 is a flow chart of a method of image processing shown in the embodiment of FIG. 1 according to the present disclosure;
FIG. 4 is a flowchart illustrating a method for training a multi-task perceptual model in accordance with an exemplary embodiment of the present disclosure;
fig. 5 is a block diagram of an image processing apparatus shown in an exemplary embodiment of the present disclosure;
FIG. 6 is a functional block diagram schematic of a vehicle shown in an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
It should be noted that all actions of acquiring signals, information or data in the present application are performed under the premise of complying with the corresponding data protection regulation policy of the country of the location and obtaining the authorization given by the owner of the corresponding device.
Before the specific embodiments of the present disclosure are described in detail, the application scenario of the present disclosure is first described. The present disclosure may be applied to an autonomous vehicle. In the current multi-task perception scenario of an autonomous vehicle, the related art generally extracts features through a backbone network based on a convolutional neural network and then processes the individual tasks with task-specific heads modules. Because a backbone network based on a convolutional neural network is limited by the local inductive bias of the convolution operation, it is difficult to perform generative large-scale pre-training in an image reconstruction manner. Moreover, when a task module is designed, a suitable backbone feature is generally selected as the module input through manual experience and a large number of experiments, which makes it difficult to scale to a large number of tasks. In addition, because the tasks processed by the heads modules are not mutually independent, the processing speed and result of an upstream task restrict and influence the processing speed and detection result of a downstream task, which easily leads to a low task processing speed, low efficiency and low accuracy of the task processing results.
To solve the above technical problems, the present disclosure provides an image processing method, an apparatus, a vehicle, a storage medium, and a chip. The image processing method extracts first target feature data corresponding to a current driving environment image through a multi-task perception model, where the multi-task perception model includes a plurality of task decoding modules and different task decoding modules are used to complete different perception tasks. The first target feature data and second target feature data corresponding to a historical driving environment image are input into each task decoding module to obtain the task processing result output by each task decoding module. Because each task processing result is obtained through a plurality of mutually independent task decoding modules, the multi-task processing speed can be effectively improved; and because different tasks can be processed in parallel, the task processing efficiency can be effectively improved and the mutual influence among different task processing results can be reduced, thereby effectively improving the accuracy of the task processing results.
The technical scheme of the disclosure is explained in detail by combining specific embodiments.
FIG. 1 is a flow chart illustrating a method of image processing according to an exemplary embodiment of the present disclosure; as shown in fig. 1, the image processing method, applied to a vehicle, may include:
and step 101, acquiring a current running environment image of the vehicle.
In this step, the current driving environment image may be acquired by an image acquisition device provided in the vehicle.
And 102, inputting the current driving environment image into a preset multitask perception model, wherein the multitask perception model comprises a feature extraction module and a plurality of task decoding modules, different task decoding modules are used for completing different perception tasks, and first target feature data corresponding to the current driving environment image are extracted through the feature extraction module.
FIG. 2 is a schematic structural diagram of a multi-task perception model according to an exemplary embodiment of the present disclosure. As shown in FIG. 2, the feature extraction module 201 may include an image partitioning submodule 2011, an encoder 2012 and a data storage submodule 2013. An output end of the image partitioning submodule 2011 is coupled to an input end of the encoder 2012, an output end of the encoder 2012 is coupled to an input end of the data storage submodule 2013, and an output end of the data storage submodule 2013 is coupled to each of the plurality of task decoding modules 202. The image partitioning submodule 2011 is configured to divide the current driving environment image into a plurality of image regions and obtain a feature vector corresponding to each image region. The encoder 2012 includes a self-attention layer and is configured to perform feature extraction on the feature vectors corresponding to the image regions through the self-attention layer to obtain the first target feature data corresponding to the current driving environment image, and to input the first target feature data into the data storage submodule. The data storage submodule 2013 is configured to store the first target feature data corresponding to the current driving environment image output by the encoder. Each task decoding module 202 includes one or more decoders.
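As a rough illustration of this structure, the following PyTorch sketch wires a patch-embedding front end, a self-attention encoder, a simple feature store and one independent task decoding head together. All class names, shapes and hyperparameters (patch size 8, embedding width 256, 16 task queries, and so on) are assumptions for the example, not the patented implementation.

```python
import torch
import torch.nn as nn

class FeatureExtractionModule(nn.Module):
    """Sketch of feature extraction module 201: partitioning + encoder + store."""
    def __init__(self, patch_size=8, embed_dim=256, num_layers=4, num_heads=8):
        super().__init__()
        # image partitioning submodule 2011: split the image into patches and embed each patch
        self.patch_embed = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)   # encoder 2012 with self-attention
        self.feature_store = {}                                   # data storage submodule 2013

    def forward(self, image, frame_id):
        tokens = self.patch_embed(image).flatten(2).transpose(1, 2)   # (B, N, D) feature vectors
        features = self.encoder(tokens)                               # first target feature data
        self.feature_store[frame_id] = features.detach()              # keep for later frames
        return features

class TaskDecodingModule(nn.Module):
    """Sketch of one task decoding module 202: task queries attend over shared features."""
    def __init__(self, embed_dim=256, num_queries=16, num_heads=8, out_dim=4):
        super().__init__()
        self.task_queries = nn.Parameter(torch.randn(num_queries, embed_dim))
        layer = nn.TransformerDecoderLayer(embed_dim, num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.head = nn.Linear(embed_dim, out_dim)

    def forward(self, current_feats, history_feats):
        memory = torch.cat([current_feats, history_feats], dim=1)     # share current + historical features
        queries = self.task_queries.unsqueeze(0).expand(memory.size(0), -1, -1)
        return self.head(self.decoder(queries, memory))

extractor = FeatureExtractionModule()
vehicle_head = TaskDecodingModule(out_dim=4)                    # e.g. a position detection class task
current = extractor(torch.rand(1, 3, 64, 64), frame_id="t")
history = extractor.feature_store.get("t-1", current)          # fall back to the current frame if no history yet
boxes = vehicle_head(current, history)                         # (1, 16, 4)
```

In this sketch the decoding head receives the concatenation of the current and historical features, which mirrors the sharing of the first and second target feature data described above.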
This step can be implemented by steps S1 to S3 shown in FIG. 3 (FIG. 3 is a flowchart of an image processing method according to the embodiment shown in FIG. 1 of the present disclosure):
s1, in response to receiving the current driving environment image, dividing the current driving environment image into a plurality of image areas through the image partitioning submodule, and obtaining a feature vector corresponding to each image area.
For example, the image partitioning submodule may divide a 64 × 64 image into 8 × 8 image regions (64 regions in total) and perform feature extraction on each 8 × 8 region to obtain the feature vector corresponding to that region. When extracting the feature vector of each image region, an embedding algorithm or a fingerprint algorithm in the prior art may be used, or another algorithm in the prior art may be used, which is not limited in this disclosure.
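The following short sketch reproduces the 64 × 64 example with a plain linear embedding per region; it is only meant to make the patch-splitting arithmetic concrete, and the projection width of 256 is an assumed value.

```python
import torch
import torch.nn as nn

image = torch.rand(1, 3, 64, 64)                    # (B, C, H, W) driving environment image
patches = image.unfold(2, 8, 8).unfold(3, 8, 8)     # (1, 3, 8, 8, 8, 8): 8x8 grid of 8x8 regions
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, 64, 3 * 8 * 8)
embed = nn.Linear(3 * 8 * 8, 256)                   # one embedding per image region
feature_vectors = embed(patches)                    # (1, 64, 256): one feature vector per region
print(feature_vectors.shape)                        # torch.Size([1, 64, 256])
```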
S2, performing feature extraction on the feature vectors corresponding to the image regions through the self-attention layer to obtain the first target feature data corresponding to the current driving environment image, and inputting the first target feature data into the data storage sub-module.
Richer contextual semantic features of the current driving environment image can be captured through the self-attention layer; that is, the first target feature data carries a large amount of contextual semantic information, which provides a reliable data basis for subsequent task processing and improves the accuracy of the task processing results.
And S3, storing the first target characteristic data output by the encoder through the data storage submodule.
The data storage submodule is used to store the first target feature data corresponding to the current driving environment image. As time passes, the data storage submodule accumulates the first target feature data of images that have become historical driving environment images; in other words, the data storage submodule stores the second target feature data corresponding to the historical driving environment images.
Through the steps shown in S1 to S3, the first target feature data corresponding to the current driving environment image can be effectively extracted, and the first target feature data can be effectively stored, so that reliable guarantee is provided for data retrieval of a subsequent task decoding module.
And 103, acquiring second target characteristic data corresponding to the historical driving environment image.
In this step, identification information corresponding to the historical driving environment image can be acquired; and reading second target characteristic data corresponding to the historical driving environment image from the data storage submodule according to the identification information.
It should be noted that the data storage submodule stores the second target feature data corresponding to the historical driving environment images within a preset time period. The identification information corresponding to a historical driving environment image may include a collection time and a collection position, for example, the image directly in front of the vehicle captured at 13:11:20 on February 1, 2022. The process of acquiring the second target feature data corresponding to each historical driving environment image is the same as the process of acquiring the first target feature data and is not repeated here.
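A minimal sketch of such a storage submodule is given below as a small keyed cache; the `FeatureStore` name, the sliding window of 10 frames and the (time, position) key format are illustrative assumptions, not details taken from the patent.

```python
from collections import OrderedDict
import torch

class FeatureStore:
    """Keeps target feature data for frames inside a sliding time window."""
    def __init__(self, max_frames=10):
        self.max_frames = max_frames
        self._store = OrderedDict()          # identification info -> feature tensor

    def put(self, key, features: torch.Tensor):
        self._store[key] = features.detach()
        if len(self._store) > self.max_frames:
            self._store.popitem(last=False)  # drop the oldest frame

    def get(self, key):
        return self._store.get(key)

store = FeatureStore(max_frames=10)
# store the first target feature data of the current frame, keyed by identification info
store.put(("2022-02-01 13:11:20", "front"), torch.rand(1, 64, 256))
# later: read the second target feature data of a historical frame by the same key
history = store.get(("2022-02-01 13:11:20", "front"))
```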
And 104, inputting the first target characteristic data and the second target characteristic data into each task decoding module to obtain a task processing result output by each task decoding module.
Wherein each of the task decoding modules may include one or more decoders.
In this step, the one or more decoders may perform task processing on the received first target feature data and the second target feature data to obtain the task processing result.
It should be noted that, when the decoder performs task processing, its working principle is a query operation among a query vector, a key vector and a value vector. When task processing is performed by a task decoding module including one or more decoders, each perception task can obtain the information it needs from the first target feature data and the second target feature data in a query manner. That is, the actual task is taken as the query vector (q), and the first target feature data and the second target feature data are taken as the key vectors (k) and the value vectors (v). A score is computed between the query and the keys, the scores are normalized (i.e., with a softmax calculation) to obtain normalized softmax scores, and all value vectors are then weighted and summed according to the softmax scores, thereby obtaining the task processing result.
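Written out generically, the query operation described above is ordinary scaled dot-product attention. The sketch below is an illustration with assumed tensor names and sizes, not the patent's decoder.

```python
import math
import torch
import torch.nn.functional as F

def task_attention(task_query, features):
    """task_query: (B, Nq, D) task-specific query vectors (q).
    features: (B, Nf, D) first + second target feature data, used as keys and values."""
    k = v = features
    scores = task_query @ k.transpose(-2, -1) / math.sqrt(k.size(-1))  # score each feature
    weights = F.softmax(scores, dim=-1)                                # normalized softmax scores
    return weights @ v                                                 # weighted sum of value vectors

current_feats = torch.rand(1, 64, 256)      # first target feature data
history_feats = torch.rand(1, 64, 256)      # second target feature data
memory = torch.cat([current_feats, history_feats], dim=1)
task_q = torch.rand(1, 16, 256)             # queries of one perception task
out = task_attention(task_q, memory)        # (1, 16, 256)
```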
In addition, it should be noted that the plurality of task decoding modules complete different perception tasks, and different task decoding modules are independent of each other and can run in parallel. Because the task decoding modules are mutually independent, further task decoding modules can be added according to subsequent requirements, which facilitates the expansion of subsequent perception tasks.
According to the above technical solution, the second target feature data of the historical driving environment image and the first target feature data of the current driving environment image can be shared, and each task processing result is obtained through a plurality of mutually independent task decoding modules, which effectively improves the multi-task processing speed. Because the different tasks are mutually independent and can be processed in parallel, the task processing efficiency is effectively improved, the mutual influence among different task processing results is reduced, and the accuracy of the task processing results is effectively improved.
Optionally, the plurality of task decoding modules comprises at least one of a position detection class task decoding module, an image segmentation class task decoding module and a category detection class task decoding module,
the position detection task decoding module is used for determining the position information of a first specified object in the current vehicle running environment according to the first target characteristic data and the second target characteristic data;
the class detection task decoding module is used for determining class information of a second specified object in the current vehicle running environment according to the first target characteristic data and the second target characteristic data;
the image segmentation class task decoding module is used for determining a lane line position and/or a travelable area in a vehicle traveling environment according to the first target feature data and the second target feature data.
The first specified object may be a vehicle, an obstacle, a traffic light, a pedestrian, a traffic sign or the like, and the second specified object may be a road surface, a tree, a cloud or the like. The first specified object may be the same as or different from the second specified object, which is not limited in this disclosure.
Optionally, the position detection task decoding module includes one or more of a traffic light position detection task decoding module, a vehicle position detection task decoding module, a pedestrian position detection task decoding module, an obstacle position detection task decoding module, a lamp post position detection task decoding module, and a traffic sign position detection task decoding module;
the class detection task decoding module comprises a weather class detection task decoding module or a driving road class detection task decoding module.
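Because each of the modules enumerated above is an independent head over the same shared features, a natural (purely illustrative) way to organise them in code is a name-keyed registry, as sketched below with a stand-in decoder class and assumed task names.

```python
import torch
import torch.nn as nn

class SimpleTaskDecoder(nn.Module):
    """Stand-in for a task decoding module (illustrative only)."""
    def __init__(self, embed_dim=256, out_dim=4):
        super().__init__()
        self.head = nn.Linear(embed_dim, out_dim)

    def forward(self, features):
        return self.head(features.mean(dim=1))      # pool shared features, then predict

# one entry per perception task; keys mirror the module names listed above
decoders = nn.ModuleDict({
    "traffic_light_position": SimpleTaskDecoder(out_dim=4),
    "lane_segmentation": SimpleTaskDecoder(out_dim=2),
    "weather_class": SimpleTaskDecoder(out_dim=5),
})
features = torch.rand(1, 128, 256)                  # first + second target feature data
results = {name: module(features) for name, module in decoders.items()}
# adding a new perception task later does not touch the existing modules:
decoders["traffic_sign_position"] = SimpleTaskDecoder(out_dim=4)
```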
Optionally, the multitask perception model is obtained by training in the manner shown in fig. 4, and fig. 4 is a flowchart of a training method of the multitask perception model according to an exemplary embodiment of the present disclosure; as shown in fig. 4, the training method of the multitask perception model may include:
step 401, obtaining multiple sets of driving environment image samples, where each set of driving environment image sample includes multiple frames of driving environment sample images and annotation data of a current sensing task, and different driving environment image samples include annotation data of different sensing tasks.
Step 402, training a preset initial model by taking the multiple groups of running environment image samples as training data to obtain the multitask perception model.
The preset initial model comprises an initial feature extraction module and a plurality of initial task decoding modules, wherein the initial feature extraction module comprises an image block cutting initial sub-module, an initial encoder and a data storage sub-module, and the initial task decoding module comprises one or more initial decoders.
It should be noted that the initial encoder may be the encoder of a Transformer, the initial decoder may be the Transformer decoder, and the data storage submodule may be any data storage unit with a storage function in the prior art. The initial image partitioning submodule may be an embedding algorithm unit.
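The training scheme of steps 401 and 402 can be sketched as follows; the function signature, the per-task loss dictionary and the `sample_groups` interface are assumptions for illustration. Each sample group carries labels for a single perception task, so a batch updates the shared feature extraction module and only the decoder of the task it is annotated for.

```python
import torch
import torch.nn as nn

def train_multitask(extractor, decoders, losses, sample_groups, epochs=10, lr=1e-4):
    """extractor: shared initial feature extraction module.
    decoders: nn.ModuleDict of initial task decoding modules, one per perception task.
    losses: dict mapping task name to its loss function.
    sample_groups: iterable of (task_name, images, labels); each group is
    annotated for a single perception task only."""
    params = list(extractor.parameters()) + list(decoders.parameters())
    optimizer = torch.optim.AdamW(params, lr=lr)
    for _ in range(epochs):
        for task_name, images, labels in sample_groups:
            features = extractor(images)                 # shared target feature data
            preds = decoders[task_name](features)        # only the annotated task's decoder
            loss = losses[task_name](preds, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```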
According to the above technical solution, the multi-task perception model can be effectively trained. Because the different task decoding modules in the multi-task perception model are independent of one another, further task decoding modules can be added according to subsequent perception requirements, which facilitates the expansion of subsequent tasks.
Fig. 5 is a block diagram of an image processing apparatus shown in an exemplary embodiment of the present disclosure; as shown in fig. 5, the image processing apparatus, applied to a vehicle, may include:
a first obtaining module 501 configured to obtain a current driving environment image of the vehicle;
a second obtaining module 502, configured to input the current driving environment image into a preset multitask perception model, where the multitask perception model includes a feature extraction module and multiple task decoding modules, different task decoding modules are used to complete different perception tasks, and first target feature data corresponding to the current driving environment image is extracted through the feature extraction module;
a third obtaining module 503 configured to obtain second target feature data corresponding to the historical driving environment image;
a determining module 504 configured to input the first target feature data and the second target feature data into each of the task decoding modules to obtain a task processing result output by each of the task decoding modules.
According to the above technical solution, each task processing result is obtained through a plurality of mutually independent task decoding modules, which effectively improves the multi-task processing speed. Because different tasks can be processed in parallel, the task processing efficiency is effectively improved and the mutual influence among the task processing results is reduced, thereby effectively improving the accuracy of the task processing results.
Optionally, the feature extraction module comprises an image slicing submodule, an encoder and a data storage submodule, an output of the image slicing submodule is coupled to an input of the encoder, an output of the encoder is coupled to an input of the data storage submodule, an output of the data storage submodule is coupled to each of the plurality of task decoding modules respectively,
the image cutting submodule is used for dividing the current driving environment image into a plurality of image areas and acquiring a feature vector corresponding to each image area;
the encoder comprises a self-attention layer, a data storage submodule and a data storage submodule, wherein the self-attention layer is used for performing feature extraction on a plurality of feature vectors corresponding to a plurality of image areas through the self-attention layer so as to obtain first target feature data corresponding to the current driving environment image, and the first target feature data are input into the data storage submodule;
the data storage sub-module is configured to store the first target feature data corresponding to the current driving environment image output by the encoder.
Optionally, the second obtaining module 502 is configured to:
in response to the current driving environment image being received, dividing the current driving environment image into a plurality of image areas through the image partitioning sub-module, and acquiring a feature vector corresponding to each image area;
extracting features of a plurality of feature vectors corresponding to the image areas through the self-attention layer to obtain first target feature data corresponding to the current driving environment image, and inputting the first target feature data into the data storage submodule;
and storing the first target characteristic data output by the encoder through the data storage submodule.
Optionally, the third obtaining module 503 is configured to:
acquiring identification information corresponding to a historical driving environment image;
and reading second target characteristic data corresponding to the historical driving environment image from the data storage submodule according to the identification information.
Optionally, the task decoding module comprises one or more decoders, and the determining module 504 is configured to:
and performing task processing on the received first target characteristic data and the second target characteristic data through the one or more decoders to obtain the task processing result.
Optionally, the plurality of task decoding modules comprises at least one of a position detection class task decoding module, an image segmentation class task decoding module and a category detection class task decoding module,
the position detection task decoding module is used for determining the position information of a first specified object in the current vehicle running environment according to the first target characteristic data and the second target characteristic data;
the class detection task decoding module is used for determining class information of a second specified object in the current vehicle running environment according to the first target characteristic data and the second target characteristic data;
the image segmentation class task decoding module is used for determining a lane line position and/or a travelable area in a vehicle traveling environment according to the first target characteristic data and the second target characteristic data.
Optionally, the position detection task decoding module includes one or more of a traffic light position detection task decoding module, a vehicle position detection task decoding module, a pedestrian position detection task decoding module, an obstacle position detection task decoding module, a lamp post position detection task decoding module, and a traffic sign position detection task decoding module;
the class detection task decoding module comprises a weather class detection task decoding module or a driving road class detection task decoding module.
Optionally, the apparatus further comprises a model training module configured to:
acquiring a plurality of groups of running environment image samples, wherein each group of running environment image samples comprises a plurality of frames of running environment sample images and annotation data of a current sensing task, and different running environment image samples comprise annotation data of different sensing tasks;
training a preset initial model by taking the multiple groups of driving environment image samples as training data to obtain the multi-task perception model; the preset initial model comprises an initial feature extraction module and a plurality of initial task decoding modules, wherein the initial feature extraction module comprises an image block cutting initial sub-module, an initial encoder and a data storage sub-module, and the initial task decoding module comprises one or more initial decoders.
According to the above technical solution, the multi-task perception model can be effectively trained. Because the different task decoding modules in the multi-task perception model are independent of one another, further task decoding modules can be added according to subsequent perception requirements, which facilitates the expansion of subsequent tasks.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
The apparatus may be a part of a stand-alone electronic device. For example, in an embodiment, the apparatus may be an integrated circuit (IC) or a chip, where the IC may be a single IC or a collection of multiple ICs; the chip may include, but is not limited to, the following categories: a GPU (Graphics Processing Unit), a CPU (Central Processing Unit), an FPGA (Field Programmable Gate Array), a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an SoC (System on Chip), and the like. The integrated circuit or chip may be configured to execute executable instructions (or code) to implement the image processing method. The executable instructions may be stored in the integrated circuit or chip, or may be retrieved from another device or apparatus; for example, the integrated circuit or chip may include a processor, a memory, and an interface for communicating with other devices. The executable instructions may be stored in the memory and, when executed by the processor, implement the image processing method described above; alternatively, the integrated circuit or chip may receive the executable instructions through the interface and transmit them to the processor for execution, so as to implement the image processing method.
FIG. 6 is a functional block diagram schematic of a vehicle, shown in an exemplary embodiment. The vehicle 600 may be configured in a fully or partially autonomous driving mode. For example, the vehicle 600 may acquire environmental information of its surroundings through the sensing system 620 and derive an automatic driving strategy based on an analysis of the surrounding environmental information to implement full automatic driving, or present the analysis result to the user to implement partial automatic driving.
Vehicle 600 may include various subsystems such as infotainment system 610, perception system 620, decision control system 630, drive system 640, and computing platform 650. Alternatively, vehicle 600 may include more or fewer subsystems, and each subsystem may include multiple components. In addition, each of the sub-systems and components of the vehicle 600 may be interconnected by wire or wirelessly.
In some embodiments, the infotainment system 610 may include a communication system 611, an entertainment system 612, and a navigation system 613.
The communication system 611 may comprise a wireless communication system that communicates wirelessly with one or more devices, either directly or via a communication network. For example, the wireless communication system may use 3G cellular communication such as CDMA, EV-DO or GSM/GPRS, 4G cellular communication such as LTE, or 5G cellular communication. The wireless communication system may communicate with a wireless local area network (WLAN) using WiFi. In some embodiments, the wireless communication system may use an infrared link, Bluetooth or ZigBee to communicate directly with a device. Other wireless protocols are also possible, such as various vehicular communication systems; for example, the wireless communication system may include one or more Dedicated Short Range Communications (DSRC) devices, which may involve public and/or private data communication between vehicles and/or roadside stations.
The entertainment system 612 may include a display device, a microphone and a speaker. Based on the entertainment system, a user can listen to the radio or play music in the vehicle; or a mobile phone can communicate with the vehicle and project its screen onto the display device. The display device may be a touch screen, and the user can operate it by touching the screen.
In some cases, the voice signal of the user may be acquired through a microphone, and certain control of the vehicle 600 by the user, such as adjusting the temperature in the vehicle, etc., may be implemented according to the analysis of the voice signal of the user. In other cases, music may be played to the user through a stereo.
The navigation system 613 may include a map service provided by a map provider to provide navigation of a route of travel for the vehicle 600, and the navigation system 613 may be used in conjunction with a global positioning system 621 and an inertial measurement unit 622 of the vehicle. The map service provided by the map provider can be a two-dimensional map or a high-precision map.
The sensing system 620 may include several types of sensors that sense information about the environment surrounding the vehicle 600. For example, the sensing system 620 may include a global positioning system 621 (the global positioning system may be a GPS system, a BeiDou system or another positioning system), an Inertial Measurement Unit (IMU) 622, a lidar 623, a millimeter-wave radar 624, an ultrasonic radar 625, and a camera 626. The sensing system 620 may also include sensors that monitor internal systems of the vehicle 600 (e.g., an in-vehicle air quality monitor, a fuel gauge, an oil temperature gauge, etc.). Sensor data from one or more of these sensors may be used to detect objects and their corresponding characteristics (position, shape, orientation, velocity, etc.). Such detection and identification is a critical function for the safe operation of the vehicle 600.
Global positioning system 621 is used to estimate the geographic location of vehicle 600.
The inertial measurement unit 622 is used to sense a pose change of the vehicle 600 based on the inertial acceleration. In some embodiments, inertial measurement unit 622 may be a combination of accelerometers and gyroscopes.
Lidar 623 utilizes laser light to sense objects in the environment in which vehicle 600 is located. In some embodiments, lidar 623 may include one or more laser sources, laser scanners, and one or more detectors, among other system components.
The millimeter-wave radar 624 utilizes radio signals to sense objects within the surrounding environment of the vehicle 600. In some embodiments, in addition to sensing objects, the millimeter-wave radar 624 may also be used to sense the speed and/or heading of objects.
The ultrasonic radar 625 may sense objects around the vehicle 600 using ultrasonic signals.
The camera 626 is used to capture image information of the surroundings of the vehicle 600. The camera 626 may include a monocular camera, a binocular camera, a structured-light camera, a panoramic camera, and the like, and the image information acquired by the camera 626 may include still images or video stream information.
Decision control system 630 includes a computing system 631 that makes analytical decisions based on information obtained by sensing system 620, and decision control system 630 further includes a vehicle controller 632 that controls the powertrain of vehicle 600, and a steering system 633, throttle 634, and brake system 635 for controlling vehicle 600.
The computing system 631 may be operable to process and analyze the various information acquired by the perception system 620 in order to identify targets, objects and/or features in the environment surrounding the vehicle 600. The targets may include pedestrians or animals, and the objects and/or features may include traffic signals, road boundaries and obstacles. The computing system 631 may use techniques such as object recognition algorithms, Structure from Motion (SfM) algorithms and video tracking. In some embodiments, the computing system 631 may be used to map the environment, track objects, estimate the speed of objects, and so forth. The computing system 631 may analyze the obtained information and derive a control strategy for the vehicle.
The vehicle controller 632 may be used to perform coordinated control on the power battery and the engine 641 of the vehicle to improve the power performance of the vehicle 600.
The steering system 633 is operable to adjust the heading of the vehicle 600. For example, in one embodiment, it may be a steering wheel system.
The throttle 634 is used to control the operating speed of the engine 641 and, in turn, the speed of the vehicle 600.
The brake system 635 is used to control the deceleration of the vehicle 600. The braking system 635 may use friction to slow the wheel 644. In some embodiments, the braking system 635 may convert the kinetic energy of the wheels 644 into electrical current. The braking system 635 may also take other forms to slow the rotational speed of the wheels 644 to control the speed of the vehicle 600.
The drive system 640 may include components that provide powered motion to the vehicle 600. In one embodiment, the drive system 640 may include an engine 641, an energy source 642, a transmission 643, and wheels 644. The engine 641 may be an internal combustion engine, an electric motor, an air compression engine, or other types of engine combinations, such as a hybrid engine consisting of a gasoline engine and an electric motor, a hybrid engine consisting of an internal combustion engine and an air compression engine. The engine 641 converts the energy source 642 into mechanical energy.
Examples of energy sources 642 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas-based fuels, ethanol, solar panels, batteries, and other sources of electrical power. The energy source 642 may also provide energy to other systems of the vehicle 600.
The transmission 643 may transmit mechanical power from the engine 641 to the wheels 644. The transmission 643 may include a gearbox, a differential, and a drive shaft. In one embodiment, the transmission 643 may also include other devices, such as clutches. Wherein the drive shaft may include one or more axles that may be coupled to one or more wheels 644.
Some or all of the functionality of the vehicle 600 is controlled by the computing platform 650. Computing platform 650 can include at least one processor 651, and processor 651 can execute instructions 653 stored in a non-transitory computer-readable medium, such as memory 652. In some embodiments, computing platform 650 may also be a plurality of computing devices that control individual components or subsystems of vehicle 600 in a distributed manner.
The processor 651 may be any conventional processor, such as a commercially available CPU. Alternatively, the processor 651 may also include a processor such as a Graphics Processor Unit (GPU), a Field Programmable Gate Array (FPGA), a System On Chip (SOC), an Application Specific Integrated Circuit (ASIC), or a combination thereof. Although fig. 6 functionally illustrates a processor, memory, and other elements of a computer in the same block, those skilled in the art will appreciate that the processor, computer, or memory may actually comprise multiple processors, computers, or memories that may or may not be stored within the same physical housing. For example, the memory may be a hard drive or other storage medium located in a different enclosure than the computer. Thus, reference to a processor or computer will be understood to include reference to a collection of processors or computers or memories that may or may not operate in parallel. Rather than using a single processor to perform the steps described herein, some components, such as the steering component and the retarding component, may each have their own processor that performs only computations related to the component-specific functions.
In the disclosed embodiment, the processor 651 may perform the image processing method described above.
In various aspects described herein, the processor 651 may be located remotely from the vehicle and in wireless communication with the vehicle. In other aspects, some of the processes described herein are executed on a processor disposed within the vehicle and others are executed by a remote processor, including taking the steps necessary to perform a single maneuver.
In some embodiments, the memory 652 may contain instructions 653 (e.g., program logic), which instructions 653 may be executed by the processor 651 to perform various functions of the vehicle 600. The memory 652 may also contain additional instructions, including instructions to send data to, receive data from, interact with, and/or control one or more of the infotainment system 610, the perception system 620, the decision control system 630, the drive system 640.
In addition to instructions 653, memory 652 may also store data such as road maps, route information, the location, direction, speed, and other such vehicle data of the vehicle, as well as other information. Such information may be used by the vehicle 600 and the computing platform 650 during operation of the vehicle 600 in autonomous, semi-autonomous, and/or manual modes.
Computing platform 650 may control functions of vehicle 600 based on inputs received from various subsystems (e.g., drive system 640, perception system 620, and decision control system 630). For example, computing platform 650 may utilize input from decision control system 630 in order to control steering system 633 to avoid obstacles detected by perception system 620. In some embodiments, the computing platform 650 is operable to provide control over many aspects of the vehicle 600 and its subsystems.
Optionally, one or more of these components described above may be mounted or associated separately from the vehicle 600. For example, the memory 652 may exist partially or completely separate from the vehicle 600. The above components may be communicatively coupled together in a wired and/or wireless manner.
Optionally, the above components are only an example; in an actual application, components in the above modules may be added or deleted according to actual needs, and fig. 6 should not be construed as limiting the embodiments of the present disclosure.
An autonomous automobile traveling on a road, such as the vehicle 600 above, may identify objects within its surrounding environment in order to determine an adjustment to its current speed. The objects may be other vehicles, traffic control devices, or other types of objects. In some examples, each identified object may be considered independently, and the object's respective characteristics, such as its current speed, acceleration, and separation from the vehicle, may be used to determine the speed to which the autonomous vehicle is to be adjusted.
Optionally, the vehicle 600 or a sensing and computing device associated with the vehicle 600 (e.g., computing system 631, computing platform 650) may predict the behavior of an identified object based on the characteristics of the identified object and the state of the surrounding environment (e.g., traffic, rain, ice on the road, etc.). Optionally, since the identified objects may depend on one another's behavior, the behavior of a single identified object may also be predicted by considering all of the identified objects together. The vehicle 600 is able to adjust its speed based on the predicted behavior of the identified objects. In other words, the autonomous vehicle is able to determine what stable state the vehicle will need to adjust to (e.g., accelerating, decelerating, or stopping) based on the predicted behavior of the objects. In this process, other factors may also be considered in determining the speed of the vehicle 600, such as the lateral position of the vehicle 600 in the road being traveled, the curvature of the road, the proximity of static and dynamic objects, and so forth.
In addition to providing instructions to adjust the speed of the autonomous vehicle, the computing device may also provide instructions to modify the steering angle of the vehicle 600 to cause the autonomous vehicle to follow a given trajectory and/or maintain a safe lateral and longitudinal distance from objects in the vicinity of the autonomous vehicle (e.g., vehicles in adjacent lanes on the road).
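As a rough illustration of the speed-adjustment logic described above, the following minimal sketch shows how predicted object behavior and road factors might cap a target speed. The TrackedObject structure, the thresholds, and the numbers are hypothetical and do not come from the patent; a real planner would be considerably more involved.

```python
# A minimal, illustrative sketch of the speed-adjustment logic described above.
# The TrackedObject structure, thresholds, and numbers are hypothetical.
from dataclasses import dataclass

@dataclass
class TrackedObject:
    distance_m: float         # longitudinal separation from the ego vehicle
    closing_speed_mps: float  # positive when the gap is shrinking
    predicted_braking: bool   # behavior predicted from object + environment state

def adjust_speed(current_speed_mps: float,
                 objects: list[TrackedObject],
                 road_curvature: float) -> float:
    """Return a target speed based on predicted behavior of surrounding objects."""
    target = current_speed_mps
    for obj in objects:
        # Slow down when an object is predicted to brake or the time gap is short.
        time_gap_s = obj.distance_m / max(obj.closing_speed_mps, 0.1)
        if obj.predicted_braking or time_gap_s < 2.0:
            target = min(target, current_speed_mps * 0.8)
    # Other factors, e.g. road curvature, further cap the comfortable speed.
    if road_curvature > 0.01:
        target = min(target, 15.0)
    return max(target, 0.0)

# Example: a lead vehicle 20 m ahead that is predicted to brake
# causes roughly a 20 % reduction of the current 20 m/s speed.
print(adjust_speed(20.0, [TrackedObject(20.0, 1.0, True)], road_curvature=0.0))
```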
The vehicle 600 may be any type of vehicle, such as a car, a truck, a motorcycle, a bus, a boat, an airplane, a helicopter, a recreational vehicle, a train, etc., and the disclosed embodiment is not particularly limited.
In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the image processing method described above when executed by the programmable apparatus.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (12)

1. An image processing method, applied to a vehicle, comprising:
acquiring a current running environment image of the vehicle;
inputting the current driving environment image into a preset multitask perception model, wherein the multitask perception model comprises a feature extraction module and a plurality of task decoding modules, different task decoding modules being used for completing different perception tasks, and extracting, through the feature extraction module, first target feature data corresponding to the current driving environment image;
acquiring second target feature data corresponding to a historical driving environment image;
and inputting the first target feature data and the second target feature data into each task decoding module to obtain a task processing result output by each task decoding module.
2. The image processing method of claim 1, wherein the feature extraction module comprises an image slicing sub-module, an encoder, and a data storage sub-module, an output of the image slicing sub-module being coupled to an input of the encoder, an output of the encoder being coupled to an input of the data storage sub-module, an output of the data storage sub-module being coupled to each of the plurality of task decoding modules, respectively,
the image slicing sub-module is used for dividing the current driving environment image into a plurality of image areas and acquiring a feature vector corresponding to each image area;
the encoder comprises a self-attention layer, wherein the self-attention layer is used for performing feature extraction on the plurality of feature vectors corresponding to the plurality of image areas to obtain the first target feature data corresponding to the current driving environment image and inputting the first target feature data into the data storage sub-module;
the data storage sub-module is used for storing the first target feature data, output by the encoder, corresponding to the current driving environment image.
3. The image processing method according to claim 2, wherein the extracting, by the feature extraction module, of the first target feature data corresponding to the current driving environment image includes:
in response to receiving the current driving environment image, dividing the current driving environment image into a plurality of image areas through the image slicing sub-module, and acquiring a feature vector corresponding to each image area;
performing feature extraction on the plurality of feature vectors corresponding to the plurality of image areas through the self-attention layer to obtain the first target feature data corresponding to the current driving environment image, and inputting the first target feature data into the data storage sub-module;
and storing, through the data storage sub-module, the first target feature data output by the encoder.
4. The image processing method according to claim 2, wherein the acquiring of the second target feature data corresponding to the historical driving environment image includes:
acquiring identification information corresponding to the historical driving environment image;
and reading the second target feature data corresponding to the historical driving environment image from the data storage sub-module according to the identification information.
5. The image processing method according to claim 1, wherein each of the task decoding modules comprises one or more decoders, and the inputting the first target feature data and the second target feature data into each of the task decoding modules to obtain the task processing result output by each of the task decoding modules comprises:
performing, by the one or more decoders, task processing on the received first target feature data and the second target feature data to obtain the task processing result.
6. The image processing method according to claim 1, wherein the plurality of task decoding modules includes at least one of a position detection task decoding module, an image segmentation task decoding module, and a category detection task decoding module,
the position detection task decoding module is used for determining position information of a first specified object in the current vehicle driving environment according to the first target feature data and the second target feature data;
the category detection task decoding module is used for determining category information of a second specified object in the current vehicle driving environment according to the first target feature data and the second target feature data;
the image segmentation task decoding module is used for determining a lane line position and/or a travelable area in the vehicle driving environment according to the first target feature data and the second target feature data.
7. The image processing method according to claim 6,
the position detection task decoding module comprises one or more of a traffic light position detection task decoding module, a vehicle position detection task decoding module, a pedestrian position detection task decoding module, an obstacle position detection task decoding module, a lamp post position detection task decoding module, and a traffic sign position detection task decoding module;
the category detection task decoding module comprises a weather category detection task decoding module or a driving road category detection task decoding module.
8. The image processing method according to any one of claims 1 to 7, wherein the multitask perception model is trained by:
acquiring a plurality of groups of driving environment image samples, wherein each group of driving environment image samples comprises a plurality of driving environment sample images and annotation data of a corresponding perception task, and different groups of driving environment image samples comprise annotation data of different perception tasks;
training a preset initial model by taking the plurality of groups of driving environment image samples as training data to obtain the multitask perception model; wherein the preset initial model comprises an initial feature extraction module and a plurality of initial task decoding modules, the initial feature extraction module comprises an initial image slicing sub-module, an initial encoder, and a data storage sub-module, and each initial task decoding module comprises one or more initial decoders.
9. An image processing apparatus, applied to a vehicle, comprising:
a first acquisition module configured to acquire a current running environment image of the vehicle;
a second acquisition module configured to input the current driving environment image into a preset multitask perception model, wherein the multitask perception model comprises a feature extraction module and a plurality of task decoding modules, different task decoding modules being used for completing different perception tasks, and to extract, through the feature extraction module, first target feature data corresponding to the current driving environment image;
a third acquisition module configured to acquire second target feature data corresponding to a historical driving environment image;
a determining module configured to input the first target feature data and the second target feature data into each task decoding module to obtain a task processing result output by each task decoding module.
10. A vehicle, characterized by comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring a plurality of frames of driving environment images;
inputting each frame of driving environment image into a preset multitask perception model respectively, wherein the multitask perception model comprises a feature extraction module and a plurality of task decoding modules, and different task decoding modules are used for completing different perception tasks;
sequentially extracting, through the feature extraction module, target feature data corresponding to each frame of driving environment image to obtain a plurality of target feature data corresponding to the plurality of frames of driving environment images;
and inputting the plurality of target feature data into each task decoding module to obtain a task processing result output by each task decoding module.
11. A computer-readable storage medium, on which computer program instructions are stored, which program instructions, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 8.
12. A chip comprising a processor and an interface; the processor is configured to read instructions to perform the method of any one of claims 1-8.
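The three sketches below are illustrative only and form no part of the claims. This first one mirrors, in PyTorch-style code, the feature extraction module recited in claims 2 to 4: an image slicing sub-module that turns the driving environment image into patch feature vectors, an encoder whose self-attention layer produces the first target feature data, and a data storage sub-module that caches the features under an image identifier so that second target feature data of a historical image can later be read back. The class names, dimensions, and the dictionary-based cache are assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Image slicing sub-module + self-attention encoder + data storage sub-module."""
    def __init__(self, patch: int = 16, dim: int = 256, layers: int = 4):
        super().__init__()
        # Image slicing: split the image into patches and embed each patch as a
        # feature vector (a strided convolution does both in one step).
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        # Encoder with self-attention layers over the patch feature vectors.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        # Data storage: cache first target feature data keyed by an image id so
        # later frames can reuse it as second (historical) target feature data.
        self.store: dict[str, torch.Tensor] = {}

    def forward(self, image: torch.Tensor, image_id: str) -> torch.Tensor:
        patches = self.patch_embed(image)            # (B, dim, H/16, W/16)
        tokens = patches.flatten(2).transpose(1, 2)  # (B, N, dim) feature vectors
        features = self.encoder(tokens)              # first target feature data
        self.store[image_id] = features.detach()
        return features

    def lookup(self, image_id: str) -> torch.Tensor:
        # Claim 4: read second target feature data by the historical image's id.
        return self.store[image_id]

extractor = FeatureExtractor()
current = extractor(torch.randn(1, 3, 224, 224), image_id="frame_0002")
history = extractor.lookup("frame_0001") if "frame_0001" in extractor.store else current
```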
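The second sketch illustrates the plurality of task decoding modules of claims 1 and 5 to 7: every decoder receives both the first (current-frame) and second (historical) target feature data and outputs its own task processing result. The cross-attention fusion and the single linear heads are simplifications; actual position-detection, segmentation, and category-detection heads would be richer, and the module names are merely illustrative.

```python
import torch
import torch.nn as nn

class TaskDecoder(nn.Module):
    """One task decoding module built from one decoder (attention + head)."""
    def __init__(self, dim: int, out_dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.head = nn.Linear(dim, out_dim)

    def forward(self, current_feats: torch.Tensor, history_feats: torch.Tensor):
        # Current-frame tokens attend to the historical tokens, then predict.
        fused, _ = self.attn(current_feats, history_feats, history_feats)
        return self.head(fused)

dim = 256
decoders = nn.ModuleDict({
    "vehicle_position": TaskDecoder(dim, out_dim=4),   # e.g. a box per token
    "lane_segmentation": TaskDecoder(dim, out_dim=2),  # lane / not-lane per token
    "weather_category": TaskDecoder(dim, out_dim=5),   # a handful of classes
})

current_feats = torch.randn(1, 196, dim)  # first target feature data
history_feats = torch.randn(1, 196, dim)  # second target feature data
results = {name: dec(current_feats, history_feats) for name, dec in decoders.items()}
```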
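The last sketch corresponds to the training procedure of claim 8: groups of driving environment image samples, each annotated for a single perception task, train one shared feature extractor together with the per-task decoding modules, and the loss is computed only for the task the current group is annotated for. It reuses the FeatureExtractor and TaskDecoder classes sketched above; the loss function, tensor shapes, and batch layout are assumptions.

```python
import torch
import torch.nn as nn

def train_step(extractor, decoders, optimizer, batch):
    # batch = (images, task_name, labels): one group supplies labels for one task.
    images, task_name, labels = batch
    feats = extractor(images, image_id="train")   # shared backbone features
    preds = decoders[task_name](feats, feats)     # decode only the annotated task
    loss = nn.functional.cross_entropy(preds.mean(dim=1), labels)  # toy loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example (assumes the FeatureExtractor / TaskDecoder sketches above are in scope):
# optimizer = torch.optim.AdamW(
#     list(extractor.parameters()) + list(decoders.parameters()), lr=1e-4)
# loss = train_step(extractor, decoders, optimizer,
#                   (torch.randn(2, 3, 224, 224), "weather_category",
#                    torch.tensor([0, 3])))
```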
CN202210778647.1A 2022-07-04 2022-07-04 Image processing method, image processing device, vehicle, storage medium and chip Pending CN115035494A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210778647.1A CN115035494A (en) 2022-07-04 2022-07-04 Image processing method, image processing device, vehicle, storage medium and chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210778647.1A CN115035494A (en) 2022-07-04 2022-07-04 Image processing method, image processing device, vehicle, storage medium and chip

Publications (1)

Publication Number Publication Date
CN115035494A (en) 2022-09-09

Family

ID=83128506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210778647.1A Pending CN115035494A (en) 2022-07-04 2022-07-04 Image processing method, image processing device, vehicle, storage medium and chip

Country Status (1)

Country Link
CN (1) CN115035494A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178253A (en) * 2019-12-27 2020-05-19 深圳佑驾创新科技有限公司 Visual perception method and device for automatic driving, computer equipment and storage medium
CN114581870A (en) * 2022-03-07 2022-06-03 上海人工智能创新中心 Trajectory planning method, apparatus, device and computer-readable storage medium
CN114663687A (en) * 2022-03-15 2022-06-24 北京京东尚科信息技术有限公司 Model training method, target recognition method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MARVIN TEICHMANN ET AL.: "MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving", arXiv:1612.07695v2 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116450207A (en) * 2023-06-14 2023-07-18 北京鉴智科技有限公司 Automatic driving perception processing method and device, electronic equipment and storage medium
CN116758378A (en) * 2023-08-11 2023-09-15 小米汽车科技有限公司 Method for generating model, data processing method, related device, vehicle and medium
CN116758378B (en) * 2023-08-11 2023-11-14 小米汽车科技有限公司 Method for generating model, data processing method, related device, vehicle and medium

Similar Documents

Publication Publication Date Title
CN115035494A (en) Image processing method, image processing device, vehicle, storage medium and chip
CN115042821B (en) Vehicle control method, vehicle control device, vehicle and storage medium
US20240017719A1 (en) Mapping method and apparatus, vehicle, readable storage medium, and chip
CN114842075B (en) Data labeling method and device, storage medium and vehicle
CN115147796A (en) Method and device for evaluating target recognition algorithm, storage medium and vehicle
CN115123257A (en) Method and device for identifying position of road deceleration strip, vehicle, storage medium and chip
CN115203457B (en) Image retrieval method, device, vehicle, storage medium and chip
CN115056784B (en) Vehicle control method, device, vehicle, storage medium and chip
CN115330923B (en) Point cloud data rendering method and device, vehicle, readable storage medium and chip
CN115205311B (en) Image processing method, device, vehicle, medium and chip
CN115100630B (en) Obstacle detection method, obstacle detection device, vehicle, medium and chip
CN114842455B (en) Obstacle detection method, device, equipment, medium, chip and vehicle
CN114842440B (en) Automatic driving environment sensing method and device, vehicle and readable storage medium
CN115202234B (en) Simulation test method and device, storage medium and vehicle
CN114782638B (en) Method and device for generating lane line, vehicle, storage medium and chip
CN115205848A (en) Target detection method, target detection device, vehicle, storage medium and chip
CN115205179A (en) Image fusion method and device, vehicle and storage medium
CN115042814A (en) Traffic light state identification method and device, vehicle and storage medium
CN115063639B (en) Model generation method, image semantic segmentation device, vehicle and medium
CN115082772B (en) Location identification method, location identification device, vehicle, storage medium and chip
CN114822216B (en) Method and device for generating parking space map, vehicle, storage medium and chip
CN115139946B (en) Vehicle falling water detection method, vehicle, computer readable storage medium and chip
CN115115707B (en) Vehicle falling water detection method, vehicle, computer readable storage medium and chip
CN115082886B (en) Target detection method, device, storage medium, chip and vehicle
CN115147794B (en) Lane line determining method, lane line determining device, vehicle, medium and chip

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination