CN117882117A - Image processing method, device and system and movable platform

Image processing method, device and system and movable platform

Info

Publication number
CN117882117A
CN117882117A (application CN202280057529.XA)
Authority
CN
China
Prior art keywords
sample images
target pixel
vehicle
sample
image
Prior art date
Legal status
Pending
Application number
CN202280057529.XA
Other languages
Chinese (zh)
Inventor
魏笑
Current Assignee
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Application filed by SZ DJI Technology Co Ltd
Publication of CN117882117A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management

Abstract

The embodiments of the disclosure provide an image processing method, an image processing device, an image processing system and a movable platform, wherein the method comprises the following steps: acquiring N sample images, wherein each sample image is an image of the surrounding environment captured by a vehicle while driving; determining a target pixel region in each of the sample images, the target pixel region being an imaging region of traffic elements in the surrounding environment associated with an automatic driving decision of the vehicle; acquiring the information amount of the target pixel region corresponding to each of the N sample images; and selecting M sample images from the N sample images according to the information amount of the target pixel region, wherein M is smaller than N, M and N are positive integers, and the M sample images are used for training a machine learning model related to the automatic driving decision of a vehicle.

Description

Image processing method, device and system and movable platform
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to an image processing method, an image processing device, an image processing system and a movable platform.
Background
In order to improve the performance of a machine learning model, data mining is required, that is, extracting from a data pool the corner case data on which the machine learning model underperforms or even fails, in order to adjust the model parameters of the machine learning model. The related art generally performs data mining based on the information amount of the data to be mined; however, such data mining is heavily disturbed by background noise, and its accuracy is low.
Disclosure of Invention
In a first aspect, an embodiment of the present disclosure provides an image processing method, including:
acquiring N sample images, wherein each sample image is an image of the surrounding environment captured by a vehicle while driving;
determining a target pixel region in each of the sample images, the target pixel region being an imaging region of traffic elements in the surrounding environment associated with an autopilot decision of the vehicle;
acquiring the information quantity of the target pixel area corresponding to each sample image in the N sample images;
and selecting M sample images from the N sample images according to the information quantity of the target pixel area, wherein M is smaller than N, M and N are positive integers, and the M sample images are used for training a machine learning model related to automatic driving decision of a vehicle.
In a second aspect, an embodiment of the present disclosure provides an image processing apparatus, the apparatus including a processor for performing the steps of:
acquiring N sample images, wherein each sample image is an image of the surrounding environment captured by a vehicle while driving;
determining a target pixel region in each of the sample images, the target pixel region being an imaging region of traffic elements in the surrounding environment associated with an autopilot decision of the vehicle;
acquiring the information quantity of the target pixel area corresponding to each sample image in the N sample images;
and selecting M sample images from the N sample images according to the information quantity of the target pixel area, wherein M is smaller than N, M and N are positive integers, and the M sample images are used for training a machine learning model related to automatic driving decision of a vehicle.
In a third aspect, embodiments of the present disclosure provide an image processing system, the system comprising:
the visual sensor is deployed on the vehicle and is used for acquiring images of surrounding environments in the running process of the vehicle to obtain N sample images;
a processor for determining a target pixel region in each of the sample images, the target pixel region being an imaged region of a traffic element in the surrounding environment associated with an autonomous driving decision of the vehicle; acquiring the information quantity of the target pixel area corresponding to each sample image in the N sample images; selecting M sample images from the N sample images according to the information quantity of the target pixel area, wherein M is smaller than N, and M and N are positive integers;
and the server is used for training a copy of the machine learning model of the vehicle based on the M sample images and deploying the trained machine learning model on the vehicle.
In a fourth aspect, embodiments of the present disclosure provide a movable platform comprising:
the visual sensor is used for collecting images of surrounding environments in the running process of the movable platform to obtain N sample images;
and the electronic control unit is used for making automatic driving decisions for the movable platform based on the output result of a machine learning model deployed on the movable platform, wherein the machine learning model is obtained by training based on M sample images determined from the N sample images, and the M sample images are obtained based on the method described in any embodiment of the disclosure.
In a fifth aspect, embodiments of the present disclosure provide a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement a method as described in any of the embodiments of the present disclosure.
The embodiments of the disclosure determine, from a sample image, the imaging area of the traffic elements associated with the automatic driving decision of the vehicle, namely the target pixel area; only the information amount of the target pixel area is considered when the information amount is acquired, and data mining of the sample images is performed based on the information amount of the target pixel area. This reduces the interference of elements irrelevant to the automatic driving decision of the vehicle when the information amount is acquired, thereby reducing the interference of background noise with the data mining process and improving the data mining accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present disclosure, and that other drawings may be obtained from these drawings by a person of ordinary skill in the art without inventive effort.
FIG. 1 is a schematic diagram of a data mining process.
Fig. 2A and 2B are schematic diagrams of an uncertainty of an object in different images, respectively.
Fig. 3 is a flowchart of an image processing method of an embodiment of the present disclosure.
Fig. 4A, 4B, 4C, and 4D are schematic diagrams of determining a target pixel region based on characteristics of the pixel region according to an embodiment of the present disclosure, respectively.
Fig. 5A, 5B, and 5C are schematic diagrams of determining a target pixel region based on a feature of an object in an embodiment of the present disclosure, respectively.
Fig. 6A and 6B are schematic diagrams of determining a target pixel region based on a viewing angle of a vision sensor according to an embodiment of the present disclosure, respectively.
Fig. 7 is a schematic diagram of a system architecture of an embodiment of the present disclosure.
Fig. 8 is a schematic diagram of the overall flow of an embodiment of the present disclosure.
Fig. 9 is a schematic diagram of an application scenario of an embodiment of the present disclosure.
Fig. 10 is a schematic structural view of an image processing apparatus according to an embodiment of the present disclosure.
Fig. 11 is a schematic diagram of an image processing system of an embodiment of the present disclosure.
Fig. 12 is a schematic diagram of a movable platform of an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
Machine learning models (abbreviated as models) are typically composed of neurons with different functions so as to perform specific machine learning tasks. The machine learning task may be a regression task, a classification task, or a combination of both. In general, the larger and more complex the model, the better its performance. A machine learning model needs to be trained with sample data before it can be used to perform machine learning tasks. However, the sample data actually collected is often repetitive, redundant, and unbalanced for training a machine learning model; in many cases, a small fraction of the categories account for most of the sample data, while a large fraction of the categories have very little sample data, a problem known as the long-tail problem of data. In order to improve the performance of the machine learning model, as shown in fig. 1, data mining is generally performed: a mining algorithm extracts part of the data from a data pool as mining results, where the mining results are expected to be corner case data on which the machine learning model underperforms or even fails, and the model parameters of the machine learning model are adjusted using the mining results, so as to obtain a model with better performance.
The data pool refers to the mass data to be mined, and generally refers to the total of all collected data serving as model input in a given task scenario; it usually includes no, or only limited, annotation information. The types of data in the data pool differ across task scenarios, including but not limited to data of various modalities such as images, videos, audio and text, and data of multiple modalities may coexist in the same task scenario. The data pool can be in the cloud or local, and can be a single node or a distributed storage system. No particular data organization or data structure is required of the data pool, as long as single-frame image output is supported. Some attention mechanism algorithms may require samples that are continuous in time, in which case the data pool is also required to record, and support retrieval by, the physical time at which the samples were acquired.
Given how huge and complex the data pool is in practice, data mining is generally implemented by purely algorithmic or semi-manual means. Mining algorithms include uncertainty sampling, diversity sampling, disagreement sampling (disagreement-based sampling) and other algorithms, in which the information amount of a sample to be mined is calculated by a sampling model, and data mining is then performed according to the information amount. For example, in an uncertainty sampling algorithm, the information amount is proportional to the uncertainty of the model prediction; in a diversity sampling algorithm, the information amount is proportional to the diversity of the data; in a disagreement sampling algorithm, the information amount is proportional to the degree of disagreement between sampling models.
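As an illustration of the uncertainty-sampling case, the following is a minimal sketch (not from the disclosure) that scores one sample by the predictive entropy of a sampling model's softmax output; the function name and the entropy formulation are assumptions of this sketch.

```python
import numpy as np

def uncertainty_information(probs: np.ndarray) -> float:
    """Predictive-entropy information amount for one sample.

    probs: softmax output of the sampling model, shape (num_classes,).
    Under uncertainty sampling, higher entropy means more information.
    """
    p = np.clip(probs, 1e-12, 1.0)          # guard against log(0)
    return float(-(p * np.log(p)).sum())

# A confident prediction carries little information ...
print(uncertainty_information(np.array([0.98, 0.01, 0.01])))  # ~0.11
# ... while an ambiguous one carries much more.
print(uncertainty_information(np.array([0.40, 0.35, 0.25])))  # ~1.08
```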
However, the inventors found that the above data mining methods all estimate the information amount from the overall dimension of the sample and do not offer sufficiently fine granularity when estimating a sample's information amount during data mining, which introduces redundant information and noise and results in lower data mining accuracy.
For example, in an L2 autopilot object detection task, samples in which cyclists are missed or falsely detected need to be mined, and the sample information amount is estimated with an uncertainty sampling algorithm. As shown in fig. 2A, the algorithm gives a very high uncertainty to area 201, which includes a cyclist, and a lower uncertainty to area 202, which includes a motor vehicle. However, this does not mean that this frame will be mined with high priority, since uncertainty caused by background noise will give other, irrelevant samples higher uncertainty and thus higher priority.
As shown in fig. 2B, within dashed box 203 is a row of bicycles on a sidewalk. Although assigning a very high uncertainty here is reasonable, since "bicycle" and "cyclist" objects are easily confused, the decision of the L2 autopilot system is not affected even if a bicycle on the sidewalk is misdetected. The user preferentially cares about objects on the traffic lane, and hopes that missed and false detections of objects on the traffic lane are mined with higher priority during data mining.
As can be seen from the above examples, different systems, different machine learning tasks may all have their own specific data mining requirements. For each sample to be mined, the user may be concerned about certain areas with higher priority. The data mining systems of the related art do not support this need well.
Based on this, an embodiment of the present disclosure provides an image processing method that performs data mining based on an attention mechanism. With or without prior expert knowledge, an automatic algorithm can identify and segment a user-defined target pixel region from each image sample, which then serves as the minimum unit for computing the information amount during data mining, thereby "focusing attention" so as to eliminate the interference of unimportant factors in the mining process. Referring to fig. 3, the method includes:
step 301: acquiring N sample images, wherein each sample image is an image of the surrounding environment captured by a vehicle while driving;
step 302: determining a target pixel region in each of the sample images, the target pixel region being an imaging region of traffic elements in the surrounding environment associated with an autopilot decision of the vehicle;
step 303: acquiring the information quantity of the target pixel area corresponding to each sample image in the N sample images;
step 304: and selecting M sample images from the N sample images according to the information quantity of the target pixel area, wherein M is smaller than N, M and N are positive integers, and the M sample images are used for training a machine learning model related to automatic driving decision of a vehicle.
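The four steps can be pictured end to end with the sketch below. It is an illustration only, assuming a region_fn that returns a boolean numpy mask for the target pixel region (step 302) and an info_fn that computes an information amount over that mask (step 303); neither name comes from the disclosure.

```python
import numpy as np

def select_samples(images, region_fn, info_fn, m):
    """Steps 301-304 in miniature: keep the M sample images whose
    target pixel region carries the most information."""
    scored = []
    for img in images:                        # step 301: N sample images
        mask = region_fn(img)                 # step 302: target pixel region
        if not mask.any():                    # no relevant traffic element:
            continue                          # the sample can be discarded
        scored.append((info_fn(img, mask), img))  # step 303: information amount
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [img for _, img in scored[:m]]     # step 304: top-M selection
```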
In step 301, the surrounding environment may be a road environment in which the vehicle is traveling or is parked, and one or more traffic elements may be included in the road environment, where the traffic elements in the road environment may include traffic elements associated with an automatic driving decision of the vehicle, and may include traffic elements unrelated to the automatic driving decision of the vehicle. In some embodiments, the traffic elements may include vehicle self elements and external traffic environment elements, which in turn include static environment elements, dynamic environment elements, traffic participant elements, and/or weather elements, among others. The vehicle itself elements include basic attributes of the vehicle itself (e.g., weight, geometric information, performance information, etc.), location information (e.g., coordinate information, lane information where it is located, etc.), movement state information (e.g., lateral movement state and longitudinal movement state), and/or driving task information (e.g., perception recognition, path planning, man-machine interaction, networking communications, etc.). The static environment element refers to an object in a static state in a traffic environment, and comprises roads, traffic facilities, surrounding landscapes, obstacles and the like. The dynamic environment element refers to an element in a traffic environment which is dynamically changed, and includes a dynamic indication facility (e.g., traffic signal lamp, variable traffic sign, traffic police, etc.) and communication environment information (e.g., signal strength information, electromagnetic interference information, signal delay information, etc.). Traffic participant elements include object information that affects decision-making of vehicles, such as pedestrians, animals, and/or other vehicles surrounding the vehicle. The meteorological elements include information such as ambient temperature, lighting conditions and/or weather conditions in the driving scene.
The surrounding environment can be subjected to image acquisition to obtain N sample images. The N sample images may include images acquired by a vision sensor on the vehicle as well as images acquired by a monitoring device provided in the driving environment of the vehicle. The number of vehicles may be greater than or equal to 1, and collecting sample images jointly through the vision sensors of multiple vehicles can improve the collection efficiency of the sample images. The monitoring device may comprise several monitoring cameras arranged around the traffic lane on which the vehicle travels. The N sample images may include single images or one or more video frames in a video.
In step 302, a target pixel region may be determined for each sample image. Wherein the target pixel area is an imaging area of a traffic element in the surrounding environment associated with an autonomous driving decision of the vehicle. The traffic elements associated with the autonomous driving decisions of the vehicle generally refer to traffic elements that may have an impact on the autonomous driving decisions. When a vehicle runs on a traffic lane, other vehicles on the traffic lane, pedestrians and animals passing on a zebra crossing in front of the traffic lane, traffic lights at intersections around the traffic lane, natural environment during running and other elements can influence automatic driving decisions. For example, when there are a vehicle a and a vehicle B on a traffic lane, the vehicle a needs to determine its own travel path and travel speed based on the position and the movement speed of the vehicle B to avoid collision with the vehicle B. For another example, when a traffic light exists at an intersection around a traffic lane, a vehicle needs to determine whether it can pass through the intersection based on the state of the traffic light.
One or more target pixel areas may or may not be included in one sample image. If a sample image does not include the target pixel region, the sample image may be discarded directly. If one sample image includes one or more target pixel regions, the sample image may be used for processing in a subsequent step. A specific manner of determining the target pixel region is exemplified below.
In some embodiments, a target pixel region may be determined based on the features of each pixel region in the sample image, based on the features of an object included in the sample image, based on the viewing angle of the vision sensor, or based on the data mining task of the machine learning model. The target pixel region may also be determined jointly based on two or more of the above. The various ways of determining the target pixel region are described one by one below.
(1) Determining a target pixel region based on characteristics of each pixel region in the sample image
In some embodiments, the characteristics of a pixel region include, but are not limited to, the location, depth, pixel values, and/or semantics of the pixel region. The position of a pixel region may be the position of the pixel region in the physical space, or the pixel position of the pixel region in the sample image, and the position may be an absolute position or a relative position. The depth may be a depth from a certain pixel point in the pixel region or a certain object to an image capturing device that captures a sample image to which the pixel region belongs. The pixel values may include pixel values of some or all of the pixel points in the pixel region. The semantics may be used to characterize the class of traffic elements (e.g., lane class, pavement class, traffic light class, etc.) corresponding to pixels in the pixel region.
In the case where the characteristic of the pixel region includes the position of the pixel region, the pixel region within the preset position range may be determined as the target pixel region. The predetermined location range may be a continuous location interval (e.g., greater than or equal to a certain lower location limit, and/or less than or equal to a certain upper location limit), or may be a discrete location point or points. Fig. 4A shows a schematic diagram when the position is a pixel position, and the preset position range is a central block of pixel area in the sample image, as indicated by a dashed box in the figure. It is assumed that the vehicle 401 is traveling on a road, and that the vehicle 401 is at a different position on the road at time T1 than at time T2. At time T1, a sample image P1 is acquired by a camera (not shown in the figure) on the right side of the vehicle 401, and the dog 402 is included in the sample image P1, and at time T2, a sample image P2 is acquired by the camera on the right side of the vehicle 401, and the pedestrian 403 is included in the sample image P2. It can be seen that, no matter where the pixel region in the dashed box is located in the physical space, and no matter what object is included in the pixel region in the dashed box, the same pixel region (i.e., the pixel region within the dashed box) is taken as the target pixel region in the acquired sample image. Of course, the preset position range may be other pixel regions centered in the sample image, and the size and number of the preset position ranges are not limited to those shown in the figure.
Fig. 4B shows a schematic diagram when the position is a position of the pixel region in the physical space, and the white oval region is a field of view of the camera 404, which is variable, and the gray oval region represents the preset position range. Assuming that at time T1, the dog 402 is in a preset position range within the field of view S1 of the camera 404, and at time T2, the pedestrian 403 is in a preset position range within the field of view S2 of the camera 404, the target pixel areas are shown as dashed boxes in the sample images P3 and P4 acquired at both times. It can be seen that, regardless of the variation of the field of view of the camera 404, in the acquired sample image, the corresponding pixel region in the sample image at the same physical location is taken as the target pixel region. Of course, the preset position range may be other areas than the area shown in the figure, and the size and number of the preset position ranges are not limited to those shown in the figure.
Where the characteristics of the pixel region include the depth of the pixel region, a pixel region within a preset depth range may be determined as the target pixel region, and the preset depth range may be a continuous depth interval (e.g., greater than or equal to a certain lower depth limit, and/or less than or equal to a certain upper depth limit), or may be a discrete depth point or points. As shown in fig. 4C, assuming that at a certain moment, the depths of the dog 402, the pedestrian 403 and the vehicle 401 are within the preset depth range, the pixel area including the dog 402 and the pixel area including the pedestrian 403 in the collected sample image P5 are both target pixel areas (as shown by the dashed boxes in the figure). The figure shows the case that two objects within the preset depth range are included in the same sample image, and in practical cases, the number of the objects within the preset depth range included in the same sample image may also be other numbers, and each object may be acquired by the same camera or may be acquired by different cameras.
In the case that the feature of the pixel region includes the semantics of the pixel region, a pixel region of a preset semantic category may be determined as the target pixel region. As shown in fig. 4D, the semantic categories of the pixel regions in the sample image include a motor vehicle lane category and a sidewalk category, one or both of which may be determined as target pixel regions. Of course, those skilled in the art will appreciate that the manner of classifying semantic categories is not limited to those illustrated in the figures, and that, for example, semantic categories may be more finely classified, for example, motor vehicle lanes may be further classified into a left-turn lane category, a straight-run lane category, a right-turn lane category, and the like. In addition to lane categories, semantic categories may include traffic light categories, pedestrian categories, ground indication line categories, and the like.
In the case where the feature of the pixel region includes a pixel value of the pixel region, a pixel region including a pixel point of a preset pixel value may be determined as the target pixel region. For example, a pixel region including red pixel points may be determined as the target pixel region.
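To make the feature-based cases above concrete, here is a small sketch that builds a target-region mask from two of the listed features, depth range and pixel value; the thresholds and the "red pixel" test are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

def target_mask(depth, rgb, depth_range=(2.0, 40.0)):
    """Mark pixels whose depth lies in a preset range, or whose colour
    looks red (e.g. a traffic-light lamp), as the target pixel region."""
    lo, hi = depth_range
    by_depth = (depth >= lo) & (depth <= hi)
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    by_value = (r > 180) & (g < 80) & (b < 80)   # crude "red pixel" test
    return by_depth | by_value

# Toy 4x4 frame: one object 10 m away plus a single red pixel.
depth = np.full((4, 4), 100.0); depth[1:3, 1:3] = 10.0
rgb = np.zeros((4, 4, 3), dtype=np.uint8); rgb[0, 0] = (255, 0, 0)
print(target_mask(depth, rgb))
```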
(2) A target pixel region is determined based on a feature of an object included in the sample image.
Characteristics of an object include, but are not limited to, at least one of a class, a speed of movement, a size of the object. The class may be used to characterize what traffic element the object belongs to, the moving speed may be an absolute speed or a relative speed, and the size may be a pixel size or a size of the object in a physical space.
An object with a preset feature can be determined from the image, and a pixel area where the object with the preset feature is located in the sample image is determined as a target pixel area. The predetermined characteristic may be belonging to a predetermined category, a movement speed within a predetermined speed range, and/or a size within a predetermined size range. As shown in fig. 5A, assuming that the sample image includes an object of the "pedestrian" category and an object of the "dog" category, and the "pedestrian" category is a preset category, a pixel region where the object of the "pedestrian" category is located may be determined as a target pixel region.
In some embodiments, a target object having a preset feature may be identified from the sample image; and determining the pixel area where the target object is and the pixel areas where other objects with the same category as the target object are located in the sample image as target pixel areas. As shown in fig. 5B, a target object whose moving speed is not 0 may be identified from the sample image, and if the category of the target object is pedestrian a, then other pedestrians other than pedestrian a may be identified from the sample image, and if pedestrian B and pedestrian C are identified, then the pixel region where pedestrian a is located, the pixel region where pedestrian B is located, and the pixel region where pedestrian C is located may be determined as target pixel regions (as shown by the dashed-line boxes in the figure).
In some embodiments, the sample image comprises a multi-frame target video frame in a video. In this case, a target object having a preset feature may be identified from one of the frames of reference video; tracking the target object to determine a pixel area including the target object in each frame of target video frame; and determining a pixel area including the target object in each frame of target video frame as a target pixel area. As shown in fig. 5C, assuming that F1, F2, and F3 are multi-frame target video frames in the video, these target video frames may be continuous or discontinuous. If the preset feature is that the category belongs to the category of "pedestrian", the video frame F1 may be first identified, and if the pedestrian a is identified, then the pedestrian a may be tracked, so as to identify the pedestrian a in the video frames F2 and F3, respectively. Assuming that the pixel positions of the pedestrian a in F1, F2, and F3 are shown in the figure, respectively, the pixel regions including the pedestrian a in F1, F2, and F3 can be determined as target pixel regions, respectively, as shown by the dashed-line boxes in the figure.
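The tracking-based determination just described could look like the following sketch, where a real tracker is replaced by greedy nearest-centre matching between consecutive frames; the detection format, the matching rule, and the max_shift threshold are all assumptions of this sketch.

```python
def track_target(frames_dets, init_box, max_shift=50.0):
    """Follow one target object (e.g. pedestrian A from the reference
    frame) through later frames; each matched box is the target pixel
    region of that frame. frames_dets: per-frame lists of (x, y, w, h)."""
    def centre(box):
        x, y, w, h = box
        return (x + w / 2.0, y + h / 2.0)

    regions, cur = [], init_box
    for dets in frames_dets:
        cx, cy = centre(cur)
        best = min(dets, default=None,
                   key=lambda b: (centre(b)[0] - cx) ** 2 + (centre(b)[1] - cy) ** 2)
        if best is None:
            break
        bx, by = centre(best)
        if ((bx - cx) ** 2 + (by - cy) ** 2) ** 0.5 > max_shift:
            break                              # target lost: stop tracking
        regions.append(best)
        cur = best
    return regions
```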
In some embodiments, the preset features are determined based on the semantic category of the pixel region in which the object is located, i.e. different preset features may be determined for different pixel regions. Taking the case where the preset feature is a preset category as an example: for the pixel area where a road is located, the traffic elements in that area that influence the automatic driving decisions of vehicles are mainly objects in categories such as motor vehicles, non-motor vehicles and pedestrians, so one or more of these categories can be determined as the preset categories corresponding to the pixel area where the road is located; traffic elements in other pixel areas (other than the pixel area where the road is located) that affect the automatic driving decisions of the vehicle may mainly include traffic signals, so the traffic signal category may be determined as the preset category corresponding to those other pixel areas.
(3) A target pixel region is determined based on a viewing angle of the vision sensor. For example, a pixel region acquired by the vision sensor within a preset viewing angle range may be determined as the target pixel region. In some embodiments, the preset viewing angle range is less than the total viewing angle range of the vision sensor. As shown in fig. 6A, assume that the total viewing angle range of the vision sensor is α1, which images the light gray region 601. Since the distortion level at the image edge is generally higher than that at the image center, a viewing angle range α2 smaller than α1 can be determined, and the viewing angle range α2 images the dark gray region 602. The pixel area corresponding to the dark gray region 602 is therefore the target pixel region.
In some embodiments, the preset viewing angle range may be an overlapping viewing angle range of two or more viewing angle sensors. As shown in fig. 6B, two vision sensors including overlapping viewing angle ranges are taken as an example, wherein oval areas 603 and 604 are the respective viewing angles of the two vision sensors, and the overlapping ranges of the viewing angles of the two sensors are shown as the area with oblique lines. The pixel region corresponding to the overlapping range may be determined as the target pixel region.
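For the viewing-angle case of fig. 6A, a simple pinhole-camera assumption lets one convert a reduced angle α2 into pixel bounds, as sketched below; the pinhole model and the example angles are assumptions of this sketch.

```python
import math

def fov_crop_bounds(width_px, full_fov_deg, kept_fov_deg):
    """Horizontal pixel bounds of the region imaged by a reduced viewing
    angle (kept_fov_deg) inside the sensor's total angle (full_fov_deg).
    Returns (left, right) column indices of the target pixel region."""
    f = (width_px / 2.0) / math.tan(math.radians(full_fov_deg) / 2.0)  # focal length, px
    half = f * math.tan(math.radians(kept_fov_deg) / 2.0)              # half-width, px
    cx = width_px / 2.0
    return int(cx - half), int(cx + half)

print(fov_crop_bounds(1920, 120.0, 90.0))  # central band; distorted edges dropped
```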
(4) The target pixel region is determined based on a data mining task. A data mining task may correspond to several regions and different data mining tasks may correspond to different regions. Multiple data mining tasks may be performed on the same set of data. For example, when the data mining task is "mining a blue car" or "mining a vehicle on a motor vehicle lane", a pixel region corresponding to the motor vehicle lane may be determined as the target pixel region; when the data mining task is "mining an object on a sidewalk", a pixel region corresponding to the sidewalk may be determined as a target pixel region.
When the target pixel area is actually determined, it may be determined based on any one of the above manners, or based on at least two of them. For example, a pixel area that belongs to a preset semantic category and includes an object with a preset feature may be determined as the target pixel area: when the preset semantic category is the motor vehicle lane category and the preset feature is the bicycle category, a pixel area including a bicycle on the motor vehicle lane is determined as the target pixel area. The target pixel area may also be determined by combining at least any of the above with other manners, which are not further illustrated herein. Determining the target pixel area in different manners in different scenes improves the flexibility and extensibility of the scheme: when the definition of corner cases changes, i.e. the mining criteria change, the mining algorithm can be adapted at very low cost.
In step 303, the information amount of the target pixel region may be determined in various ways, including, but not limited to, the aforementioned uncertainty sampling, diversity sampling, or disagreement sampling. Since only the information amount of the target pixel area is considered when the information amount is acquired, data mining of the sample images is performed based on the information amount of the target pixel area. This reduces the interference of elements irrelevant to the automatic driving decision of the vehicle when the information amount is acquired, thereby reducing the interference of background noise with the data mining process and improving the data mining accuracy.
In step 304, each sample image may be scored according to the information amount of the target pixel area in the sample image to obtain a scoring value of the sample image, and M sample images are then selected from the N sample images according to the scoring values of the N sample images. The scoring value of a sample image may be positively or negatively correlated with the probability that the sample image is selected. Taking the positively correlated case as an example, the scoring values of the sample images may be sorted in descending order and the top-ranked M sample images selected. Of course, M sample images may also be selected in other manners, which will not be described herein.
Aspects of embodiments of the present disclosure may be implemented using the architecture shown in fig. 7, wherein a data pool (database) 701 is used to store the sample images to be mined, which can be processed by an attention node 702 to determine a target pixel region. The method for determining the target pixel area may be any of the foregoing methods, and the specific algorithm may be a tracking algorithm, a segmentation algorithm, or the like. In a tracking algorithm, the user is only concerned with the characteristics of a certain dynamic object in the image, such as a car: the target vehicle is framed in the first frame of the time-series data, a tracking algorithm is then used to automatically track the box in each subsequent frame, and the target pixel area is determined based on the tracking result. In a segmentation algorithm, the user is only concerned with the features of certain areas in the picture, for example only the motor vehicle lane area: the image is segmented by a semantic segmentation (semantic segmentation) network, and only the region corresponding to pixels of the "motor vehicle lane" category is retained as the target pixel region.
After the target pixel area is determined, it may be sent to a mining node 703, which may determine the information amount of the target pixel area by uncertainty sampling, diversity sampling or the like, and mine M sample images based on the information amount. The mined M sample images may be stored in the data pool 701, or output to another processing unit. The manner in which the target pixel region is determined, the algorithm employed by the attention node 702, and the algorithm employed by the mining node 703 may all be entered via a graphical user interface (Graphical User Interface, GUI) 704. The mined M sample images can be screened a second time on the GUI, or stored directly in the data pool by entering corresponding instructions on the GUI.
In some embodiments, the M sample images may be manually screened to obtain K sample images. Because the automatic mining approach may contain certain errors, the embodiments of the disclosure further manually screen the mined M sample images to obtain K sample images, and use the K sample images to train a machine learning model related to the automatic driving decision of a vehicle so as to improve the training effect, where K may be less than or equal to M. The embodiments of the disclosure perform automatic data mining on the large number of sample images in the data pool, with manual screening as an aid, thereby guaranteeing both mining efficiency and the accuracy of the mining results.
As shown in fig. 8, the screened sample image may be used to train a machine learning model associated with the automated driving decisions of the vehicle. The automatic driving decision of the vehicle replaces a human driver to decide and control the driving state of the vehicle according to the perception information, so that the functions of lane keeping, lane departure early warning, vehicle distance keeping, obstacle warning and the like are realized. The autopilot decision may be implemented based on a machine learning model deployed on the vehicle, which may include, but is not limited to, various detection models, recognition models, classification models, and the like. For example, the traffic elements on the road can be identified by the identification model to determine the traffic light therein, so as to determine whether the current intersection can be passed or not according to the information of the traffic light. For another example, the distance between the preceding vehicle and the host vehicle may be detected by the detection model, so as to determine whether deceleration is required. Since autopilot decisions may involve a variety of machine learning tasks, a machine learning model deployed on a vehicle may include a plurality of machine learning models that perform different machine learning tasks.
The machine learning model deployed on the vehicle can be obtained by training based on the mined sample image and the description truth value corresponding to the traffic elements in the sample image, and the description truth value adopted when the machine learning model for executing different machine learning tasks is trained can be different. For example, the description truth value adopted by the machine learning model for performing the classification task is the category of each pixel point in the sample image, and the description truth value adopted by the machine learning model for performing the detection task is the distance from the vehicle detected in the sample image to the host vehicle.
In some embodiments, the M sample images may be input into a truth calibration system 801 to obtain description truth values corresponding to the traffic elements in the M sample images, and the machine learning model is trained based on the M sample images and the description truth values corresponding to the traffic elements in the M sample images. The truth calibration system 801 may obtain the description truth values corresponding to the traffic elements in the sample images through automatic, semi-automatic or manual calibration. The calibration accuracy and efficiency of different truth calibration systems differ; for example, manual calibration is less efficient but more accurate, while automatic or semi-automatic calibration is more efficient but less accurate. The calibration efficiency and accuracy of the truth calibration system therefore need to be balanced.
In some automatic calibration systems, a machine learning model with better performance can be trained in the cloud in advance, the task executed by the machine learning model is the same as that executed by the machine learning model deployed on a vehicle, and the accuracy of the calibration result of the machine learning model is higher than a preset accuracy threshold, so that the output result of the machine learning model can be directly used as the description true value. For example, traffic signals can be identified from the sample image by an identification model deployed in the cloud, and descriptive truth values of the colors (red, yellow, green) of the traffic signals are output. The sample image and the descriptive truth of the colors of the traffic lights therein are then used to train a machine learning model deployed on the vehicle to enable the machine learning model deployed on the vehicle to accurately determine whether the intersection is capable of passing for the colors of the traffic lights.
In some semi-automatic calibration systems, the output of the machine learning model deployed on the vehicle for a sample image may be obtained first; if the automatic driving decision result output by the decision system of the vehicle for that sample image is normal, the output of the on-vehicle machine learning model is used as the description truth value, and otherwise the description truth value corresponding to the traffic element in the sample image is determined by manual calibration. For example, the distance between the preceding vehicle and the host vehicle may be detected by a detection model deployed on the vehicle. If, at a certain moment, the automatic driving decision result output by the decision system for a sample image indicates driving forward at the current speed but the vehicle then collides with the preceding vehicle, the decision result is abnormal; it can thus be determined that the distance output by the machine learning model deployed on the vehicle is inaccurate, the distance between the preceding vehicle and the host vehicle in the sample image can be determined by manual calibration, and the manually calibrated distance used as the corresponding description truth value.
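The semi-automatic calibration logic above reduces to a small branch, sketched here; manual_label_fn stands in for the human calibration step and is an assumption of this sketch.

```python
def calibrate(sample, model_output, decision_ok, manual_label_fn):
    """Semi-automatic truth calibration: trust the on-vehicle model's
    output as the description truth value when the decision system
    behaved normally on this sample; otherwise fall back to a human."""
    if decision_ok:
        return model_output           # e.g. detected distance to the preceding vehicle
    return manual_label_fn(sample)    # e.g. a hand-calibrated distance
```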
In some embodiments, when performing manual calibration, each sample image in the M sample images may be displayed at a calibration interface, and the target pixel region in the sample image is identified; detecting the calibration operation of a user on the traffic element, and acquiring a true value calibration result based on the calibration operation; and taking the true value calibration result as the description true value. The calibration operation may include deleting, modifying, and adding the original calibration results. The precalibrated true value of the associated traffic element can be displayed on the calibration interface; and if the confirmation operation of the user on the pre-calibration true value is detected, determining the pre-calibration true value as the true value calibration result. Otherwise, if the adjustment operation of the user on the pre-calibration true value is detected, an adjusted calibration result is obtained, and the adjusted calibration result can be determined to be the true value calibration result.
For example, for the task of identifying traffic signals, a pre-calibration true value may be displayed on a display interface, which may be a bounding box of traffic signals in an image. And if the confirmation operation of the user on the bounding box is detected, determining the bounding box as the true value calibration result. Otherwise, if the adjustment operation of the bounding box by the user is detected, for example, the size and/or the position of the bounding box are adjusted, the adjusted bounding box is determined to be the true value calibration result.
The description truth values may also be obtained in ways other than those listed above, which are not enumerated here.
After the description truth values of the sample images are obtained, the target pixel area can be cropped from each of the M sample images, and the machine learning model trained based on the target pixel areas corresponding to the M sample images and the description truth values corresponding to the traffic elements in the M sample images. Alternatively, the machine learning model may be trained directly based on the M sample images and the description truth values corresponding to the traffic elements in the M sample images. The trained machine learning model may be deployed to a vehicle.
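Cropping the target pixel area out of a mined sample image, the first of the two training options above, is a one-liner on an array image (a numpy-style ndarray is assumed); the (x, y, w, h) box format is also an assumption of this sketch.

```python
def crop_region(image, box):
    """Cut the target pixel region out of a sample image so the model
    can be trained on the region alone. box = (x, y, w, h) in pixels."""
    x, y, w, h = box
    return image[y:y + h, x:x + w]
```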
In some embodiments, in addition to screening out sample images for training a machine learning model based on the information amount of the target pixel region, sample images may also be screened out based on other information. For example, the driving state of the vehicle may be detected, and P sample images captured before and/or after the moment at which an abnormal driving state is detected may be acquired, where P is a positive integer and the M sample images and the P sample images are jointly used for training a machine learning model related to the automatic driving decision of the vehicle. The P sample images may be partially or completely included in the M sample images, or may be images other than the M sample images. The driving state may include driving speed, driving direction, and the like. When the driving state includes the driving speed, the driving state may be considered abnormal if the rate of change of the driving speed exceeds a certain threshold (e.g., sudden braking of the vehicle). When the driving state includes the driving direction, the driving state may be considered abnormal if the rate of change of the driving direction exceeds a certain threshold (e.g., a sharp turn), or if the vehicle collides with an obstacle after turning. The driving state may also include other states, and driving abnormalities in various driving states may be determined based on actual scenarios, which are not listed here.
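As a sketch of the sudden-braking case, the function below flags timestamps where longitudinal deceleration exceeds a threshold; the 6 m/s² value is an illustrative assumption. Images captured around the returned times would form the P extra training samples.

```python
def abnormal_braking_times(times, speeds, decel_threshold=6.0):
    """Return timestamps where the driving state looks abnormal: the
    deceleration between consecutive speed readings (m/s) exceeds the
    threshold (m/s^2), as in a sudden-braking event."""
    flagged = []
    for (t0, v0), (t1, v1) in zip(zip(times, speeds), zip(times[1:], speeds[1:])):
        if t1 > t0 and (v0 - v1) / (t1 - t0) > decel_threshold:
            flagged.append(t1)
    return flagged

print(abnormal_braking_times([0.0, 0.1, 0.2], [20.0, 19.9, 18.0]))  # [0.2]
```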
For another example, a decision result output by a decision system of the vehicle may be obtained, the decision result being used for decision planning of the driving state of the vehicle, and Q sample images captured before and/or after the moment at which an erroneous decision result is output may be acquired, where Q is a positive integer and the M sample images and the Q sample images are jointly used for training a machine learning model related to the automatic driving decision of the vehicle. The Q sample images may be partially or completely included in the M sample images, or may be images other than the M sample images. For example, if the decision result of the vehicle indicates driving forward at the current speed but the vehicle then collides with an obstacle, or if the decision result indicates turning on a straight-only lane, the decision result is determined to be erroneous. The cases of erroneous decision results may also include others, which are not listed here.
In some embodiments, the M sample images, the P sample images and the Q sample images may be employed simultaneously to jointly train the machine learning model of a vehicle. Since the sample images associated with abnormal driving states and erroneous decision results may be samples on which the machine learning model performs poorly, mining these sample images helps to improve the performance of the machine learning model.
In some embodiments, the vehicle is provided with a first automatic driving permission; after the machine learning model is trained, the automatic driving permission of the vehicle is set to a second automatic driving permission, which is higher than the first automatic driving permission. For example, the first automatic driving permission may be the L2 automatic driving permission, and the second automatic driving permission may be the L3 automatic driving permission. After the machine learning model is trained, the trained machine learning model may be tested using test images to determine its performance, and the second automatic driving permission may be determined based on that performance. With this embodiment, an automatic driving permission within the vehicle's capability range can be set automatically, improving the safety of automatic driving.
Fig. 9 is a schematic diagram of an application scenario according to an embodiment of the disclosure. In the initial state, the vehicle 901 is provided with a first automatic driving authority under which the vehicle 901 does not have an automatic path planning authority. Sample images can be acquired through a vision sensor on the vehicle 901 and sent to the cloud for screening, or the vehicle 901 can screen the sample images, and screened sample data can be used for training a machine learning model at the cloud. After training, the cloud may issue a machine learning model to the vehicle 901. Since the vehicle 901 has a certain capability of detecting and recognizing the surrounding environment, the second automatic driving authority can be set for the vehicle 901. Under this automatic driving authority, the vehicle 901 has an automatic route planning authority. The vehicle 901 may plan a path R based on the output result of the machine learning model, and perform automatic driving based on the path R.
The embodiments of the disclosure aim to solve the long-tail problem of model iteration in machine learning model deployment while supporting users' need to focus on partial regions during data mining, and therefore provide a data mining framework based on an attention mechanism and define the software form this framework takes in production practice. The software system of the disclosed embodiments can provide a complete set of data mining functions. The present disclosure has the following advantages:
(1) When estimating the sample information amount, the method can focus on the region of interest to the user in each image sample (namely the target pixel region), eliminating interference from background, noise and other content of no concern, and improving the quality of data mining.
(2) Wide applicability: it is compatible with machine learning models and tasks for regression, classification, and combinations of the two.
(3) Semi-automatic and fully automatic data mining processes are supported, reducing manual involvement as much as possible.
(4) The mining criteria are extensible: when the definition of corner cases changes, i.e. the mining criteria change, the mining algorithm can be adapted at very low cost.
The present disclosure also provides an image processing apparatus including a processor for performing the steps of:
acquiring N sample images, wherein each sample image is an image of the surrounding environment captured by a vehicle while driving;
determining a target pixel region in each of the sample images, the target pixel region being an imaging region of traffic elements in the surrounding environment associated with an autopilot decision of the vehicle;
acquiring the information quantity of the target pixel area corresponding to each sample image in the N sample images;
and selecting M sample images from the N sample images according to the information quantity of the target pixel area, wherein M is smaller than N, M and N are positive integers, and the M sample images are used for training a machine learning model related to automatic driving decision of a vehicle.
In some embodiments, the processor is specifically configured to: determining a target pixel region based on the characteristics of each pixel region in the sample image.
In some embodiments, the characteristics of a pixel area comprise the position of the pixel area, and the target pixel area is a pixel area within a preset position range; or the characteristics comprise the depth of the pixel area, and the target pixel area is a pixel area within a preset depth range; or the characteristics comprise the pixel values of the pixel area, and the target pixel area is a pixel area comprising pixel points with preset pixel values; or the characteristics comprise the semantics of the pixel area, and the target pixel area is a pixel area of a preset semantic category.
In some embodiments, the processor is specifically configured to: a target pixel region is determined based on a feature of an object included in the sample image.
In some embodiments, the characteristics of one object include at least one of a category, a speed of movement, a size of the object.
In some embodiments, the sample image comprises a multi-frame target video frame in a video; the processor is specifically configured to: identifying a target object with preset characteristics from a frame of reference video frame in the video; tracking the target object to determine a pixel area including the target object in each frame of target video frame; and determining a pixel area including the target object in each frame of target video frame as a target pixel area.
In some embodiments, the processor is specifically configured to: identifying a target object having preset characteristics from the sample image; and determining, as target pixel regions, the pixel region where the target object is located and the pixel regions in the sample image where other objects of the same category as the target object are located.
In some embodiments, the predetermined feature is determined based on a semantic category of a pixel region in which the object is located.
In some embodiments, the sample image is acquired by a vision sensor on the vehicle; the processor is specifically configured to: determining a target pixel region based on a viewing angle of the vision sensor.
In some embodiments, the target pixel area is an image acquired by the vision sensor within a preset viewing angle range.
In some embodiments, the target pixel region is determined based on a data mining task.
In some embodiments, the processor is further configured to: detecting a running state of the vehicle; and acquiring P sample images acquired before and/or after the moment at which an abnormal running state is detected, wherein P is a positive integer, and the M sample images and the P sample images are jointly used for training a machine learning model related to the automatic driving decision of the vehicle.
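One plausible way to capture the P sample images around the detection moment is a rolling pre-event buffer plus a post-event tail window, sketched below; the class name and window sizes are illustrative assumptions.

from collections import deque

class EventClipRecorder:
    """Collects the frames surrounding the moment an abnormal running state
    is detected (the P sample images of the embodiment above)."""
    def __init__(self, n_before: int = 30, n_after: int = 30):  # window sizes assumed
        self.n_after = n_after
        self.history = deque(maxlen=n_before)   # rolling pre-event buffer
        self.tail_left = 0
        self.clips = []                          # one clip per detected event

    def on_frame(self, image, abnormal: bool):
        if self.tail_left > 0:                   # still filling the post-event window
            self.clips[-1].append(image)
            self.tail_left -= 1
        elif abnormal:
            # pre-event frames plus the trigger frame start a new clip
            self.clips.append(list(self.history) + [image])
            self.tail_left = self.n_after
        self.history.append(image)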
In some embodiments, the processor is further configured to: obtaining a decision result output by a decision system of the vehicle, the decision result being used for decision planning of the running state of the vehicle; and acquiring Q sample images acquired before and/or after the moment at which an erroneous decision result is output, wherein Q is a positive integer, and the M sample images and the Q sample images are jointly used for training a machine learning model related to the automatic driving decision of the vehicle.
In some embodiments, the processor is specifically configured to: scoring the sample image according to the information quantity of the target pixel area in the sample image to obtain a scoring value of the sample image; and selecting M sample images from the N sample images according to the scoring values of the N sample images.
In some embodiments, the processor is further configured to: obtaining K sample images through manual screening of the M sample images, wherein K is a positive integer, K is smaller than or equal to M, and the K sample images are used for training a machine learning model related to the automatic driving decision of the vehicle.
In some embodiments, the processor is further configured to: inputting the M sample images into a truth value calibration system to obtain description truth values corresponding to the traffic elements in the M sample images; and training the machine learning model based on the M sample images and the description truth values corresponding to the traffic elements in the M sample images.
In some embodiments, the processor is specifically configured to: cropping the target pixel area from each of the M sample images; and training the machine learning model based on the target pixel areas corresponding to the M sample images and the description truth values corresponding to the traffic elements in the M sample images.
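A minimal sketch of this crop-then-train variant, assuming rectangular (x, y, w, h) target pixel areas; the disclosure does not fix the region representation, and the function name is a placeholder.

def build_training_pairs(images, regions, truths):
    # regions: per-image (x, y, w, h) target pixel areas; truths: the
    # calibrated description truth values for the traffic elements.
    pairs = []
    for image, (x, y, w, h), truth in zip(images, regions, truths):
        crop = image[y:y + h, x:x + w]   # crop the target pixel area only
        pairs.append((crop, truth))      # training sample: crop + truth value
    return pairs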
In some embodiments, the vehicle is provided with a first automatic driving permission; the processor is further configured to: after training of the machine learning model is completed, setting the automatic driving permission of the vehicle to a second automatic driving permission, the second automatic driving permission being higher than the first automatic driving permission.
In some embodiments, the processor is specifically configured to: displaying each of the M sample images on a calibration interface, with the target pixel region marked in the sample image; detecting a calibration operation performed by a user on the traffic element, and acquiring a true value calibration result based on the calibration operation; and taking the true value calibration result as the description truth value.
In some embodiments, the processor is specifically configured to: displaying a pre-calibration true value of the associated traffic element on the calibration interface, and, if a confirmation operation of the user on the pre-calibration true value is detected, determining the pre-calibration true value as the true value calibration result; and/or displaying a pre-calibration true value of the associated traffic element on the calibration interface, and, if an adjustment operation of the user on the pre-calibration true value is detected, obtaining an adjusted calibration result and determining the adjusted calibration result as the true value calibration result.
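The confirm-or-adjust branch above reduces to a small resolution function; the action encoding below is an assumed placeholder, not part of the disclosure.

def resolve_truth(pre_calibrated, user_action):
    # user_action: ("confirm", None) or ("adjust", adjusted_value); names assumed.
    kind, payload = user_action
    if kind == "confirm":
        return pre_calibrated            # confirmed pre-calibration value is the truth
    if kind == "adjust":
        return payload                   # adjusted calibration result is the truth
    raise ValueError(f"unknown calibration action: {kind}")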
The functions implemented by the processor in the above embodiment of the apparatus are detailed in the foregoing method embodiment, and are not repeated here.
Fig. 10 shows a schematic hardware configuration of an image processing apparatus, which may include: a processor 1001, a memory 1002, an input/output interface 1003, a communication interface 1004, and a bus 1005. Wherein the processor 1001, the memory 1002, the input/output interface 1003, and the communication interface 1004 realize communication connection between each other inside the device through the bus 1005.
The processor 1001 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is configured to execute relevant programs to implement the technical solutions provided in the embodiments of the present disclosure. The processor 1001 may also include a graphics card, such as an NVIDIA Titan X, a GTX 1080 Ti, or the like.
The memory 1002 may be implemented in the form of a ROM (Read-Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1002 may store an operating system and other application programs; when the embodiments of the present specification are implemented in software or firmware, the associated program code is stored in the memory 1002 and executed by the processor 1001.
The input/output interface 1003 is used to connect with an input/output module to realize information input and output. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
The communication interface 1004 is used to connect to a communication module (not shown in the figure) to enable the present device to interact with other devices. The communication module may communicate in a wired manner (such as USB or a network cable) or in a wireless manner (such as a mobile network, Wi-Fi, or Bluetooth).
Bus 1005 includes a path for transferring information between components of the device (e.g., processor 1001, memory 1002, input/output interface 1003, and communication interface 1004).
It should be noted that, although the above-described device only shows the processor 1001, the memory 1002, the input/output interface 1003, the communication interface 1004, and the bus 1005, in the implementation, the device may further include other components necessary to achieve normal operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.
As shown in fig. 11, the present disclosure also provides an image processing system, the system comprising:
the vision sensor 1101 is deployed on a vehicle and is used for acquiring images of the surrounding environment during driving of the vehicle to obtain N sample images;
a processor 1102 for determining a target pixel region in each of the sample images, the target pixel region being an imaged region of a traffic element in the surrounding environment associated with an autonomous driving decision of the vehicle; acquiring the information quantity of the target pixel area corresponding to each sample image in the N sample images; selecting M sample images from the N sample images according to the information quantity of the target pixel area, wherein M is smaller than N, and M and N are positive integers;
a server 1103 for training a copy of the machine learning model of the vehicle based on the M sample images and deploying the trained machine learning model onto the vehicle.
The vision sensor 1101 may be a monocular vision sensor, a binocular vision sensor, or other type of vision sensor. To improve the safety of the vehicle, a plurality of visual sensors 1101 may be deployed on the vehicle, with different visual sensors 1101 being located at different orientations of the vehicle. For example, one visual sensor 1101 may be disposed on each of left and right rear view mirrors of the vehicle, and one or more visual sensors 1101 may be disposed on the rear side of the vehicle. The processor 1102 may be deployed on a vehicle or at the cloud. The functions performed by the processor 1102 are detailed in the foregoing method embodiments, and are not described herein. The server 1103 may be deployed at the cloud, and may train a copy of the machine learning model of the vehicle by using the M sample images and the description truth values corresponding to the sample images, and deploy the trained machine learning model to the vehicle.
As shown in fig. 12, the present disclosure further provides a movable platform, wherein the movable platform includes:
the vision sensor 1201 is used for acquiring images of surrounding environment in the running process of the movable platform to obtain N sample images;
the electronic control unit 1202 is configured to perform an automatic driving decision on the mobile platform based on an output result of a machine learning model deployed on the mobile platform, where the machine learning model is obtained by training based on M sample images determined from the N sample images, and the M sample images are obtained based on the method according to any embodiment of the disclosure.
The movable platform may include, but is not limited to, vehicles, aircraft, ships, mobile robots, and the like. In some application scenarios, the movable platform is an autonomous vehicle, an unmanned aerial vehicle, an unmanned ship, or the like; it can move autonomously by sensing and performing decision planning on the surrounding environment, and can also move under the control of a user.
The vision sensor 1201 may be a monocular vision sensor, a binocular vision sensor, or another type of vision sensor. Multiple vision sensors 1201 may be deployed on the movable platform, with different vision sensors 1201 located at different orientations of the movable platform. The electronic control unit 1202 may be deployed on the movable platform for decision planning of the travel of the movable platform, e.g., path planning and speed control. The M images used for training the machine learning model of the movable platform may be obtained by the method in any of the foregoing embodiments; specific details can be found in the foregoing method embodiments and are not repeated here.
The present description also provides a computer-readable storage medium having stored thereon computer instructions which, when executed, perform the steps of the method of any of the embodiments.
The technical features of the above embodiments may be combined arbitrarily, provided that the combinations involve no conflict or contradiction. For brevity, not all such combinations are described, but any combination of these technical features also falls within the scope of the disclosure of the present specification.
Embodiments of the present description may take the form of a computer program product embodied on one or more storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having program code embodied therein. Computer-usable storage media include both permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to: phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, and may be used to store information that may be accessed by a computing device.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles thereof and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
The foregoing description of the preferred embodiments of the present disclosure is not intended to limit the disclosure, but rather to cover all modifications, equivalents, improvements and alternatives falling within the spirit and principles of the present disclosure.

Claims (43)

  1. An image processing method, the method comprising:
    acquiring N sample images, wherein each sample image is an image of the surrounding environment acquired by a vehicle during driving;
    determining a target pixel region in each of the sample images, the target pixel region being an imaging region of traffic elements in the surrounding environment associated with an autopilot decision of the vehicle;
    acquiring the information quantity of the target pixel area corresponding to each sample image in the N sample images;
    and selecting M sample images from the N sample images according to the information quantity of the target pixel area, wherein M is smaller than N, M and N are positive integers, and the M sample images are used for training a machine learning model related to automatic driving decision of a vehicle.
  2. The method of claim 1, wherein said determining a target pixel region in each of said sample images comprises:
    determining a target pixel region based on the characteristics of each pixel region in the sample image.
  3. The method of claim 2, wherein the determining a target pixel region based on the characteristics of each pixel region in the sample image comprises:
    the characteristics of the pixel area comprise the position of the pixel area, and the target pixel area is a pixel area in a preset position range;
    the characteristics of the pixel region comprise the depth of the pixel region, and the target pixel region is a pixel region within a preset depth range;
    the characteristics of the pixel area comprise pixel values of the pixel area, and the target pixel area is a pixel area comprising pixel points with preset pixel values;
    the characteristics of the pixel areas comprise semantics of the pixel areas, and the target pixel areas are pixel areas with preset semantic categories.
  4. The method of claim 1, wherein said determining a target pixel region in each of said sample images comprises:
    determining a target pixel region based on a feature of an object included in the sample image.
  5. The method of claim 4, wherein the characteristics of an object include at least one of a category, a speed of movement, and a size of the object.
  6. The method of claim 4, wherein the sample image comprises a multi-frame target video frame in a video; the determining a target pixel region based on a feature of an object included in the sample image includes:
    identifying a target object with preset characteristics from a frame of reference video frame in the video;
    tracking the target object to determine a pixel area including the target object in each frame of target video frame;
    and determining a pixel area including the target object in each frame of target video frame as a target pixel area.
  7. The method of claim 4, wherein the determining a target pixel region based on the features of the object included in the sample image comprises:
    identifying a target object with preset characteristics from the sample image;
    and determining the pixel area where the target object is and the pixel areas where other objects with the same category as the target object are located in the sample image as target pixel areas.
  8. The method according to claim 6 or 7, wherein the predetermined features are determined based on semantic categories of a pixel region in which the object is located.
  9. The method of claim 1, wherein the sample image is acquired by a vision sensor on the vehicle; the determining a target pixel region in each sample image includes:
    determining a target pixel region based on a viewing angle of the vision sensor.
  10. The method of claim 9, wherein the target pixel area is an image acquired by the vision sensor over a predetermined range of viewing angles.
  11. The method of claim 1, wherein the target pixel region is determined based on a data mining task.
  12. The method according to claim 1, wherein the method further comprises:
    detecting a running state of the vehicle;
    and acquiring P sample images acquired before and/or after the moment at which an abnormal running state is detected, wherein P is a positive integer, and the M sample images and the P sample images are jointly used for training a machine learning model related to the automatic driving decision of the vehicle.
  13. The method according to claim 1, wherein the method further comprises:
    obtaining a decision result output by a decision system of the vehicle, wherein the decision result is used for carrying out decision planning on the running state of the vehicle;
    acquiring Q sample images acquired before and/or after the moment at which an erroneous decision result is output, wherein Q is a positive integer, and the M sample images and the Q sample images are jointly used for training a machine learning model related to the automatic driving decision of the vehicle.
  14. The method according to claim 1, wherein selecting M sample images among the N sample images according to the information amount of the target pixel region includes:
    scoring the sample image according to the information amount of the target pixel area in the sample image to obtain a scoring value of the sample image;
    and selecting M sample images from the N sample images according to the scoring values of the N sample images.
  15. The method according to claim 1, wherein the method further comprises:
    manually screening the M sample images to obtain K sample images, wherein K is a positive integer, K is smaller than or equal to M, and the K sample images are used for training a machine learning model related to the automatic driving decision of the vehicle.
  16. The method according to claim 1, wherein the method further comprises:
    inputting the M sample images into a truth value calibration system to obtain description truth values corresponding to the traffic elements in the M sample images;
    and training the machine learning model based on the M sample images and the description truth values corresponding to the traffic elements in the M sample images.
  17. The method of claim 16, wherein training the machine learning model based on the M sample images and the descriptive truth values corresponding to the traffic elements in the M sample images comprises:
    cropping the target pixel area from each of the M sample images;
    and training the machine learning model based on the target pixel areas corresponding to the M sample images and the description truth values corresponding to the traffic elements in the M sample images.
  18. The method of claim 16, wherein the vehicle is provided with a first automatic driving permission; the method further comprises:
    after training of the machine learning model is completed, setting the automatic driving permission of the vehicle to a second automatic driving permission, the second automatic driving permission being higher than the first automatic driving permission.
  19. The method of claim 16, wherein the obtaining the description truth value corresponding to the traffic element in the M sample images comprises:
    displaying each sample image in the M sample images on a calibration interface, and marking the target pixel area in the sample images;
    detecting the calibration operation of a user on the traffic element, and acquiring a true value calibration result based on the calibration operation;
    and taking the true value calibration result as the description true value.
  20. The method according to claim 19, wherein said obtaining a true calibration result based on said calibration operation comprises:
    displaying a pre-calibration true value of the associated traffic element on the calibration interface;
    if the confirmation operation of the user on the pre-calibration true value is detected, determining the pre-calibration true value as the true value calibration result;
    and/or;
    displaying a pre-calibration true value of the associated traffic element on the calibration interface;
    if the adjustment operation of the user on the pre-calibration true value is detected, an adjusted calibration result is obtained;
    and determining the adjusted calibration result as the true value calibration result.
  21. An image processing apparatus, the apparatus comprising a processor for performing the steps of:
    acquiring N sample images, wherein each sample image is an image of the surrounding environment acquired by a vehicle during driving;
    determining a target pixel region in each of the sample images, the target pixel region being an imaging region of traffic elements in the surrounding environment associated with an autopilot decision of the vehicle;
    acquiring the information quantity of the target pixel area corresponding to each sample image in the N sample images;
    and selecting M sample images from the N sample images according to the information quantity of the target pixel area, wherein M is smaller than N, M and N are positive integers, and the M sample images are used for training a machine learning model related to automatic driving decision of a vehicle.
  22. The apparatus of claim 21, wherein the processor is specifically configured to:
    determining a target pixel region based on the characteristics of each pixel region in the sample image.
  23. The apparatus of claim 22, wherein the processor is specifically configured to:
    the characteristics of the pixel area comprise the position of the pixel area, and the target pixel area is a pixel area in a preset position range;
    the characteristics of the pixel region comprise the depth of the pixel region, and the target pixel region is a pixel region within a preset depth range;
    the characteristics of the pixel area comprise pixel values of the pixel area, and the target pixel area is a pixel area comprising pixel points with preset pixel values;
    the characteristics of the pixel areas comprise semantics of the pixel areas, and the target pixel areas are pixel areas with preset semantic categories.
  24. The apparatus of claim 21, wherein the processor is specifically configured to:
    determining a target pixel region based on a feature of an object included in the sample image.
  25. The apparatus of claim 24, wherein the characteristics of an object include at least one of a category, a speed of movement, and a size of the object.
  26. The apparatus of claim 24, wherein the sample image comprises a multi-frame target video frame in a video; the processor is specifically configured to:
    identifying a target object with preset characteristics from a frame of reference video frame in the video;
    tracking the target object to determine a pixel area including the target object in each frame of target video frame;
    and determining a pixel area including the target object in each frame of target video frame as a target pixel area.
  27. The apparatus of claim 24, wherein the processor is specifically configured to:
    identifying a target object with preset characteristics from the sample image;
    and determining the pixel area where the target object is and the pixel areas where other objects with the same category as the target object are located in the sample image as target pixel areas.
  28. The apparatus of claim 26 or 27, wherein the predetermined feature is determined based on a semantic category of a pixel region in which the object is located.
  29. The apparatus of claim 21, wherein the sample image is acquired by a vision sensor on the vehicle; the processor is specifically configured to:
    determining a target pixel region based on a viewing angle of the vision sensor.
  30. The apparatus of claim 29, wherein the target pixel area is an image acquired by the vision sensor over a predetermined range of viewing angles.
  31. The apparatus of claim 21, wherein the target pixel region is determined based on a data mining task.
  32. The apparatus of claim 21, wherein the processor is further configured to:
    detecting a running state of the vehicle;
    and acquiring P sample images acquired before and/or after the moment at which an abnormal running state is detected, wherein P is a positive integer, and the M sample images and the P sample images are jointly used for training a machine learning model related to the automatic driving decision of the vehicle.
  33. The apparatus of claim 21, wherein the processor is further configured to:
    obtaining a decision result output by a decision system of the vehicle, wherein the decision result is used for carrying out decision planning on the running state of the vehicle;
    acquiring Q sample images acquired before and/or after the moment at which an erroneous decision result is output, wherein Q is a positive integer, and the M sample images and the Q sample images are jointly used for training a machine learning model related to the automatic driving decision of the vehicle.
  34. The apparatus of claim 21, wherein the processor is specifically configured to:
    scoring the sample image according to the information quantity of the target pixel area in the sample image to obtain a scoring value of the sample image;
    and selecting M sample images from the N sample images according to the scoring values of the N sample images.
  35. The apparatus of claim 21, wherein the processor is further configured to:
    manually screening the M sample images to obtain K sample images, wherein K is a positive integer, K is smaller than or equal to M, and the K sample images are used for training a machine learning model related to the automatic driving decision of the vehicle.
  36. The apparatus of claim 21, wherein the processor is further configured to:
    inputting the M sample images into a truth value calibration system to obtain description truth values corresponding to the traffic elements in the M sample images;
    and training the machine learning model based on the M sample images and the description truth values corresponding to the traffic elements in the M sample images.
  37. The apparatus of claim 36, wherein the processor is specifically configured to:
    cropping the target pixel area from each of the M sample images;
    and training the machine learning model based on the target pixel areas corresponding to the M sample images and the description truth values corresponding to the traffic elements in the M sample images.
  38. The apparatus of claim 36, wherein the vehicle is provided with a first automatic driving permission; the processor is further configured to:
    after training of the machine learning model is completed, set the automatic driving permission of the vehicle to a second automatic driving permission, the second automatic driving permission being higher than the first automatic driving permission.
  39. The apparatus of claim 36, wherein the processor is specifically configured to:
    displaying each sample image in the M sample images on a calibration interface, and marking the target pixel area in the sample images;
    detecting the calibration operation of a user on the traffic element, and acquiring a true value calibration result based on the calibration operation;
    and taking the true value calibration result as the description true value.
  40. The apparatus of claim 39, wherein the processor is specifically configured to:
    displaying a pre-calibration true value of the associated traffic element on the calibration interface;
    if the confirmation operation of the user on the pre-calibration true value is detected, determining the pre-calibration true value as the true value calibration result;
    and/or;
    displaying a pre-calibration true value of the associated traffic element on the calibration interface;
    if the adjustment operation of the user on the pre-calibration true value is detected, an adjusted calibration result is obtained;
    and determining the adjusted calibration result as the true value calibration result.
  41. An image processing system, the system comprising:
    a vision sensor deployed on a vehicle and used for acquiring images of the surrounding environment during driving of the vehicle to obtain N sample images;
    a processor for determining a target pixel region in each of the sample images, the target pixel region being an imaged region of a traffic element in the surrounding environment associated with an autonomous driving decision of the vehicle; acquiring the information quantity of the target pixel area corresponding to each sample image in the N sample images; selecting M sample images from the N sample images according to the information quantity of the target pixel area, wherein M is smaller than N, and M and N are positive integers;
    and a server used for training a copy of the machine learning model of the vehicle based on the M sample images and deploying the trained machine learning model onto the vehicle.
  42. A movable platform, the movable platform comprising:
    a vision sensor used for acquiring images of the surrounding environment during travel of the movable platform to obtain N sample images;
    an electronic control unit used for making an automatic driving decision on the movable platform based on an output result of a machine learning model deployed on the movable platform, wherein the machine learning model is trained based on M sample images determined from the N sample images, and the M sample images are obtained based on the method of any one of claims 1 to 20.
  43. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the method of any of claims 1 to 20.
CN202280057529.XA 2022-03-22 2022-03-22 Image processing method, device and system and movable platform Pending CN117882117A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/082257 WO2023178510A1 (en) 2022-03-22 2022-03-22 Image processing method, device, and system and movable platform

Publications (1)

Publication Number Publication Date
CN117882117A true CN117882117A (en) 2024-04-12

Family

ID=88099543

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280057529.XA Pending CN117882117A (en) 2022-03-22 2022-03-22 Image processing method, device and system and movable platform

Country Status (2)

Country Link
CN (1) CN117882117A (en)
WO (1) WO2023178510A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599773B (en) * 2016-10-31 2019-12-24 清华大学 Deep learning image identification method and system for intelligent driving and terminal equipment
CN110478911A (en) * 2019-08-13 2019-11-22 苏州钛智智能科技有限公司 The unmanned method of intelligent game vehicle and intelligent vehicle, equipment based on machine learning
CN112987707A (en) * 2019-11-29 2021-06-18 北京京东乾石科技有限公司 Automatic driving control method and device for vehicle

Also Published As

Publication number Publication date
WO2023178510A1 (en) 2023-09-28


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination