WO2023178510A1 - Image processing method, device and system, and movable platform - Google Patents

Image processing method, device and system, and movable platform

Info

Publication number
WO2023178510A1
WO2023178510A1 · PCT/CN2022/082257 · CN2022082257W
Authority
WO
WIPO (PCT)
Prior art keywords
pixel area
sample images
target pixel
vehicle
sample
Prior art date
Application number
PCT/CN2022/082257
Other languages
English (en)
French (fr)
Inventor
魏笑
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2022/082257 (WO2023178510A1)
Priority to CN202280057529.XA (CN117882117A)
Publication of WO2023178510A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management

Definitions

  • the present disclosure relates to the field of artificial intelligence technology, specifically, to image processing methods, devices and systems, and movable platforms.
  • an embodiment of the present disclosure provides an image processing method, which method includes:
  • N sample images which are images of the surrounding environment collected by the vehicle while driving;
  • the target pixel area is an imaging area of traffic elements in the surrounding environment associated with the automatic driving decision of the vehicle
  • M sample images are selected from the N sample images, where M is less than N, and both M and N are positive integers.
  • the M sample images are used to train machine learning models related to the vehicle's autonomous driving decision-making.
  • an embodiment of the present disclosure provides an image processing device.
  • the device includes a processor, and the processor is configured to perform the following steps:
  • N sample images which are images of the surrounding environment collected by the vehicle while driving;
  • the target pixel area is an imaging area of traffic elements in the surrounding environment associated with the automatic driving decision of the vehicle
  • M sample images are selected from the N sample images, where M is less than N, and both M and N are positive integers.
  • the M sample images are used to train machine learning models related to the vehicle's autonomous driving decision-making.
  • an image processing system which includes:
  • a visual sensor, deployed on the vehicle, is used to collect images of the surrounding environment while the vehicle is driving, and obtain N sample images;
  • a processor configured to determine a target pixel area in each sample image, where the target pixel area is an imaging area of traffic elements in the surrounding environment associated with the vehicle's automatic driving decision; obtain the information amount of the target pixel area corresponding to each of the N sample images; and select M sample images from the N sample images according to the information amount of the target pixel area, where M is less than N, and M and N are both positive integers;
  • a server configured to train a copy of the machine learning model of the vehicle based on the M sample images, and deploy the trained machine learning model to the vehicle.
  • embodiments of the present disclosure provide a movable platform, the movable platform including:
  • a visual sensor used to collect images of the surrounding environment while the movable platform is traveling, and obtain N sample images
  • An electronic control unit configured to make automatic driving decisions for the movable platform based on the output results of a machine learning model deployed on the movable platform, where the machine learning model is trained using M sample images determined from the N sample images, and the M sample images are obtained based on the method described in any embodiment of the present disclosure.
  • embodiments of the present disclosure provide a computer-readable storage medium on which computer instructions are stored. When the instructions are executed by a processor, the method described in any embodiment of the present disclosure is implemented.
  • the embodiments of the present disclosure determine, from the sample image, the imaging area of the traffic elements associated with the vehicle's automatic driving decision, that is, the target pixel area.
  • when obtaining the information amount, only the information amount of the target pixel area is considered, and data mining of the sample images is performed based on the information amount of the target pixel area. In this way, the interference of elements irrelevant to the vehicle's autonomous driving decision-making is reduced when obtaining the information amount, thereby reducing the interference of background noise on the data mining process and improving the accuracy of data mining.
  • Figure 1 is a schematic diagram of the data mining process.
  • Figure 2A and Figure 2B are schematic diagrams of uncertainty of objects in different images respectively.
  • Figure 3 is a flow chart of an image processing method according to an embodiment of the present disclosure.
  • Figures 4A, 4B, 4C and 4D are schematic diagrams of determining a target pixel area based on the characteristics of the pixel area according to an embodiment of the present disclosure.
  • Figures 5A, 5B and 5C are schematic diagrams of determining a target pixel area based on the characteristics of an object according to an embodiment of the present disclosure.
  • Figures 6A and 6B are schematic diagrams of determining a target pixel area based on the viewing angle of a visual sensor according to an embodiment of the present disclosure.
  • Figure 7 is a schematic diagram of the system architecture of an embodiment of the present disclosure.
  • Figure 8 is a schematic diagram of the overall process of an embodiment of the present disclosure.
  • Figure 9 is a schematic diagram of an application scenario of an embodiment of the present disclosure.
  • FIG. 10 is a schematic structural diagram of an image processing device according to an embodiment of the present disclosure.
  • Figure 11 is a schematic diagram of an image processing system according to an embodiment of the present disclosure.
  • Figure 12 is a schematic diagram of a movable platform according to an embodiment of the present disclosure.
  • Although the terms first, second, third, etc. may be used in this disclosure to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other.
  • For example, first information may also be called second information, and similarly, second information may also be called first information.
  • The word "if" as used herein may be interpreted as "when", "upon" or "in response to determining".
  • Machine learning models are usually composed of neurons of different types and functions to perform specific machine learning tasks.
  • the machine learning task can be a regression task, a classification task, or a combination of both.
  • sample data needs to be used to train the machine learning model.
  • however, the actually collected sample data is often repetitive, redundant, and unbalanced for the training of machine learning models.
  • for example, a small number of categories occupy the majority of the sample data, while most categories have only a very small amount of sample data; this problem is called the long-tail problem of data.
  • data mining is required.
  • data mining generally refers to extracting part of the data from the data pool as mining results through mining algorithms.
  • the expected mining results are the corner case data that cause the machine learning model to fail, perform poorly, or that the model has never seen; the mining results are used to adjust the model parameters of the machine learning model to obtain a model with better performance.
  • the data pool refers to the massive data to be mined. It usually refers to the sum of all data collected as input to the model in a certain task scenario. It usually does not include or only includes limited annotation information.
  • the types of data in the data pool vary according to different task scenarios, including but not limited to data in various modalities such as images, videos, audios, and texts, and data in multiple modalities can coexist in the same task scenario.
  • the data pool can be in the cloud or on-premises, and can be a single node or a distributed storage system. There are no requirements on the data organization method and data structure in the data pool, as long as it supports single-frame image output. Some attention mechanism algorithms may require temporally consecutive samples, in which case the data pool is required to save and retrieve samples together with the physical time at which they were collected.
  • data mining is generally implemented using pure algorithms or semi-manual data mining methods.
  • Mining algorithms include uncertainty sampling, diversity sampling, disagreement-based sampling, and other algorithms. These three methods use a sampling model to calculate the information amount of the samples to be mined, and then conduct data mining based on that information amount. For example, in the uncertainty sampling algorithm, the information amount is proportional to the uncertainty of the model's prediction; in the diversity sampling algorithm, the information amount is proportional to the degree of diversity of the data; in the disagreement-based sampling algorithm, the information amount is proportional to the level of disagreement between the sampling models. A sketch of the uncertainty-based variant follows.
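  • The following is a minimal sketch of the uncertainty-sampling idea, assuming a classification model that outputs per-class probabilities; the function and variable names are illustrative and not taken from the original disclosure:

```python
import numpy as np

def prediction_entropy(class_probs: np.ndarray) -> float:
    """Information amount under uncertainty sampling: the entropy of the
    model's predicted class distribution (higher entropy = more uncertain)."""
    p = np.clip(class_probs, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

# A confident prediction carries little information for mining; an ambiguous one carries a lot.
print(prediction_entropy(np.array([0.95, 0.03, 0.02])))  # low information amount
print(prediction_entropy(np.array([0.40, 0.35, 0.25])))  # high information amount
```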
  • for example, an uncertainty sampling algorithm is used to estimate the information amount of samples, as shown in Figure 2A.
  • in Figure 2A, the area 201 including a cyclist is given a high uncertainty, and the area 202 including motor vehicles is given a low uncertainty. However, this does not mean that this image frame will be mined with a high priority, because the uncertainty caused by background noise will give other, irrelevant samples higher uncertainty and therefore a higher priority for mining.
  • embodiments of the present disclosure provide an image processing method that performs data mining based on an attention mechanism and can identify and classify each image sample through an automatic algorithm with or without prior expert knowledge.
  • the artificially defined target pixel area is used as the smallest unit for data mining to calculate the amount of information, so as to achieve the purpose of eliminating interference from unimportant factors through "attention focus" during the mining process.
  • the method includes:
  • Step 301: Obtain N sample images, which are images of the surrounding environment collected by the vehicle while driving;
  • Step 302: Determine a target pixel area in each sample image, where the target pixel area is the imaging area of traffic elements in the surrounding environment associated with the vehicle's automatic driving decision;
  • Step 303: Obtain the information amount of the target pixel area corresponding to each of the N sample images;
  • Step 304: According to the information amount of the target pixel area, select M sample images from the N sample images, where M is less than N, M and N are both positive integers, and the M sample images are used to train machine learning models related to the vehicle's autonomous driving decisions. A rough end-to-end sketch of these steps follows.
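  • The following is a rough, non-authoritative sketch of steps 301-304 in code form; the region-extraction and information-amount functions are placeholders for whichever concrete methods (described below) an embodiment chooses:

```python
from typing import Callable, List, Sequence
import numpy as np

def select_samples(
    images: Sequence[np.ndarray],                                    # step 301: the N sample images
    find_target_region: Callable[[np.ndarray], np.ndarray],         # step 302: boolean mask of the target pixel area
    information_amount: Callable[[np.ndarray, np.ndarray], float],   # step 303: information amount of that area
    m: int,
) -> List[int]:
    """Return indices of the M images whose target pixel areas carry the most information (step 304)."""
    scored = []
    for idx, img in enumerate(images):
        mask = find_target_region(img)
        if not mask.any():        # no target pixel area: discard the image
            continue
        scored.append((information_amount(img, mask), idx))
    scored.sort(reverse=True)     # higher information amount first
    return [idx for _, idx in scored[:m]]
```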
  • the surrounding environment may be a road environment where the vehicle is running or parked.
  • the road environment may include one or more traffic elements.
  • the traffic elements in the road environment may include traffic elements associated with the vehicle's automatic driving decision.
  • the associated traffic elements may also include traffic elements that are not relevant to the vehicle's autonomous driving decision.
  • the traffic elements may include vehicle own elements and external traffic environment elements, which in turn include static environment elements, dynamic environment elements, traffic participant elements, and/or meteorological elements, etc.
  • the vehicle's own elements include the vehicle's own basic attributes (for example, weight, geometric information, performance information, etc.), location information (for example, coordinate information, lane information, etc.), motion status information (for example, lateral motion status and longitudinal motion status) and/or driving task information (for example, perception recognition, path planning, human-computer interaction, networked communication, etc.).
  • Static environmental elements refer to static objects in the traffic environment, including roads, traffic facilities, surrounding landscapes, and obstacles.
  • Dynamic environmental elements refer to dynamically changing elements in the traffic environment, including dynamic indication facilities (such as traffic lights, variable traffic signs, traffic police, etc.) and communication environment information (such as signal strength information, electromagnetic interference information, signal delay information, etc.).
  • Traffic participant elements include object information such as pedestrians, animals, and/or other vehicles around the vehicle that affect the vehicle's decision-making planning.
  • Meteorological elements include information such as ambient temperature, lighting conditions and/or weather conditions in the driving scene.
  • Images of the surrounding environment can be collected to obtain N sample images.
  • the N sample images may include images collected by a visual sensor on the vehicle or images collected by a monitoring device installed in the driving environment of the vehicle.
  • the number of vehicles may be greater than or equal to 1.
  • the monitoring device may include several surveillance cameras arranged around the carriageway.
  • the N sample images may include a single image or one or more video frames in the video.
  • a target pixel area may be determined for each sample image.
  • the target pixel area is an imaging area of traffic elements in the surrounding environment that is associated with the vehicle's automatic driving decision.
  • the traffic elements associated with the vehicle's automatic driving decision generally refer to traffic elements that will affect the automatic driving decision.
  • for example, for vehicle A and vehicle B on the roadway, vehicle A needs to determine its own driving path and speed based on the position and moving speed of vehicle B to avoid colliding with vehicle B.
  • for another example, for a traffic light at an intersection on the carriageway, the vehicle needs to determine whether it can pass the intersection based on the state of the traffic light.
  • a sample image may include one or more target pixel areas, or may not include target pixel areas. If a sample image does not include the target pixel area, the sample image can be discarded directly. If a sample image includes one or more target pixel areas, the sample image can be used for processing in subsequent steps. The following is an example of a specific method of determining the target pixel area.
  • the target pixel area may be determined based on the characteristics of each pixel area in the sample image, or based on the characteristics of the objects included in the sample image, or based on the viewing angle of the visual sensor, or based on the task performed by the machine learning model (that is, the data mining task).
  • the target pixel area can also be jointly determined based on two or more of the above methods. The various methods of determining the target pixel area are explained below.
  • the characteristics of the pixel area include, but are not limited to, the position, depth, pixel value and/or semantics of the pixel area.
  • the position of a pixel region may be the position of the pixel region in physical space, or the pixel position of the pixel region in the sample image.
  • the position may be an absolute position or a relative position.
  • the depth may be the depth from a certain pixel point or a certain object in the pixel area to the image acquisition device that captures the sample image to which the pixel area belongs.
  • the pixel values may include pixel values of some or all pixels in the pixel area.
  • the semantics can be used to characterize the category of the traffic element corresponding to the pixel point in the pixel area (for example, lane category, sidewalk category, traffic light category, etc.).
  • the pixel area within the preset position range may be determined as the target pixel area.
  • the preset position range may be a continuous position interval (for example, greater than or equal to a certain lower limit of the position, and/or less than or equal to a certain upper limit of the position), or it may be one or more discrete position points.
  • FIG. 4A shows a schematic diagram when the position is a pixel position, and the preset position range is a pixel area centered in the sample image, as shown by the dotted box in the figure. Assume that the vehicle 401 is driving on the road, and the vehicle 401 is in a different position on the road at time T1 and at time T2.
  • at time T1, the sample image P1 is collected through the camera on the right side of the vehicle 401 (not shown in the figure), and the sample image P1 includes the dog 402.
  • at time T2, the sample image P2 is collected through the camera on the right side of the vehicle 401, and the sample image P2 includes the pedestrian 403. It can be seen that no matter where the pixel area in the dotted box is located in physical space, and no matter what kind of objects it contains, the same pixel area (that is, the pixel area within the dotted frame) is used as the target pixel area in the collected sample images.
  • the preset position range can also be other pixel areas centered in the image, and the size and number of the preset position range are not limited to those shown in the figure.
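  • As a minimal sketch of this pixel-position rule (assuming, as in Figure 4A, that the preset position range is a box centered in the image; the 50% fraction is an illustrative choice, not from the disclosure):

```python
import numpy as np

def centered_target_mask(image: np.ndarray, fraction: float = 0.5) -> np.ndarray:
    """Mark a box centered in the image as the target pixel area,
    regardless of which objects happen to fall inside it."""
    h, w = image.shape[:2]
    dh, dw = int(h * fraction / 2), int(w * fraction / 2)
    mask = np.zeros((h, w), dtype=bool)
    mask[h // 2 - dh : h // 2 + dh, w // 2 - dw : w // 2 + dw] = True
    return mask
```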
  • FIG. 4B shows a schematic diagram when the position is the position of the pixel area in physical space.
  • the white oval area is the field of view range of the camera 404, which is variable, and the gray oval area represents the preset position range.
  • at time T1, the dog 402 is within the preset position range inside the field of view S1 of the camera 404; at time T2, the pedestrian 403 is within the preset position range inside the field of view S2 of the camera 404.
  • in the sample images P3 and P4 collected at the two moments, the target pixel area is shown in the dotted box.
  • the preset position range can also be other areas within the camera's field of view, and the size and number of the preset position range are not limited to those shown in the figure.
  • a pixel area within a preset depth range may be determined as the target pixel area, and the preset depth range may be a continuous depth interval (for example, greater than or equal to a certain lower depth limit, and/or less than or equal to a certain upper depth limit), or one or more discrete depth points.
  • the collected sample image P5 includes the pixel area of the dog 402 and the pixel area including the pedestrian 403. Both are target pixel areas (shown in the dotted box in the figure).
  • the figure shows a situation where the same sample image includes two objects within the preset depth range.
  • the number of objects within the preset depth range included in the same sample image can also be different, and each object can be captured by the same camera or by different cameras. A sketch of the depth-based rule follows.
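  • A similar minimal sketch for the depth-based rule (assuming a per-pixel depth map aligned with the sample image is available, for example from a stereo pair; the 2-30 m interval is purely illustrative):

```python
import numpy as np

def depth_target_mask(depth_map: np.ndarray, near: float = 2.0, far: float = 30.0) -> np.ndarray:
    """Target pixel area = pixels whose depth to the camera lies inside the preset depth range."""
    return (depth_map >= near) & (depth_map <= far)
```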
  • the pixel area of the preset semantic category may be determined as the target pixel area.
  • the semantic categories of the pixel areas in the sample image include motor vehicle lane categories and sidewalk categories, one or both of which can be determined as the target pixel area.
  • the classification method of semantic categories is not limited to that shown in the figure.
  • the semantic categories can also be divided in more detail.
  • for example, the motor vehicle lane can be further divided into a left-turn lane category, a straight lane category, a right-turn lane category, and so on.
  • semantic categories may also include traffic light categories, pedestrian categories, ground indicator line categories, etc.
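  • A minimal sketch of the semantic-category rule (assuming a per-pixel label map from some semantic segmentation model is already available; the integer label values are illustrative):

```python
import numpy as np

# Illustrative label values; a real segmentation model defines its own label set.
MOTOR_LANE, SIDEWALK, TRAFFIC_LIGHT = 1, 2, 3

def semantic_target_mask(label_map: np.ndarray, preset_categories=(MOTOR_LANE,)) -> np.ndarray:
    """Target pixel area = pixels whose semantic category is in the preset category set."""
    return np.isin(label_map, preset_categories)
```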
  • a pixel area including pixel points with preset pixel values may be determined as the target pixel area.
  • a pixel area including red pixels may be determined as the target pixel area.
  • the characteristics of an object include but are not limited to at least one of the object's category, moving speed, and size.
  • the category can be used to characterize what kind of traffic element the object belongs to, the moving speed can be absolute speed or relative speed, and the size can be pixel size or the size of the object in physical space.
  • Objects with preset characteristics can be determined from the image, and the pixel area in the sample image where the object with preset characteristics is located is determined as the target pixel area.
  • the preset characteristics may include belonging to a preset category, a moving speed within a preset speed range, and/or a size within a preset size range. As shown in Figure 5A, assuming that the sample image includes objects of the "pedestrian" category and objects of the "dog" category, and the "pedestrian" category is the preset category, the pixel area where the "pedestrian" category objects are located can be determined as the target pixel area.
  • a target object with preset characteristics can be identified from the sample image; the pixel area in the sample image where the target object is located and the pixel areas where other objects of the same category as the target object are located are determined as the target pixel area.
  • for example, a target object whose moving speed is not 0 can be identified from the sample image.
  • assuming the identified target object is pedestrian A, other pedestrians besides pedestrian A can then be identified from the sample image.
  • assuming pedestrians B and C are identified, the pixel area where pedestrian A is located, the pixel area where pedestrian B is located, and the pixel area where pedestrian C is located can all be determined as target pixel areas (as shown by the dotted boxes in the figure).
  • the sample images include multiple target video frames in the video.
  • a target object with preset characteristics can be identified from a reference video frame in the video; the target object is then tracked to determine the pixel area including the target object in each target video frame, and the pixel area including the target object in each target video frame is determined as the target pixel area.
  • F1, F2 and F3 are multi-frame target video frames in the video. These target video frames may be continuous or discontinuous.
  • video frame F1 can be identified first, assuming pedestrian A is identified, and then pedestrian A can be tracked to identify pedestrian A in video frames F2 and F3 respectively.
  • assuming the pixel positions of pedestrian A in F1, F2 and F3 are as shown in the figure, the pixel areas including pedestrian A in F1, F2 and F3 can each be determined as target pixel areas, as shown by the dotted boxes in the figure. A sketch of this tracking flow follows.
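  • The following is a minimal sketch of the tracking flow of Figure 5C; the detector and single-object tracker are placeholders (any off-the-shelf implementations could be plugged in), and only the control flow is shown:

```python
from typing import Callable, Dict, Optional, Sequence, Tuple
import numpy as np

Box = Tuple[int, int, int, int]  # (x, y, width, height) of a target pixel area

def track_target_areas(
    frames: Sequence[np.ndarray],
    detect_target: Callable[[np.ndarray], Optional[Box]],     # runs only on the reference frame
    track_next: Callable[[np.ndarray, Box], Optional[Box]],   # propagates the box to the next frame
) -> Dict[int, Box]:
    """Identify a target object in the reference (first) frame, then track it,
    recording the target pixel area (its bounding box) in every later frame."""
    areas: Dict[int, Box] = {}
    box = detect_target(frames[0])
    if box is None:
        return areas                   # no target object: no target pixel areas
    areas[0] = box
    for i in range(1, len(frames)):
        box = track_next(frames[i], box)
        if box is None:                # target lost: stop recording areas
            break
        areas[i] = box
    return areas
```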
  • the preset features are determined based on the semantic category of the pixel area where the object is located, that is, different preset features can be determined for different pixel areas.
  • for example, in the pixel area where the road is located, the traffic elements that affect the vehicle's automatic driving decision are mainly objects of categories such as motor vehicles, non-motor vehicles, and pedestrians; therefore, one or more categories such as motor vehicles, non-motor vehicles, and pedestrians can be determined as the preset categories corresponding to the pixel area where the road is located. In other pixel areas (areas other than the pixel area where the road is located), the traffic elements that affect the vehicle's autonomous driving decision may mainly include traffic lights; therefore, the traffic light category can be determined as the preset category corresponding to those other pixel areas.
  • the pixel area collected by the visual sensor within a preset viewing angle range can be determined as the target pixel area.
  • the preset viewing angle range is smaller than the total viewing angle range of the vision sensor.
  • assuming the total viewing angle range of the visual sensor is θ1, this viewing angle range can image the light gray area 601. Since the degree of distortion at the edge of the image is generally higher than in the central area of the image, a viewing angle range θ2 smaller than θ1 can be determined; the viewing angle range θ2 can image the dark gray area 602. Therefore, the pixel area corresponding to the dark gray area 602 is the target pixel area.
  • the preset viewing angle range may be an overlapping viewing angle range of two or more viewing angle sensors.
  • elliptical areas 603 and 604 are respectively the viewing angle ranges of the two visual sensors.
  • the overlapping range of the viewing angles of the two sensors is shown by the slashed area in the figure.
  • the pixel area corresponding to the overlapping range may be determined as the target pixel area.
  • the target pixel area is determined based on the data mining task.
  • One data mining task may correspond to several areas, and different data mining tasks may correspond to different areas. Multiple data mining tasks may be performed on the same set of data. For example, when the data mining task is "mining blue cars" or “mining vehicles on the motorway”, the pixel area corresponding to the motorway can be determined as the target pixel area; when the data mining task is "mining objects on the sidewalk” ”, the pixel area corresponding to the sidewalk can be determined as the target pixel area.
  • the target pixel area can be determined based on any one of the above methods, or the target pixel area can be determined based on at least two of the above methods at the same time.
  • for example, a pixel area that belongs to a preset semantic category and includes an object with preset characteristics can be determined as the target pixel area.
  • assuming the preset semantic category is the motorway category and the preset feature is the bicycle category, the pixel area on the motorway that includes bicycles can be determined as the target pixel area.
  • the target pixel area can also be determined in combination with at least any of the above methods and other methods, which will not be listed here.
  • the target pixel area can be determined in different ways in different scenarios, thereby improving the flexibility and scalability of the solution.
  • in step 303, various methods of determining the information amount may be used to determine the information amount of the target pixel area, including but not limited to the aforementioned uncertainty sampling, diversity sampling or disagreement-based sampling. Because only the information amount of the target pixel area is considered when obtaining the information amount, and data mining of the sample images is performed based on the information amount of the target pixel area, the interference of elements irrelevant to the vehicle's autonomous driving decision-making is reduced when obtaining the information amount, thereby reducing the interference of background noise on the data mining process and improving the accuracy of data mining.
  • the sample image can be scored according to the information amount of the target pixel area in the sample image to obtain a score value of the sample image; according to the score values of the N sample images, M sample images are selected from the N sample images.
  • the score value of a sample image can be positively correlated or inversely correlated with the probability of the sample image being selected. Taking the case of positive correlation as an example, the score values of each sample image can be sorted from large to small, and the top-ranked M sample images can be selected. Of course, other methods can also be used to select M sample images, which will not be described again here.
  • the data pool (database) 701 is used to store the sample images to be mined, and the sample images can be processed through the attention node 702 to determine the target pixel area.
  • the method of determining the target pixel area can use any of the aforementioned methods, and the specific algorithm can use a tracking algorithm, a segmentation algorithm, etc.
  • for the tracking algorithm, the user only cares about the characteristics of a dynamic object in the image, such as a car.
  • the target vehicle is framed in the first frame of the time series data, and then the tracking algorithm is used to automatically track the frame in each subsequent frame of the image, and the target pixel area is determined based on the tracking results.
  • the user only cares about the characteristics of certain areas in the picture, for example, only the motorway area.
  • the image is segmented through a semantic segmentation network, and only the area corresponding to the pixels of the "motor lane" category is retained as the target pixel area.
  • the target pixel area can be sent to the mining node 703.
  • the mining node 703 can use uncertainty sampling, diversity sampling, etc. to determine the amount of information in the target pixel area, and mine based on the amount of information.
  • the mined M sample images can be stored in the data pool 701 for storage, or can be output to other processing units.
  • the method of determining the target pixel area, the algorithm used by the attention node 702, and the algorithm used by the mining node 703 can all be specified through a graphical user interface (GUI) 704. The user can also perform secondary screening of the screened M sample images on the GUI, or directly store the screened sample images into the data pool by entering corresponding instructions on the GUI.
  • the M sample images can be manually screened to obtain K sample images.
  • the automatic mining method may have certain errors. Therefore, the embodiments of the present disclosure further perform manual screening on the mined M sample images to obtain K sample images, and use these K sample images to train machine learning models related to the vehicle's automatic driving decision-making, thereby improving the training results, where K can be less than or equal to M.
  • the embodiments of the present disclosure perform automatic data mining on a large number of sample images in the data pool, and use manual screening as an assistant, while ensuring the mining efficiency and the accuracy of the mining results.
  • the filtered sample images can be used to train machine learning models related to the vehicle's autonomous driving decisions.
  • the vehicle's autonomous driving decision-making is based on sensory information to replace the human driver's decision-making and control of the vehicle's driving status, thereby realizing functions such as lane keeping, lane departure warning, vehicle distance maintenance, and obstacle warning.
  • Autonomous driving decisions can be implemented based on machine learning models deployed on the vehicle.
  • the machine learning models can include but are not limited to various detection models, recognition models, classification models, etc.
  • the recognition model can be used to identify traffic elements on the road to determine the traffic lights among them, so as to determine whether the current intersection can be passed based on the information of the traffic lights.
  • the detection model can be used to detect the distance between the vehicle in front and the own vehicle to determine whether deceleration is needed. Since autonomous driving decisions may involve multiple machine learning tasks, the machine learning model deployed on the vehicle may include multiple machine learning models that perform different machine learning tasks.
  • the machine learning model deployed on the vehicle can be trained based on the mined sample images and the description truth values corresponding to the traffic elements in the sample images.
  • the description truth values used when training machine learning models that perform different machine learning tasks may be different.
  • for example, the description true value used by a machine learning model that performs a classification task is the category of each pixel in the sample image, while the description true value used by a machine learning model that performs a detection task is the distance from the vehicle detected in the sample image to the own vehicle.
  • the M sample images can be input into the true value calibration system 801 to obtain the description true values corresponding to the traffic elements in the M sample images; the machine learning model is then trained based on the M sample images and the description true values corresponding to the traffic elements in the M sample images.
  • the true value calibration system 801 can obtain the description true value corresponding to the traffic element in the sample image through automatic calibration, semi-automatic calibration or manual calibration.
  • Different true value calibration systems have different calibration accuracy and calibration efficiency. For example, manual calibration is less efficient but more accurate, while automatic calibration or semi-automatic calibration is more efficient but less accurate. Therefore, the calibration efficiency and accuracy of the true value calibration system need to be weighed.
  • a machine learning model with better performance can be pre-trained in the cloud.
  • this machine learning model performs the same tasks as the machine learning model deployed on the vehicle, and the accuracy of its calibration results is higher than a preset accuracy threshold, so that the output result of this machine learning model can be directly used as the description true value.
  • for example, a traffic light can be identified from a sample image through a recognition model deployed in the cloud, and a description true value of the color of the traffic light (red light, yellow light, green light) can be output. Then, the sample image and the true value describing the color of the traffic light in it are used to train the machine learning model deployed on the vehicle, so that the machine learning model deployed on the vehicle can accurately determine, based on the color of the traffic light, whether the vehicle can pass the intersection.
  • in other embodiments, the output result of the machine learning model deployed on the vehicle for the sample image can first be obtained. If the automatic driving decision result output by the vehicle's decision-making system for the sample image is normal, the output result of the machine learning model deployed on the vehicle is used as the description true value; otherwise, the description true value corresponding to the traffic element in the sample image is determined through manual calibration. For example, the distance between the preceding vehicle and the own vehicle can be detected through a detection model deployed on the vehicle. If, at a certain moment, the autonomous driving decision result output by the vehicle's decision-making system for a sample image instructs the vehicle to drive forward at the current speed, but the vehicle then collides with the vehicle in front, this means that the autonomous driving decision result is abnormal.
  • in this case, the distance between the preceding vehicle and the own vehicle in the sample image can be determined through manual calibration, and the manually calibrated distance is used as the corresponding description true value.
  • each of the M sample images can be displayed on the calibration interface, and the target pixel area in the sample image can be identified; the user's calibration operation for the traffic element in the target pixel area is detected, and a true value calibration result is obtained based on the calibration operation; the true value calibration result is used as the description true value.
  • the calibration operation may include deleting, modifying original calibration results, and adding calibration results.
  • the pre-calibrated true value of the associated traffic element can be displayed on the calibration interface; if a user's confirmation operation on the pre-calibrated true value is detected, the pre-calibrated true value is determined as the true value calibration result. On the contrary, if the user's adjustment operation on the pre-calibrated true value is detected and the adjusted calibration result is obtained, the adjusted calibration result can be determined as the true value calibration result.
  • a pre-calibrated true value can be displayed on the display interface, and the pre-calibrated true value can be a bounding box of the traffic light in the image. If the user's confirmation operation for the bounding box is detected, the bounding box is determined to be the true value calibration result. On the contrary, if it is detected that the user adjusts the bounding box, for example, adjusts its size and/or position, the adjusted bounding box is determined as the true value calibration result.
  • the target pixel area may be cropped from each of the M sample images; the machine learning model is then trained based on the target pixel areas corresponding to the M sample images and the description true values corresponding to the traffic elements in the M sample images.
  • the machine learning model can be trained directly based on the M sample images and the description truth values corresponding to the traffic elements in the M sample images.
  • the trained machine learning model can be deployed to the vehicle.
  • in addition to filtering the sample images used to train the machine learning model based on the information amount of the target pixel area, the sample images can also be filtered based on other information.
  • the driving state of the vehicle can be detected; P sample images collected before and/or after the moment when an abnormal driving state is detected are obtained, where P is a positive integer, and the M sample images and the P sample images are jointly used to train a machine learning model related to the vehicle's autonomous driving decision-making.
  • the P sample images may be partially or fully included in the M sample images, or may be images other than the M sample images; that is, the P sample images and the M sample images may be partially or entirely the same.
  • the driving status may include driving speed, driving direction, etc.
  • for example, when the driving status includes the driving speed and the change in driving speed exceeds a certain threshold (for example, the vehicle brakes suddenly), the driving status may be considered abnormal.
  • when the driving state includes the driving direction and the change in driving direction exceeds a certain threshold (for example, a sharp turn), the driving state can be considered abnormal.
  • the driving state may also include other states, and the abnormal driving conditions in various driving states can be determined based on actual scenarios, which will not be listed here.
  • the decision result output by the vehicle's decision-making system can be obtained, where the decision result is used for decision planning of the vehicle's driving state; Q sample images collected before and/or after the time when an erroneous decision result is output are obtained, where Q is a positive integer; the M sample images and the Q sample images are jointly used to train a machine learning model related to the vehicle's automatic driving decision-making.
  • the Q sample images may be partially or fully included in the M sample images, or may be images other than the M sample images; that is, the Q sample images and the M sample images may be partially or entirely the same.
  • for example, if the decision result of the vehicle causes the vehicle to hit an obstacle while traveling at the current speed, or the decision result indicates that the vehicle should turn in a straight-only lane, the decision result is determined to be incorrect.
  • the situation where the decision result is wrong can also include other situations, which are not listed here.
  • the M sample images, the P sample images, and the Q sample images may also be used simultaneously to jointly train the machine learning model of the vehicle. Since the sample images collected when the driving status is abnormal or the decision result is wrong may be sample images on which the machine learning model performs poorly, mining these sample images can help improve the performance of the machine learning model. A sketch of assembling such a joint training set follows.
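  • The following is a minimal sketch of building the joint training set from the information-based mining results (M), the abnormal-driving-state samples (P) and the wrong-decision samples (Q), assuming each sample image has a unique identifier so overlapping samples are counted only once:

```python
def build_training_set(mined_ids, abnormal_state_ids, wrong_decision_ids):
    """Union of the three sources; P and Q may partially or fully overlap with M."""
    return set(mined_ids) | set(abnormal_state_ids) | set(wrong_decision_ids)

# Example with illustrative image identifiers.
training_ids = build_training_set({1, 2, 3}, {3, 4}, {5})
print(sorted(training_ids))  # [1, 2, 3, 4, 5]
```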
  • the vehicle is provided with a first automatic driving authority; after training the machine learning model, the automatic driving authority of the vehicle is set to a second automatic driving authority, and the second automatic driving authority is higher than the first automatic driving authority.
  • the first automatic driving authority may be L2 automatic driving authority
  • the second automatic driving authority may be L3 automatic driving authority.
  • the trained machine learning model can be tested using test images to determine the performance of the machine learning model, and the second autonomous driving authority is determined based on the performance of the machine learning model.
  • FIG. 9 is a schematic diagram of an application scenario according to an embodiment of the present disclosure.
  • the vehicle 901 is set with the first automatic driving authority. Under this automatic driving authority, the vehicle 901 does not have the automatic path planning authority.
  • Sample images can be collected through the visual sensor on the vehicle 901 and sent to the cloud for screening, or the screening can be performed on the vehicle 901 itself.
  • the filtered sample data can be used to train a machine learning model in the cloud.
  • the cloud can deliver the machine learning model to vehicle 901.
  • the second automatic driving authority can be set for the vehicle 901 .
  • vehicle 901 has automatic path planning authority.
  • the vehicle 901 can plan a path R based on the output results of the machine learning model, and perform automatic driving based on the path R.
  • the disclosed embodiments solve the long-tail problem of model iteration in machine learning model deployment, while supporting users' needs to focus on certain areas during data mining, thereby proposing a data mining framework based on an attention mechanism and defining its application in production practice in the form of software. The software system of the embodiments of the present disclosure can provide a complete set of data mining functions. The present disclosure has the following advantages:
  • the mining standard is extensible. When the definition of corner cases changes, that is, when the mining standard changes, the mining algorithm can be adapted at a very low cost.
  • the present disclosure also provides an image processing device, the device includes a processor, the processor is configured to perform the following steps:
  • N sample images which are images of the surrounding environment collected by the vehicle while driving;
  • the target pixel area is an imaging area of traffic elements in the surrounding environment associated with the automatic driving decision of the vehicle
  • M sample images are selected from the N sample images, where M is less than N, and both M and N are positive integers.
  • the M sample images are used to train machine learning models related to the vehicle's autonomous driving decision-making.
  • the processor is specifically configured to determine a target pixel area based on characteristics of each pixel area in the sample image.
  • the processor is specifically configured to: the characteristics of the pixel area include the position of the pixel area, and the target pixel area is a pixel area within a preset position range; the characteristics of the pixel area include The depth of the pixel area, the target pixel area is a pixel area within a preset depth range; the characteristics of the pixel area include the pixel value of the pixel area, the target pixel area is a pixel including a preset pixel value The pixel area of the point; the characteristics of the pixel area include the semantics of the pixel area, and the target pixel area is a pixel area of a preset semantic category.
  • the processor is specifically configured to determine a target pixel area based on characteristics of objects included in the sample image.
  • the characteristics of an object include at least one of the category, movement speed, and size of the object.
  • the sample image includes multiple target video frames in the video; the processor is specifically configured to: identify a target object with preset characteristics from a reference video frame in the video; The target object is tracked to determine the pixel area including the target object in each target video frame; the pixel area including the target object in each target video frame is determined as the target pixel area.
  • the processor is specifically configured to: identify a target object with preset characteristics from the sample image; and determine the pixel area in the sample image where the target object is located and the pixel areas where other objects of the same category as the target object are located as the target pixel area.
  • the preset feature is determined based on the semantic category of the pixel area where the object is located.
  • the sample image is collected by a visual sensor on the vehicle; the processor is specifically configured to determine the target pixel area based on the viewing angle of the visual sensor.
  • the target pixel area is an image collected by the visual sensor within a preset viewing angle range.
  • the target pixel area is determined based on a data mining task.
  • the processor is further configured to: detect the driving state of the vehicle; obtain P sample images collected before and/or after the moment when the abnormal driving state is detected, where P is positive An integer, the M sample images and the P sample images are jointly used to train a machine learning model related to the vehicle's autonomous driving decision-making.
  • the processor is further configured to: obtain a decision result output by the vehicle's decision-making system, the decision result being used for decision planning of the vehicle's driving state; and obtain Q sample images collected before and/or after the moment when an incorrect decision result is output, where Q is a positive integer; the M sample images and the Q sample images are jointly used to train a machine learning model related to the vehicle's automatic driving decision-making.
  • the processor is specifically configured to: score the sample image according to the information amount of the target pixel area in the sample image to obtain a score value of the sample image; and select M sample images from the N sample images according to the score values of the N sample images.
  • the processor is further configured to manually screen the M sample images to obtain K sample images, where K is a positive integer, K is less than or equal to M, and the K sample images are used for Train machine learning models relevant to the vehicle's autonomous driving decisions.
  • the processor is further configured to: input the M sample images into a true value calibration system to obtain the description true values corresponding to the traffic elements in the M sample images; and train the machine learning model based on the M sample images and the description true values corresponding to the traffic elements in the M sample images.
  • the processor is specifically configured to: crop the target pixel area from each of the M sample images; and train the machine learning model based on the target pixel areas corresponding to the M sample images and the description true values corresponding to the traffic elements in the M sample images.
  • the vehicle is provided with a first automatic driving authority; the processor is further configured to: after training the machine learning model, set the automatic driving authority of the vehicle to a second automatic driving authority, where the second automatic driving authority is higher than the first automatic driving authority.
  • the processor is specifically configured to: display each of the M sample images on a calibration interface, and identify the target pixel area in the sample image; detect the user's calibration operation for the traffic element in the target pixel area, and obtain a true value calibration result based on the calibration operation; and use the true value calibration result as the description true value.
  • the processor is specifically configured to: display the pre-calibrated true value of the associated traffic element on the calibration interface; if a user's confirmation operation on the pre-calibrated true value is detected, determine the pre-calibrated true value as the true value calibration result; and/or display the pre-calibrated true value of the associated traffic element on the calibration interface; if a user's adjustment operation on the pre-calibrated true value is detected, obtain the adjusted calibration result and determine the adjusted calibration result as the true value calibration result.
  • Figure 10 shows a schematic diagram of the hardware structure of an image processing device.
  • the device may include: a processor 1001, a memory 1002, an input/output interface 1003, a communication interface 1004 and a bus 1005.
  • the processor 1001, the memory 1002, the input/output interface 1003 and the communication interface 1004 implement communication connections between each other within the device through the bus 1005.
  • the processor 1001 can be implemented using a general-purpose CPU (Central Processing Unit), a microprocessor, an application specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute related programs to implement the technical solutions provided by the embodiments of this specification.
  • the processor 1001 may also include a graphics card, which may be an Nvidia titan X graphics card or a 1080Ti graphics card.
  • the memory 1002 can be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage device, dynamic storage device, etc.
  • the memory 1002 can store operating systems and other application programs.
  • the relevant program codes are stored in the memory 1002 and called and executed by the processor 1001.
  • the input/output interface 1003 is used to connect the input/output module to realize information input and output.
  • the input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide the corresponding functions.
  • Input devices can include keyboards, mice, touch screens, microphones, various sensors, etc., and output devices can include monitors, speakers, vibrators, indicator lights, etc.
  • the communication interface 1004 is used to connect a communication module (not shown in the figure) to realize communication interaction between this device and other devices.
  • the communication module can realize communication through wired means (such as USB, network cable, etc.) or wireless means (such as mobile network, WIFI, Bluetooth, etc.).
  • Bus 1005 includes a path that carries information between various components of the device (eg, processor 1001, memory 1002, input/output interface 1003, and communication interface 1004).
  • although the above device only shows the processor 1001, the memory 1002, the input/output interface 1003, the communication interface 1004 and the bus 1005, in specific implementations the device may also include other components necessary for normal operation.
  • the above-mentioned device may only include components necessary to implement the embodiments of this specification, and does not necessarily include all components shown in the drawings.
  • an image processing system which includes:
  • Vision sensor 1101, deployed on the vehicle, is used to collect images of the surrounding environment while the vehicle is driving, and obtain N sample images;
  • Processor 1102, configured to determine a target pixel area in each of the sample images, where the target pixel area is an imaging area of traffic elements in the surrounding environment associated with the automatic driving decision of the vehicle; obtain the information amount of the target pixel area corresponding to each of the N sample images; and select M sample images from the N sample images according to the information amount of the target pixel area, where M is less than N, and M and N are both positive integers;
  • the server 1103 is configured to train a copy of the machine learning model of the vehicle based on the M sample images, and deploy the trained machine learning model to the vehicle.
  • the vision sensor 1101 may be a monocular vision sensor, a binocular vision sensor or other types of vision sensors. In order to improve the safety of the vehicle, multiple visual sensors 1101 can be deployed on the vehicle, with different visual sensors 1101 located at different directions of the vehicle. For example, one visual sensor 1101 can be deployed on the left and right rearview mirrors of the vehicle, and one or more visual sensors 1101 can also be deployed on the rear side of the vehicle.
  • the processor 1102 can be deployed on the vehicle or in the cloud. The functions performed by the processor 1102 are detailed in the foregoing method embodiments and will not be described again here.
  • the server 1103 can be deployed in the cloud, and can train a copy of the vehicle's machine learning model using the filtered M sample images and the description true values corresponding to the sample images, and deploy the trained machine learning model onto the vehicle.
  • the present disclosure also provides a movable platform, characterized in that the movable platform includes:
  • the visual sensor 1201 is used to collect images of the surrounding environment while the movable platform is traveling, and obtain N sample images;
  • the electronic control unit 1202 is configured to make automatic driving decisions for the movable platform based on the output results of the machine learning model deployed on the movable platform.
  • the machine learning model is obtained by training with M sample images determined from the N sample images, and the M sample images are obtained based on the method described in any embodiment of the present disclosure.
  • the movable platform may include but is not limited to vehicles, aircraft, ships, movable robots and other equipment.
  • for example, the movable platform may be a self-driving vehicle, a drone, an unmanned ship, or other such equipment.
  • the movable platform can realize autonomous movement by sensing the surrounding environment and making decisions and planning, or it can move under the control of the user.
  • the vision sensor 1201 may be a monocular vision sensor, a binocular vision sensor or another type of vision sensor. Multiple vision sensors 1201 can be deployed on the movable platform, with different vision sensors 1201 located at different orientations of the movable platform.
  • The electronic control unit 1202 can be deployed on the movable platform and used for decision-making and planning of the movable platform's travel, for example, path planning and speed control.
  • The M images used for training the machine learning model of the movable platform can be obtained using the method in any of the foregoing embodiments. For specific details, refer to the foregoing method embodiments; they are not repeated here.
  • Embodiments of this specification also provide a computer-readable storage medium.
  • The readable storage medium stores computer instructions that, when executed, implement the steps of the method described in any of the embodiments.
  • Embodiments of the present description may take the form of a computer program product implemented on one or more storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having program code embodied therein.
  • Storage media available for computers include permanent and non-permanent, removable and non-removable media, and can be implemented by any method or technology to store information.
  • Information may be computer-readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to: phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

Abstract

本公开实施例提供一种图像处理方法、装置和系统、可移动平台,所述方法包括:获取N张样本图像,所述样本图像为车辆在行驶过程中对周围环境采集的图像;在每一所述样本图像中确定目标像素区域,所述目标像素区域为所述周围环境中与所述车辆的自动驾驶决策相关联的交通元素的成像区域;获取所述N张样本图像中每一所述样本图像对应的所述目标像素区域的信息量;根据所述目标像素区域的信息量,在所述N张样本图像中选择M张样本图像,其中,M小于N,M和N均为正整数,所述M张样本图像用于训练与车辆的自动驾驶决策相关的机器学习模型。

Description

图像处理方法、装置和系统、可移动平台 技术领域
本公开涉及人工智能技术领域,具体而言,涉及图像处理方法、装置和系统、可移动平台。
背景技术
为了提高机器学习模型的性能,需要进行数据挖掘,即从数据池中提取出导致机器学习模型失效的、表现不好的、甚至是没见过的边角案例(corner case)数据来调整机器学习模型的模型参数。相关技术一般基于待挖掘数据的信息量来进行数据挖掘,然而,这种数据挖掘方式受背景噪声的干扰较大,数据挖掘准确度较低。
发明内容
第一方面,本公开实施例提供一种图像处理方法,所述方法包括:
获取N张样本图像,所述样本图像为车辆在行驶过程中对周围环境采集的图像;
在每一所述样本图像中确定目标像素区域,所述目标像素区域为所述周围环境中与所述车辆的自动驾驶决策相关联的交通元素的成像区域;
获取所述N张样本图像中每一所述样本图像对应的所述目标像素区域的信息量;
根据所述目标像素区域的信息量,在所述N张样本图像中选择M张样本图像,其中,M小于N,M和N均为正整数,所述M张样本图像用于训练与车辆的自动驾驶决策相关的机器学习模型。
第二方面,本公开实施例提供一种图像处理装置,所述装置包括处理器,所述处理器用于执行以下步骤:
获取N张样本图像,所述样本图像为车辆在行驶过程中对周围环境采集的图像;
在每一所述样本图像中确定目标像素区域,所述目标像素区域为所述周围环境中与所述车辆的自动驾驶决策相关联的交通元素的成像区域;
获取所述N张样本图像中每一所述样本图像对应的所述目标像素区域的信息量;
根据所述目标像素区域的信息量,在所述N张样本图像中选择M张样本图像,其 中,M小于N,M和N均为正整数,所述M张样本图像用于训练与车辆的自动驾驶决策相关的机器学习模型。
第三方面,本公开实施例提供一种图像处理系统,所述系统包括:
视觉传感器,部署在车辆上,用于在所述车辆行驶过程中对周围环境进行图像采集,得到N张样本图像;
处理器,用于在每一所述样本图像中确定目标像素区域,所述目标像素区域为所述周围环境中与所述车辆的自动驾驶决策相关联的交通元素的成像区域;获取所述N张样本图像中每一所述样本图像对应的所述目标像素区域的信息量;根据所述目标像素区域的信息量,在所述N张样本图像中选择M张样本图像,其中,M小于N,M和N均为正整数;
服务器,用于基于所述M张样本图像对所述车辆的机器学习模型的副本进行训练,并将训练后的机器学习模型部署到所述车辆上。
第四方面,本公开实施例提供一种可移动平台,所述可移动平台包括:
视觉传感器,用于在所述可移动平台行驶过程中对周围环境进行图像采集,得到N张样本图像;
电子控制单元,用于基于所述可移动平台上部署的机器学习模型的输出结果,对所述可移动平台进行自动驾驶决策,所述机器学习模型用于基于从所述N张样本图像中确定的M张样本图像训练得到,所述M张样本图像基于本公开任一实施例所述的方法获取。
第五方面,本公开实施例提供一种计算机可读存储介质,其上存储有计算机指令,该指令被处理器执行时实现本公开任一实施例所述的方法。
本公开实施例方案从样本图像中确定与所述车辆的自动驾驶决策相关联的交通元素的成像区域,即目标像素区域,在获取信息量时,仅聚焦于目标像素区域的信息量,并基于目标像素区域的信息量进行样本图像的数据挖掘。这样,在获取信息量时减少了与车辆的自动驾驶决策无关的元素的干扰,从而减少了背景噪声对数据挖掘过程的干扰,提高了数据挖掘准确度。
附图说明
为了更清楚地说明本公开实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本公开的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1是数据挖掘过程的示意图。
图2A和图2B分别是不同图像中物体的uncertainty的示意图。
图3是本公开实施例的图像处理方法的流程图。
图4A、图4B、图4C和图4D分别是本公开实施例的基于像素区域的特征确定目标像素区域的示意图。
图5A、图5B和图5C分别是本公开实施例的基于物体的特征确定目标像素区域的示意图。
图6A和图6B分别是本公开实施例的基于视觉传感器的视角确定目标像素区域的示意图。
图7是本公开实施例的系统架构的示意图。
图8是本公开实施例的总体流程的示意图。
图9是本公开实施例的应用场景的示意图。
图10是本公开实施例的图像处理装置的结构示意图。
图11是本公开实施例的图像处理系统的示意图。
图12是本公开实施例的可移动平台的示意图。
具体实施方式
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本公开相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本公开的一些方面相一致的装置和方法的例子。
在本公开使用的术语是仅仅出于描述特定实施例的目的,而非旨在限制本公开。 在本公开说明书和所附权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义。还应当理解,本文中使用的术语“和/或”是指并包含一个或多个相关联的列出项目的任何或所有可能组合。
应当理解,尽管在本公开可能采用术语第一、第二、第三等来描述各种信息,但这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如,在不脱离本公开范围的情况下,第一信息也可以被称为第二信息,类似地,第二信息也可以被称为第一信息。取决于语境,如在此所使用的词语“如果”可以被解释成为“在……时”或“当……时”或“响应于确定”。
机器学习模型(简称模型)通常由不同种类、功能的神经元构成,用以执行特定的机器学习任务。所述机器学习任务可以是回归任务、分类任务,或者两者相结合。通常,模型越大越复杂,其性能越好。在采用机器学习模型执行机器学习任务之前,需要采用样本数据对机器学习模型进行训练。然而,实际采集的样本数据对于机器学习模型的训练来说往往是重复、冗余、不平衡的,在很多情况下,一小部分的类别占据了大多数的样本数据,而大部分的类别只有极少数的样本数据,这一问题称为数据的长尾问题。为了提高机器学习模型的性能,需要进行数据挖掘,如图1所示,数据挖掘一般是指通过挖掘算法从数据池中提取出部分数据作为挖掘结果,期望的挖掘结果为导致机器学习模型失效的、表现不好的、甚至是没见过的边角案例(corner case)数据,利用挖掘结果来调整机器学习模型的模型参数,从而获得性能较好的模型。
其中,数据池是指待挖掘的海量数据,通常指某一任务场景中所有采集到的作为模型输入的数据总和,通常不包括或者仅包括有限的标注信息。数据池中数据的类别根据任务场景不同而不同,包括但不限于图像、视频、音频、文字等各种模态的数据,并且在同一个任务场景中可以多种模态的数据共存。数据池可以是云端也可以是本地。可以是单一节点,也可以是分布式存储系统。数据池内的数据组织方式和数据结构不做要求,只要支持单帧图像输出即可。个别注意力机制算法可能要求时间上连续的样本,这种情况下要求数据池保存并可以检索样本的物理时间。
鉴于实际情况中数据池极为庞大和复杂,数据挖掘一般采用纯算法或者半人工的数据挖掘手段实现。挖掘算法包括不确定性采样(uncertainty sampling)、多样性采样(diversity sampling)、异议采样(disagreement based sampling)等算法,这三种方法都是通过某种采样模型计算待挖掘样本的信息量,然后根据信息量大小进行数据挖掘。例如,在不确定性采样算法中,信息量正比于模型预测的uncertainty大小;在多样性 采样算法中,信息量正比于数据的diverse程度;在异议采样算法中,信息量正比于采样模型之间异议的程度。
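For readers less familiar with these sampling strategies, the short Python sketch below shows one common way to turn model predictions into a whole-image information amount, namely the mean entropy of the predicted class probabilities (the uncertainty-sampling case). It is illustrative only and not part of the disclosed method; the function name image_uncertainty is hypothetical.

    import numpy as np

    def image_uncertainty(class_probs: np.ndarray) -> float:
        """Information amount of one sample under uncertainty sampling:
        mean entropy of the predicted class distributions, where class_probs has
        shape (num_predictions, num_classes). Higher values = more informative."""
        eps = 1e-12                                    # avoid log(0)
        entropy = -np.sum(class_probs * np.log(class_probs + eps), axis=-1)
        return float(entropy.mean())

Computed over the whole image in this way, the score is exactly what the next paragraph criticizes: background regions contribute as much to it as decision-relevant ones.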
然而,发明人发现,上述数据挖掘方式都是从样本整体维度进行信息量估计,在进行数据挖掘时没有提供足够细的粒度去估计样本的信息量,会引入冗余信息和噪音,导致数据挖掘准确度较低。
举例来说,在L2自动驾驶物体检测任务中,需要挖掘一些容易导致漏检误检骑自行车人的样本,采取不确定性采样算法对样本信息量进行估计,如图2A所示,算法为包括骑自行车的人的区域201给出了很高的uncertainty,为包括机动车的区域202给出了较低的uncertainty。但是,这并不代表该帧图像会被以很高的优先级挖掘出来,因为背景噪声带来的uncertainty noise会导致其他无关样本有更高的uncertainty,从而有更高的优先级被挖掘。
例图2B所示,虚线框203内的是人行道上的一排自行车。虽然这里被给出很高的uncertainty是合理的,因为“自行车物体”和“骑自行车的人”很容易被混淆,但是,在人行道上的自行车哪怕被误检也不影响L2自动驾驶系统的决策。用户只优先关心行车道上的物体,或者说,在数据挖掘的时候,希望以更高的优先级挖掘行车道上相关物体的漏检误检。
由上例可以看出,不同的系统、不同的机器学习任务可能都有自身特定的数据挖掘需求。每张待挖掘的样本,用户会以更高优先级关心某些区域。相关技术中的数据挖掘系统并没有很好地支持这种需求。
基于此,本公开实施例提供一种图像处理方法,该方法基于注意力机制进行数据挖掘,可以在有或者没有专家知识先验的情况下,通过自动算法从每个图像样本中识别、划分出人为定义的目标像素区域,以作为数据挖掘计算信息量的最小单元,从而达到在挖掘过程中通过“注意力聚焦”以排除不重要因素干扰的目的。参见图3,所述方法包括:
步骤301:获取N张样本图像,所述样本图像为车辆在行驶过程中对周围环境采集的图像;
步骤302:在每一所述样本图像中确定目标像素区域,所述目标像素区域为所述周围环境中与所述车辆的自动驾驶决策相关联的交通元素的成像区域;
步骤303:获取所述N张样本图像中每一所述样本图像对应的所述目标像素区域 的信息量;
步骤304:根据所述目标像素区域的信息量,在所述N张样本图像中选择M张样本图像,其中,M小于N,M和N均为正整数,所述M张样本图像用于训练与车辆的自动驾驶决策相关的机器学习模型。
在步骤301中,所述周围环境可以是车辆行驶或停靠的道路环境,所述道路环境中可以包括一种或多种交通元素,道路环境中的交通元素可能包括与所述车辆的自动驾驶决策相关联的交通元素,也可能包括与所述车辆的自动驾驶决策无关的交通元素。在一些实施例中,所述交通元素可以包括车辆自身要素和外部交通环境要素,外部交通环境要素又包括静态环境要素、动态环境要素、交通参与者要素和/或气象要素等。车辆自身要素包括车辆自身的基础属性(例如,重量、几何信息、性能信息等)、位置信息(例如,坐标信息、所在的车道信息等)、运动状态信息(例如,横向运动状态和纵向运动状态)和/或驾驶任务信息(例如,感知识别、路径规划、人机交互、联网通信等)。静态环境要素是指交通环境中静止状态的物体,包括道路、交通设施、周围景观以及障碍物等。动态环境要素是指交通环境中处于动态变化的要素,包括动态指示设施(例如,交通信号灯、可变交通标志、交警等)和通信环境信息(例如,信号强度信息、电磁干扰信息、信号延迟信息等)。交通参与者要素包括车辆周围的行人、动物和/或其他车辆等对车辆的决策规划造成影响的对象信息。气象要素包括行驶场景中的环境温度、光照条件和/或天气情况等信息。
可以对周围环境进行图像采集,得到N张样本图像。所述N张样本图像既可以包括由车辆上的视觉传感器采集得到的图像,又可以包括由在车辆的行驶环境中设置的监控装置采集得到的图像。其中,所述车辆的数量可以大于或等于1,通过多台车辆上的视觉传感器共同采集样本图像,能够提高样本图像的采集效率。当车辆在行车道上行驶时,所述监控装置可以包括在行车道周围布设的若干监控摄像头。所述N张样本图像既可以包括单张图像,也可以包括视频中的一帧或多帧视频帧。
在步骤302中,可以针对每张样本图像确定目标像素区域。其中,目标像素区域为所述周围环境中与车辆的自动驾驶决策相关联的交通元素的成像区域。所述与车辆的自动驾驶决策相关联的交通元素一般是指会对自动驾驶决策产生影响的交通元素。当车辆在行车道上行驶时,行车道上的其他车辆、在行车道前方的斑马线上经过的行人和动物、行车道周围路口的交通信号灯、行驶时的自然环境等元素都可能会影响自动驾驶决策。例如,当行车道上存在车辆A和车辆B时,车辆A需要基于车辆B的 位置和移动速度确定自身的行驶路径和行驶速度,以避免与车辆B相撞。又例如,当行车道周围路口存在交通信号灯时,车辆需要基于交通信号灯的状态确定是否能够通过路口。
一张样本图像中可能包括一个或多个目标像素区域,也可能不包括目标像素区域。如果一张样本图像中不包括目标像素区域,可以直接将该样本图像丢弃。如果一张样本图像中包括一个或多个目标像素区域,可以将该样本图像用于后续步骤的处理。下面对确定目标像素区域的具体方式进行举例说明。
在一些实施例中，可以基于所述样本图像中各像素区域的特征确定目标像素区域，或者基于所述样本图像中包括的物体的特征确定目标像素区域，或者基于所述视觉传感器的视角确定目标像素区域，或者基于所述机器学习模型执行的任务确定目标像素区域。还可以基于上述两种或两种以上方式共同确定目标像素区域。下面对各种确定目标像素区域的方式进行逐一说明。
(1)基于所述样本图像中各像素区域的特征确定目标像素区域
在一些实施例中,像素区域的特征包括但不限于所述像素区域的位置、深度、像素值和/或语义。其中,一个像素区域的位置可以是像素区域在物理空间中的位置,也可以是该像素区域在样本图像中的像素位置,所述位置可以是绝对位置,也可以是相对位置。所述深度可以是所述像素区域中的某个像素点或者某个物体到拍摄所述像素区域所属的样本图像的图像采集装置的深度。所述像素值可以包括所述像素区域中部分或全部像素点的像素值。所述语义可用于表征所述像素区域中的像素点对应的交通元素的类别(例如,车道类别、人行道类别、交通信号灯类别等)。
在所述像素区域的特征包括所述像素区域的位置的情况下,可以将预设位置范围内的像素区域确定为所述目标像素区域。所述预设位置范围可以是连续的位置区间(例如,大于或等于某个位置下限,和/或小于或等于某个位置上限),也可以是离散的一个或多个位置点。图4A示出了所述位置为像素位置时的示意图,所述预设位置范围为样本图像中居中的一块像素区域,如图中虚线框所示。假设车辆401在道路上行驶,且车辆401在T1时刻与在T2时刻处于道路上不同的位置。在T1时刻,通过车辆401右侧的摄像头(图中未示出)采集到样本图像P1,且样本图像P1中包括狗402,在T2时刻,通过车辆401右侧的摄像头采集到样本图像P2,且样本图像P2中包括行人403。可以看出,无论虚线框中的像素区域在物理空间中处于哪一位置,也无论虚线框 中的像素区域中包括何种物体,在采集的样本图像中,均以同一块像素区域(即虚线框内的像素区域)作为目标像素区域。当然,所述预设位置范围除了可以是样本图像中居中的像素区域之外,也可以是本图像中居中的其他像素区域,且预设位置范围的尺寸和数量不限于图中所示。
图4B示出了所述位置为像素区域在物理空间中的位置时的示意图,白色椭圆形区域为摄像头404的视野范围,该视野范围可变,灰色椭圆形区域表示所述预设位置范围。假设在T1时刻,狗402处于摄像头404的视野范围S1内的预设位置范围,且在T2时刻,行人403处于摄像头404的视野范围S2内的预设位置范围,则两个时刻采集的样本图像P3和P4中,目标像素区域如虚线框所示。可以看出,无论摄像头404的视野范围如何变化,在采集的样本图像中,均以同一物理位置在样本图像中对应的像素区域作为目标像素区域。当然,所述预设位置范围除了可以是图中所示的区域之外,也可以是摄像头视野范围内的其他区域,且预设位置范围的尺寸和数量不限于图中所示。
在所述像素区域的特征包括所述像素区域的深度的情况下,可以将预设深度范围内的像素区域确定为所述目标像素区域,所述预设深度范围可以是连续的深度区间(例如,大于或等于某个深度下限,和/或小于或等于某个深度上限),也可以是离散的一个或多个深度点。如图4C所示,假设在某一时刻,狗402与行人403与车辆401的深度均在预设深度范围内,则采集的样本图像P5中包括狗402的像素区域以及包括行人403的像素区域均为目标像素区域(如图中虚线框所示)。图中示出了同一张样本图像中包括两个处于预设深度范围内的物体的情况,在实际情况下,同一张样本图像中包括的处于预设深度范围内的物体的数量也可以是其他数量,各个物体可以是同一个摄像头采集到的,也可以是不同摄像头采集到的。
在所述像素区域的特征包括所述像素区域的语义的情况下,可以将预设语义类别的像素区域确定为所述目标像素区域。如图4D所示,样本图像中的像素区域的语义类别包括机动车道类别和人行道类别,可以将其中一者或两者确定为目标像素区域。当然,本领域技术人员可以理解,语义类别的划分方式不限于图中所示,例如,可以对语义类别进行更为细致的划分,例如,将机动车道进一步划分为左转车道类别、直行车道类别和右转车道类别等。除了车道类别之外,语义类别还可能包括交通信号灯类别、行人类别、地面指示线类别等。
在所述像素区域的特征包括所述像素区域的像素值的情况下,可以将包括预设像 素值的像素点的像素区域确定为所述目标像素区域。例如,可以将包括红色像素点的像素区域确定为目标像素区域。
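Purely as an illustrative sketch of the pixel-area criteria described under (1), the following builds a boolean target-area mask from a semantic segmentation map and optionally intersects it with a preset pixel-position range; the names target_region_mask, wanted_classes and MOTOR_LANE_ID are hypothetical.

    from typing import Optional, Set, Tuple

    import numpy as np

    def target_region_mask(semantic_map: np.ndarray,
                           wanted_classes: Set[int],
                           position_box: Optional[Tuple[int, int, int, int]] = None) -> np.ndarray:
        """Boolean mask of the target pixel area.
        semantic_map: (H, W) per-pixel class IDs from a segmentation model.
        wanted_classes: preset semantic categories, e.g. {MOTOR_LANE_ID}.
        position_box: optional preset pixel-position range (x0, y0, x1, y1)."""
        mask = np.isin(semantic_map, list(wanted_classes))
        if position_box is not None:                   # restrict to the preset position range
            x0, y0, x1, y1 = position_box
            box = np.zeros_like(mask)
            box[y0:y1, x0:x1] = True
            mask &= box
        return mask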
(2)基于所述样本图像中包括的物体的特征确定目标像素区域。
一个物体的特征包括但不限于所述物体的类别、移动速度、尺寸中的至少一者。其中,所述类别可用于表征物体属于何种交通元素,所述移动速度可以是绝对速度或者相对速度,所述尺寸可以是像素尺寸,也可以是物体在物理空间中的尺寸。
可以从图像中确定具有预设特征的物体,并将样本图像中所述预设特征的物体所在的像素区域确定为目标像素区域。所述具有预设特征可以是属于预设类别、移动速度在预设速度范围内和/或尺寸在预设尺寸范围内。如图5A所示,假设样本图像中包括“行人”类别的对象和“狗”类别的对象,且“行人”类别为预设类别,则可以将“行人”类别的对象所在的像素区域确定为目标像素区域。
在一些实施例中,可以从所述样本图像中识别具有预设特征的目标物体;将所述样本图像中所述目标物体所在的像素区域以及与所述目标物体类别相同的其他物体所在的像素区域确定为目标像素区域。如图5B所示,可以从样本图像中识别移动速度不为0的目标物体,假设目标物体的类别为行人A,则可以从样本图像中识别行人A以外的其他行人,假设识别到行人B和行人C,则可以将行人A所在的像素区域、行人B所在的像素区域以及行人C所在的像素区域均确定为目标像素区域(如图中虚线框所示)。
在一些实施例中,所述样本图像包括视频中的多帧目标视频帧。在这种情况下,可以从所述视频中的一帧参考视频帧中识别具有预设特征的目标物体;对所述目标物体进行跟踪,以确定每帧目标视频帧中包括所述目标物体的像素区域;将每帧目标视频帧中包括所述目标物体的像素区域确定为目标像素区域。如图5C所示,假设F1、F2和F3为视频中的多帧目标视频帧,这些目标视频帧可能是连续的,也可能是不连续的。假设所述预设特征为类别属于“行人”类别,则可以先对视频帧F1进行识别,假设识别到行人A,进而可以对行人A进行跟踪,以在视频帧F2和F3中分别识别行人A。假设行人A在F1、F2和F3中的像素位置分别如图中所示,则可以将F1、F2和F3中包括行人A的像素区域分别确定为目标像素区域,如图中虚线框所示。
在一些实施例中,所述预设特征基于所述物体所在的像素区域的语义类别确定,即,可以分别为不同的像素区域确定不同的预设特征。以所述预设特征是预设类别为 例,对于道路所在的像素区域而言,该像素区域内影响车辆的自动驾驶决策的交通元素主要是机动车、非机动车、行人等类别的物体,因此,可以将机动车、非机动车、行人等一个或多个类别确定为道路所在的像素区域对应的预设类别;而其他像素区域(除了道路所在的像素区域以外的区域)内影响车辆的自动驾驶决策的交通元素可能主要包括交通信号灯,因此,可以将交通信号灯类别确定为其他像素区域对应的预设类别。
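As a rough sketch of the object-feature approach under (2), the snippet below propagates a target object chosen in a reference frame through the remaining frames and records each frame's box as that frame's target pixel area. The tracker interface (init/update) is an assumption standing in for any single-object tracking algorithm.

    from typing import Dict, List, Tuple

    Box = Tuple[int, int, int, int]  # (x, y, width, height)

    def track_target_regions(frames: List, init_box: Box, tracker) -> Dict[int, Box]:
        """Per-frame target pixel areas obtained by tracking one target object.
        `tracker` is assumed to expose init(frame, box) and update(frame) -> (ok, box)."""
        regions: Dict[int, Box] = {0: init_box}        # box drawn in the reference frame
        tracker.init(frames[0], init_box)
        for idx, frame in enumerate(frames[1:], start=1):
            ok, box = tracker.update(frame)
            if ok:                                     # keep only frames where tracking succeeded
                regions[idx] = tuple(int(v) for v in box)
        return regions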
(3)基于所述视觉传感器的视角确定目标像素区域。例如,可以将视觉传感器在预设视角范围内采集的像素区域确定为目标像素区域。在一些实施例中,所述预设视角范围小于视觉传感器的总的视角范围。如图6A所示,假设视觉传感器的总的视角范围为α 1,该视角范围能够对浅灰色区域601进行成像,由于图像边缘的畸变程度一般高于图像中心区域的畸变程度,因此,可以确定一个小于α 1的视角范围α 2,视角范围α 2能够对深灰色区域602进行成像。从而深灰色区域602对应的像素区域即为目标像素区域。
在一些实施例中,所述预设视角范围可以是两个或两个以上视角传感器的重叠的视角范围。如图6B所示,以包括重叠的视角范围的两个视觉传感器为例,其中,椭圆形区域603和604分别为两个视觉传感器各自的视角,这两个传感器的视角的重叠范围如图中带斜线的区域所示。可以将该重叠范围对应的像素区域确定为目标像素区域。
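For the view-angle criterion under (3), one simple possibility, assuming an ideal pinhole camera whose full horizontal field of view spans the image symmetrically, is to map a reduced view angle to a central pixel region. The sketch and its parameter names are illustrative assumptions, not the only way to implement the idea.

    import math

    def central_region_from_fov(image_width: int, image_height: int,
                                full_hfov_deg: float, kept_hfov_deg: float):
        """Pixel region imaged within a reduced horizontal view angle (pinhole model)."""
        # focal length in pixels implied by the full horizontal field of view
        f = (image_width / 2) / math.tan(math.radians(full_hfov_deg) / 2)
        half_width = f * math.tan(math.radians(kept_hfov_deg) / 2)
        cx = image_width / 2
        x0, x1 = int(cx - half_width), int(cx + half_width)
        return (x0, 0, x1, image_height)               # keep full height, restrict width only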
(4)所述目标像素区域基于数据挖掘任务确定。一种数据挖掘任务可能对应几种区域,不同的数据挖掘任务可能对应不同的区域。同一组数据上可能被执行多种数据挖掘任务。例如,在数据挖掘任务为“挖掘蓝色轿车”或者“挖掘机动车道上的车辆”时,可以将机动车道对应的像素区域确定为目标像素区域;在数据挖掘任务为“挖掘人行道上的物体”时,可以将人行道对应的像素区域确定为目标像素区域。
在实际确定目标像素区域时,可以基于上述任意一种方式确定目标像素区域,或者,也可以同时基于上述至少两种方式确定目标像素区域,例如,可以将属于预设语义类别,且包括预设特征的物体的像素区域确定为目标像素区域,在预设语义类别为机动车道类别,预设特征为自行车类别时,可以将机动车道上包括自行车的像素区域确定为目标像素区域。还可以结合上述至少任一方式与其他方式共同确定目标像素区域,此处不再一一列举。可以在不同的场景下采用不同的方式确定目标像素区域,从而提高方案的灵活性和可扩展性。当边角案例的定义发生变化,亦即挖掘标准变化时, 挖掘算法能够以很低的成本进行适配。
在步骤303中,可以采用各种确定信息量的方式来确定目标像素区域的信息量,所述确定信息量的方式包括但不限于前述不确定性采样、多样性采样或者异议采样。由于在获取信息量时,仅聚焦于目标像素区域的信息量,并基于目标像素区域的信息量进行样本图像的数据挖掘。这样,在获取信息量时减少了与车辆的自动驾驶决策无关的元素的干扰,从而减少了背景噪声对数据挖掘过程的干扰,提高了数据挖掘准确度。
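The key difference from the whole-image score sketched earlier is that the information amount is aggregated only inside the target pixel area. A minimal sketch, assuming a per-pixel uncertainty map and a boolean region mask as inputs (names hypothetical):

    import numpy as np

    def region_information(uncertainty_map: np.ndarray, region_mask: np.ndarray) -> float:
        """Information amount of one sample, computed only over the target pixel area.
        uncertainty_map: (H, W) per-pixel uncertainty, e.g. prediction entropy.
        region_mask: (H, W) boolean mask of the target pixel area."""
        if not region_mask.any():          # no decision-relevant traffic element in this sample
            return 0.0                     # such a sample can simply be discarded
        return float(uncertainty_map[region_mask].mean())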
在步骤304中,可以根据所述样本图像中所述目标像素区域的信息量,对所述样本图像进行评分,得到所述样本图像的评分值;根据所述N张样本图像的评分值,在所述N张样本图像中选择M张样本图像。其中,一张样本图像的评分值与该样本图像被选择的概率可以正相关,也可以反相关。以正相关的情况为例,则可以对各样本图像的评分值按照从大到小的顺序进行排序,并从中选择排序靠前的M张样本图像。当然,还可以采用其他方式来选择M张样本图像,此处不再赘述。
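A correspondingly simple sketch of the scoring-and-selection step, assuming the positively correlated case mentioned above (a higher score means a higher probability of being selected); select_top_m is a hypothetical name.

    def select_top_m(samples, scores, m):
        """Rank the N samples by their target-area scores and keep the top M."""
        order = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
        return [samples[i] for i in order[:m]]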
本公开实施例的方案可采用图7所示的架构实现。其中,数据池(database)701用于存储待挖掘的样本图像,样本图像可以经由注意力节点(attention node)702进行处理,以确定目标像素区域。其中,确定目标像素区域的方式可采用前述任意一种方式,具体的算法可采用跟踪(tracking)算法、分割(segmentation)算法等。在跟踪算法中,用户只关心图像中某个动态物体的特征,例如一辆车。则在时序数据的第一帧画框框出目标车辆,然后采用tracking算法在之后每帧图像中自动跟踪该框,并基于跟踪结果确定目标像素区域。在分割算法中,用户只关心画面中某些区域的特征,例如只关心机动车道区域。则通过一个语义分割(semantic segmentation)网络对图像进行分割,只保留“机动车道”类别的像素点对应的区域作为目标像素区域。
在确定目标像素区域之后,可以将目标像素区域发送至挖掘节点(mining node)703,挖掘节点703可以采用不确定性采样、多样性采样等方式确定目标像素区域的信息量,并基于信息量挖掘出M张样本图像。挖掘出的M张样本图像可以存入数据池701进行存储,也可以输出给其他处理单元。确定目标像素区域的方式、注意力节点702采用的算法以及挖掘节点703采用的算法均可以通过图形用户界面(Graphical User Interface,GUI)704输入。在GUI上还可以对筛选出的M张样本图像进行二次筛查,或者通过在GUI上输入相应指令以将筛选出的样本图像直接存入数据池。
在一些实施例中,可以对所述M张样本图像进行人工筛选,得到K张样本图像。自动挖掘方式可能存在一定的误差,因此,本公开实施例进一步对挖掘出的M张样本图像进行人工筛选,得到K张样本图像,并将这K张样本图像用于训练与车辆的自动驾驶决策相关的机器学习模型,以提高训练效果。其中K可以小于或等于M。本公开实施例对数据池中的大量样本图像进行自动化数据挖掘,并将人工筛选作为辅助,同时保证了挖掘效率与挖掘结果的准确度。
如图8所示,筛选出的样本图像可以用于训练与车辆的自动驾驶决策相关的机器学习模型。其中,车辆的自动驾驶决策依据感知信息来替代人类驾驶员对车辆的行驶状态进行决策和控制,从而实现车道保持、车道偏离预警、车距保持、障碍物警告等功能。自动驾驶决策可以基于部署在车辆上的机器学习模型实现,所述机器学习模型可以包括但不限于各种检测模型、识别模型、分类模型等。例如,可以通过识别模型对道路上的交通元素进行识别,以确定其中的交通信号灯,从而依据交通信号灯的信息确定是否能够通过当前路口。又例如,可以通过检测模型检测前车与本车的车距,从而确定是否需要减速。由于自动驾驶决策可能涉及多种机器学习任务,因此,部署在车辆上的机器学习模型可能包括执行不同机器学习任务的多个机器学习模型。
部署在车辆上的机器学习模型可以基于挖掘出的样本图像以及样本图像中交通元素对应的描述真值训练得到,执行不同机器学习任务的机器学习模型训练时采用的描述真值可能不同。例如,执行分类任务的机器学习模型所采用的描述真值为样本图像中各像素点的类别,执行检测任务的机器学习模型所采用的描述真值为样本图像中检测到的车辆到本车的距离。
在一些实施例中,可以将所述M张样本图像输入真值标定系统801,以获取所述M张样本图像中所述交通元素对应的描述真值;基于所述M张样本图像以及所述M张样本图像中所述交通元素对应的描述真值,对所述机器学习模型进行训练。其中,真值标定系统801可以通过自动标定、半自动标定或者人工标定方式来获取样本图像中所述交通元素对应的描述真值。不同的真值标定系统的标定准确性和标定效率不同,例如,人工标定方式效率较低但准确性较高,而自动标定或半自动标定的方式效率较高但准确性较低。因此,需要对真值标定系统的标定效率和准确性进行权衡。
在一些自动标定系统中,可以预先在云端训练一个性能较优的机器学习模型,该机器学习模型执行的任务与部署在车辆上的机器学习模型相同,且该机器学习模型的标定结果的准确度高于预设的准确度阈值,从而可以直接将该机器学习模型的输出结 果作为所述描述真值。例如,可以通过部署在云端的识别模型从样本图像中识别交通信号灯,并输出交通信号灯的颜色(红灯、黄灯、绿灯)的描述真值。然后,将该样本图像及其中的交通信号灯的颜色的描述真值用于训练部署在车辆上的机器学习模型,以便使部署在车辆上的机器学习模型能够准确地针对交通信号灯的颜色确定是否能够通过路口。
在一些半自动标定系统中,可以先获取部署在车辆上的机器学习模型针对样本图像的输出结果,如果车辆的决策系统针对该样本图像输出的自动驾驶决策结果正常,则将部署在车辆上的机器学习模型的输出结果作为所述描述真值,否则通过人工标定的方式确定该样本图像中所述交通元素对应的描述真值。例如,可以通过部署在车辆上的检测模型检测前车与本车的距离。如果在某一时刻,车辆的决策系统针对一张样本图像输出的自动驾驶决策结果指示车辆以当前速度向前行驶,但出现车辆与前车相撞的情况,则表示该自动驾驶决策结果异常,从而可以确定部署在车辆上的机器学习模型输出的车距不准确,因此,可以通过人工标定的方式确定该样本图像中前车与本车的距离,将人工标定的距离作为对应的描述真值。
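The semi-automatic labelling rule described above is essentially a conditional fallback; the tiny sketch below states it explicitly, with semi_auto_label and manual_label_fn as hypothetical names.

    def semi_auto_label(sample, model_output, decision_was_normal: bool, manual_label_fn):
        """Use the on-vehicle model's output as ground truth when the resulting
        driving decision was normal; otherwise fall back to manual labelling."""
        if decision_was_normal:
            return model_output
        return manual_label_fn(sample)     # e.g. a human annotator in the labelling interface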
在一些实施例中,在进行人工标定时,可以将所述M张样本图像中每一样本图像在标定界面进行展示,并标识所述样本图像中的所述目标像素区域;检测用户针对所述交通元素的标定操作,基于所述标定操作获取真值标定结果;将所述真值标定结果作为所述描述真值。所述标定操作可以包括删除、修改原标定结果以及添加标定结果。可以在所述标定界面展示关联交通元素的预标定真值;若检测到用户对所述预标定真值的确认操作,将所述预标定真值确定为所述真值标定结果。反之,若检测到用户对所述预标定真值的调整操作,获取调整后的标定结果,可以将调整后的标定结果确定为所述真值标定结果。
例如,针对识别交通信号灯的任务,可以在显示界面显示预标定真值,所述预标定真值可以是图像中交通信号灯的包围框。如果检测到用户针对该包围框的确认操作,将该包围框确定为所述真值标定结果。反之,若检测到用户对所述包围框的调整操作,例如,调整其大小和/或位置,则将调整后的包围框确定为所述真值标定结果。
除了上述列举的方式之外,还可以基于其他方式获取真值描述,此处不再一一列举。
在获取样本图像的描述真值之后,可以从所述M张样本图像中每一所述样本图像 中截取所述目标像素区域;基于所述M张样本图像对应的目标像素区域以及所述M张样本图像中所述交通元素对应的描述真值,对所述机器学习模型进行训练。或者,可以直接基于所述M张样本图像以及所述M张样本图像中所述交通元素对应的描述真值,对所述机器学习模型进行训练。训练好的机器学习模型可以部署到车辆上。
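Where training is done on the cropped target pixel areas rather than on the full images, the cropping itself can be as simple as the following sketch (assuming axis-aligned boxes in (x0, y0, x1, y1) form; crop_target_regions is a hypothetical name):

    import numpy as np

    def crop_target_regions(image: np.ndarray, boxes):
        """Cut the target pixel areas out of a mined sample so that only the
        decision-relevant content is paired with its ground truth for training."""
        return [image[y0:y1, x0:x1].copy() for (x0, y0, x1, y1) in boxes]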
在一些实施例中,除了基于目标像素区域的信息量筛选出用于训练机器学习模型的样本图像之外,还可以基于其他信息筛选样本图像。例如,可以对所述车辆的行驶状态进行检测;获取检测到所述行驶状态异常的时刻之前和/或之后采集到的P张样本图像,P为正整数,所述M张样本图像和所述P张样本图像共同用于训练与车辆的自动驾驶决策相关的机器学习模型。所述P张样本图像可以部分或全部包括在所述M张样本图像中,也可以是所述M张样本图像以外的其他图像,即,所述P张样本图像与所述M张样本图像可以部分或全部相同。所述行驶状态可以包括行驶速度、行驶方向等,在行驶状态包括行驶速度时,如果行驶速度的变化率超过一定阈值(例如,车辆急刹车),可以认为行驶状态异常。在行驶状态包括行驶方向时,如果行驶方向的变化率超过一定阈值(例如,急转弯),或者转向后撞上障碍物,可以认为行驶状态异常。此外,行驶状态还可以包括其他状态,各种行驶状态下的行驶异常情况可以基于实际场景确定,此处不再一一列举。
又例如,可以获取所述车辆的决策系统输出的决策结果,所述决策结果用于对所述车辆的行驶状态进行决策规划;将输出错误的决策结果的时刻之前和/或之后采集到的Q张样本图像,Q为正整数,所述M张样本图像和所述Q张样本图像共同用于训练与车辆的自动驾驶决策相关的机器学习模型。所述Q张样本图像可以部分或全部包括在所述M张样本图像中,也可以是所述M张样本图像以外的其他图像,即,所述Q张样本图像与所述M张样本图像可以部分或全部相同。例如,车辆的决策结果指示按照当前速度行驶后撞上障碍物,或者决策结果指示在直行车道上转向,则确定为错误的决策结果。此外,决策结果错误的情况还可以包括其他情况,此处不再一一列举。
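Selecting the P or Q sample images around an abnormal driving state or an erroneous decision result reduces to a time-window query over the capture timestamps; a minimal sketch (the function name and the 2-second window are assumptions):

    def frames_around_event(timestamps, event_time, before_s: float = 2.0, after_s: float = 2.0):
        """Indices of sample images captured shortly before and/or after the moment
        an abnormal driving state or an erroneous decision result was detected."""
        return [i for i, t in enumerate(timestamps)
                if event_time - before_s <= t <= event_time + after_s]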
在一些实施例中,可以同时采用所述M张样本图像、所述P张样本图像以及所述Q张样本图像共同训练车辆的机器学习模型。由于行驶状态异常以及决策结果错误时针对的样本图像可能是使机器学习模型表现不好的样本图像,因此,将这些样本图像挖掘出来,有助于提高机器学习模型的性能。
在一些实施例中,所述车辆设置有第一自动驾驶权限;在对所述机器学习模型进行训练之后,将所述车辆的自动驾驶权限设置为第二自动驾驶权限,所述第二自动驾 驶权限高于所述第一自动驾驶权限。例如,所述第一自动驾驶权限可以是L2自动驾驶权限,所述第二自动驾驶权限可以是L3自动驾驶权限。可以在训练好机器学习模型之后,采用测试图像对训练好的机器学习模型进行测试,以确定机器学习模型的性能,并根据机器学习模型的性能确定所述第二自动驾驶权限。通过采用本实施例,能够为车辆自动设置其能力范围内的自动驾驶权限,提高了自动驾驶的安全性。
如图9所示,是本公开一实施例的应用场景的示意图。在初始状态下,车辆901设置有第一自动驾驶权限,在该自动驾驶权限下,车辆901不具备自动路径规划权限。可通过车辆901上的视觉传感器采集样本图像,并将样本图像发送至云端进行筛选,或者通过车辆901自身进行筛选,筛选后的样本数据可用于在云端训练机器学习模型。训练好之后,云端可以将机器学习模型下发至车辆901。由于此时车辆901已经对周围环境具备一定的检测、识别能力,因此,可以为车辆901设置第二自动驾驶权限。在该自动驾驶权限下,车辆901具备自动路径规划权限。车辆901可以基于机器学习模型的输出结果规划出路径R,并基于路径R进行自动驾驶。
本公开实施例为解决机器学习模型部署中模型迭代的长尾问题,同时支持用户在数据挖掘中聚焦部分区域的需求,从而提出一种基于注意力机制的数据挖掘框架,并定义其在生产实践中的软件形态。本公开实施例的软件系统可以提供整套数据挖掘功能。本公开具有以下优势:
(1)在估计样本信息量时,能够聚焦在每个图像样本中用户所感兴趣的区域(即目标像素区域),而排除其他背景、噪声等不关心内容的干扰,提高数据挖掘的质量。
(2)适用领域广,兼容回归、分类、二者结合的机器学习模型和任务。
(3)支持半自动和自动化的数据挖掘过程,能够尽可能减少人为参与。
(4)挖掘标准可扩展,当边角案例的定义发生变化,亦即挖掘标准变化时,挖掘算法可以很低的成本进行适配。
本公开还提供一种图像处理装置,所述装置包括处理器,所述处理器用于执行以下步骤:
获取N张样本图像,所述样本图像为车辆在行驶过程中对周围环境采集的图像;
在每一所述样本图像中确定目标像素区域,所述目标像素区域为所述周围环境中与所述车辆的自动驾驶决策相关联的交通元素的成像区域;
获取所述N张样本图像中每一所述样本图像对应的所述目标像素区域的信息量;
根据所述目标像素区域的信息量,在所述N张样本图像中选择M张样本图像,其中,M小于N,M和N均为正整数,所述M张样本图像用于训练与车辆的自动驾驶决策相关的机器学习模型。
在一些实施例中,所述处理器具体用于:基于所述样本图像中各像素区域的特征确定目标像素区域。
在一些实施例中,所述处理器具体用于:所述像素区域的特征包括所述像素区域的位置,所述目标像素区域为预设位置范围内的像素区域;所述像素区域的特征包括所述像素区域的深度,所述目标像素区域为预设深度范围内的像素区域;所述像素区域的特征包括所述像素区域的像素值,所述目标像素区域为包括预设像素值的像素点的像素区域;所述像素区域的特征包括所述像素区域的语义,所述目标像素区域为预设语义类别的像素区域。
在一些实施例中,所述处理器具体用于:基于所述样本图像中包括的物体的特征确定目标像素区域。
在一些实施例中,一个物体的特征包括所述物体的类别、移动速度、尺寸中的至少一者。
在一些实施例中,所述样本图像包括视频中的多帧目标视频帧;所述处理器具体用于:从所述视频中的一帧参考视频帧中识别具有预设特征的目标物体;对所述目标物体进行跟踪,以确定每帧目标视频帧中包括所述目标物体的像素区域;将每帧目标视频帧中包括所述目标物体的像素区域确定为目标像素区域。
在一些实施例中,所述处理器具体用于:从所述样本图像中识别具有预设特征的目标物体;将所述样本图像中所述目标物体所在的像素区域以及与所述目标物体类别相同的其他物体所在的像素区域确定为目标像素区域。
在一些实施例中,所述预设特征基于所述物体所在的像素区域的语义类别确定。
在一些实施例中,所述样本图像由所述车辆上的视觉传感器采集得到;所述处理器具体用于:基于所述视觉传感器的视角确定目标像素区域。
在一些实施例中,所述目标像素区域为所述视觉传感器在预设视角范围内采集的图像。
在一些实施例中,所述目标像素区域基于数据挖掘任务确定。
在一些实施例中,所述处理器还用于:对所述车辆的行驶状态进行检测;获取检测到所述行驶状态异常的时刻之前和/或之后采集到的P张样本图像,P为正整数,所述M张样本图像和所述P张样本图像共同用于训练与车辆的自动驾驶决策相关的机器学习模型。
在一些实施例中,所述处理器还用于:获取所述车辆的决策系统输出的决策结果,所述决策结果用于对所述车辆的行驶状态进行决策规划;将输出错误的决策结果的时刻之前和/或之后采集到的Q张样本图像,Q为正整数,所述M张样本图像和所述Q张样本图像共同用于训练与车辆的自动驾驶决策相关的机器学习模型。
在一些实施例中,所述处理器具体用于:根据所述样本图像中所述目标像素区域的信息量,对所述样本图像进行评分,得到所述样本图像的评分值;根据所述N张样本图像的评分值,在所述N张样本图像中选择M张样本图像。
在一些实施例中,所述处理器还用于:对所述M张样本图像进行人工筛选,得到K张样本图像,K为正整数,K小于或等于M,所述K张样本图像用于训练与车辆的自动驾驶决策相关的机器学习模型。
在一些实施例中,所述处理器还用于:将所述M张样本图像输入真值标定系统,以获取所述M张样本图像中所述交通元素对应的描述真值;基于所述M张样本图像以及所述M张样本图像中所述交通元素对应的描述真值,对所述机器学习模型进行训练。
在一些实施例中,所述处理器具体用于:从所述M张样本图像中每一所述样本图像中截取所述目标像素区域;基于所述M张样本图像对应的目标像素区域以及所述M张样本图像中所述交通元素对应的描述真值,对所述机器学习模型进行训练。
在一些实施例中,所述车辆设置有第一自动驾驶权限;所述处理器还用于:在对所述机器学习模型进行训练之后,将所述车辆的自动驾驶权限设置为第二自动驾驶权限,所述第二自动驾驶权限高于所述第一自动驾驶权限。
在一些实施例中,所述处理器具体用于:将所述M张样本图像中每一样本图像在标定界面进行展示,并标识所述样本图像中的所述目标像素区域;检测用户针对所述交通元素的标定操作,基于所述标定操作获取真值标定结果;将所述真值标定结果作为所述描述真值。
在一些实施例中,所述处理器具体用于:在所述标定界面展示关联交通元素的预标定真值;若检测到用户对所述预标定真值的确认操作,将所述预标定真值确定为所述真值标定结果;和/或;在所述标定界面展示关联交通元素的预标定真值;若检测到用户对所述预标定真值的调整操作,获取调整后的标定结果;将调整后的标定结果确定为所述真值标定结果。
上述装置实施例中处理器实现的功能详见前述方法实施例,此处不再赘述。
图10示出了一种图像处理装置的硬件结构示意图,该装置可以包括:处理器1001、存储器1002、输入/输出接口1003、通信接口1004和总线1005。其中处理器1001、存储器1002、输入/输出接口1003和通信接口1004通过总线1005实现彼此之间在设备内部的通信连接。
处理器1001可以采用通用的CPU(Central Processing Unit,中央处理器)、微处理器、应用专用集成电路(Application Specific Integrated Circuit,ASIC)、或者一个或多个集成电路等方式实现,用于执行相关程序,以实现本说明书实施例所提供的技术方案。处理器1001还可以包括显卡,所述显卡可以是Nvidia titan X显卡或者1080Ti显卡等。
存储器1002可以采用ROM(Read Only Memory,只读存储器)、RAM(Random Access Memory,随机存取存储器)、静态存储设备,动态存储设备等形式实现。存储器1002可以存储操作系统和其他应用程序,在通过软件或者固件来实现本说明书实施例所提供的技术方案时,相关的程序代码保存在存储器1002中,并由处理器1001来调用执行。
输入/输出接口1003用于连接输入/输出模块,以实现信息输入及输出。输入输出/模块可以作为组件配置在设备中(图中未示出),也可以外接于设备以提供相应功能。其中输入设备可以包括键盘、鼠标、触摸屏、麦克风、各类传感器等,输出设备可以包括显示器、扬声器、振动器、指示灯等。
通信接口1004用于连接通信模块(图中未示出),以实现本设备与其他设备的通信交互。其中通信模块可以通过有线方式(例如USB、网线等)实现通信,也可以通过无线方式(例如移动网络、WIFI、蓝牙等)实现通信。
总线1005包括一通路,在设备的各个组件(例如处理器1001、存储器1002、输入/输出接口1003和通信接口1004)之间传输信息。
需要说明的是,尽管上述设备仅示出了处理器1001、存储器1002、输入/输出接口1003、通信接口1004以及总线1005,但是在具体实施过程中,该设备还可以包括实现正常运行所必需的其他组件。此外,本领域的技术人员可以理解的是,上述设备中也可以仅包含实现本说明书实施例方案所必需的组件,而不必包含图中所示的全部组件。
如图11所示,本公开还提供一种图像处理系统,所述系统包括:
视觉传感器1101,部署在车辆上,用于在所述车辆行驶过程中对周围环境进行图像采集,得到N张样本图像;
处理器1102,用于在每一所述样本图像中确定目标像素区域,所述目标像素区域为所述周围环境中与所述车辆的自动驾驶决策相关联的交通元素的成像区域;获取所述N张样本图像中每一所述样本图像对应的所述目标像素区域的信息量;根据所述目标像素区域的信息量,在所述N张样本图像中选择M张样本图像,其中,M小于N,M和N均为正整数;
服务器1103,用于基于所述M张样本图像对所述车辆的机器学习模型的副本进行训练,并将训练后的机器学习模型部署到所述车辆上。
所述视觉传感器1101可以是单目视觉传感器、双目视觉传感器或者其他类型的视觉传感器。为了提高车辆的安全性,可以在车辆上部署多个视觉传感器1101,不同的视觉传感器1101位于车辆的不同方位。例如,可以在车辆的左、右后视镜上各部署一个视觉传感器1101,还可以在车辆的后侧部署一个或多个视觉传感器1101。所述处理器1102可以部署在车辆上,也可以部署在云端。处理器1102执行的功能详见前述方法实施例,此处不再赘述。所述服务器1103可以部署在云端,可以通过采用筛选出的M张样本图像以及样本图像对应的描述真值来训练车辆的机器学习模型的副本,并将训练后的机器学习模型部署到所述车辆上。
如图12所示,本公开还提供一种可移动平台,其特征在于,所述可移动平台包括:
视觉传感器1201,用于在所述可移动平台行驶过程中对周围环境进行图像采集,得到N张样本图像;
电子控制单元1202,用于基于所述可移动平台上部署的机器学习模型的输出结果,对所述可移动平台进行自动驾驶决策,所述机器学习模型用于基于从所述N张样本图像中确定的M张样本图像训练得到,所述M张样本图像基于本公开任一实施例所述 的方法获取。
其中,所述可移动平台可以包括但不限于车辆、飞机、船只、可移动机器人等各种设备,在一些应用场景下,可移动平台为自动驾驶车辆、无人机、无人船等设备,可移动平台可以通过对周围环境进行感知和决策规划以实现自主移动,也可以在用户的操纵下移动。
所述视觉传感器1201可以是单目视觉传感器、双目视觉传感器或者其他类型的视觉传感器。可以在可移动平台上部署多个视觉传感器1201，不同的视觉传感器1201位于可移动平台的不同方位。电子控制单元1202可以部署在可移动平台上，用于对可移动平台的行驶进行决策规划，例如，对可移动平台进行路径规划、速度控制等。用于训练可移动平台的M张图像可以采用前述任一实施例中的方法获取，具体细节可参见前述方法实施例，此处不再赘述。
本说明书实施例还提供一种计算机可读存储介质，所述可读存储介质上存储有若干计算机指令，所述计算机指令被执行时实现任一实施例所述方法的步骤。
以上实施例中的各种技术特征可以任意进行组合,只要特征之间的组合不存在冲突或矛盾,但是限于篇幅,未进行一一描述,因此上述实施方式中的各种技术特征的任意进行组合也属于本说明书公开的范围。
本说明书实施例可采用在一个或多个其中包含有程序代码的存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。计算机可用存储介质包括永久性和非永久性、可移动和非可移动媒体,可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括但不限于:相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带,磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。
本领域技术人员在考虑说明书及实践这里公开的说明书后,将容易想到本公开的其它实施方案。本公开旨在涵盖本公开的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本公开的一般性原理并包括本公开未公开的本技术领域中的 公知常识或惯用技术手段。说明书和实施例仅被视为示例性的,本公开的真正范围和精神由下面的权利要求指出。
应当理解的是,本公开并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围进行各种修改和改变。本公开的范围仅由所附的权利要求来限制。
以上所述仅为本公开的较佳实施例而已,并不用以限制本公开,凡在本公开的精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本公开保护的范围之内。

Claims (43)

  1. 一种图像处理方法,其特征在于,所述方法包括:
    获取N张样本图像,所述样本图像为车辆在行驶过程中对周围环境采集的图像;
    在每一所述样本图像中确定目标像素区域,所述目标像素区域为所述周围环境中与所述车辆的自动驾驶决策相关联的交通元素的成像区域;
    获取所述N张样本图像中每一所述样本图像对应的所述目标像素区域的信息量;
    根据所述目标像素区域的信息量,在所述N张样本图像中选择M张样本图像,其中,M小于N,M和N均为正整数,所述M张样本图像用于训练与车辆的自动驾驶决策相关的机器学习模型。
  2. 根据权利要求1所述的方法,其特征在于,所述在每一所述样本图像中确定目标像素区域,包括:
    基于所述样本图像中各像素区域的特征确定目标像素区域。
  3. 根据权利要求2所述的方法,其特征在于,所述基于所述样本图像中各像素区域的特征确定目标像素区域,包括:
    所述像素区域的特征包括所述像素区域的位置,所述目标像素区域为预设位置范围内的像素区域;
    所述像素区域的特征包括所述像素区域的深度,所述目标像素区域为预设深度范围内的像素区域;
    所述像素区域的特征包括所述像素区域的像素值,所述目标像素区域为包括预设像素值的像素点的像素区域;
    所述像素区域的特征包括所述像素区域的语义,所述目标像素区域为预设语义类别的像素区域。
  4. 根据权利要求1所述的方法,其特征在于,所述在每一所述样本图像中确定目标像素区域,包括:
    基于所述样本图像中包括的物体的特征确定目标像素区域。
  5. 根据权利要求4所述的方法,其特征在于,一个物体的特征包括所述物体的类别、移动速度、尺寸中的至少一者。
  6. 根据权利要求4所述的方法,其特征在于,所述样本图像包括视频中的多帧目标视频帧;所述基于所述样本图像中包括的物体的特征确定目标像素区域,包括:
    从所述视频中的一帧参考视频帧中识别具有预设特征的目标物体;
    对所述目标物体进行跟踪,以确定每帧目标视频帧中包括所述目标物体的像素区域;
    将每帧目标视频帧中包括所述目标物体的像素区域确定为目标像素区域。
  7. 根据权利要求4所述的方法,其特征在于,所述基于所述样本图像中包括的物体的特征确定目标像素区域,包括:
    从所述样本图像中识别具有预设特征的目标物体;
    将所述样本图像中所述目标物体所在的像素区域以及与所述目标物体类别相同的其他物体所在的像素区域确定为目标像素区域。
  8. 根据权利要求6或7所述的方法,其特征在于,所述预设特征基于所述物体所在的像素区域的语义类别确定。
  9. 根据权利要求1所述的方法,其特征在于,所述样本图像由所述车辆上的视觉传感器采集得到;所述在每一所述样本图像中确定目标像素区域,包括:
    基于所述视觉传感器的视角确定目标像素区域。
  10. 根据权利要求9所述的方法,其特征在于,所述目标像素区域为所述视觉传感器在预设视角范围内采集的图像。
  11. 根据权利要求1所述的方法,其特征在于,所述目标像素区域基于数据挖掘任务确定。
  12. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    对所述车辆的行驶状态进行检测;
    获取检测到所述行驶状态异常的时刻之前和/或之后采集到的P张样本图像,P为正整数,所述M张样本图像和所述P张样本图像共同用于训练与车辆的自动驾驶决策相关的机器学习模型。
  13. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    获取所述车辆的决策系统输出的决策结果,所述决策结果用于对所述车辆的行驶状态进行决策规划;
    将输出错误的决策结果的时刻之前和/或之后采集到的Q张样本图像,Q为正整数,所述M张样本图像和所述Q张样本图像共同用于训练与车辆的自动驾驶决策相关的机器学习模型。
  14. 根据权利要求1所述的方法,其特征在于,所述根据所述目标像素区域的信息量,在所述N张样本图像中选择M张样本图像,包括:
    根据所述样本图像中所述目标像素区域的信息量,对所述样本图像进行评分,得 到所述样本图像的评分值;
    根据所述N张样本图像的评分值,在所述N张样本图像中选择M张样本图像。
  15. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    对所述M张样本图像进行人工筛选,得到K张样本图像,K为正整数,K小于或等于M,所述K张样本图像用于训练与车辆的自动驾驶决策相关的机器学习模型。
  16. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    将所述M张样本图像输入真值标定系统,以获取所述M张样本图像中所述交通元素对应的描述真值;
    基于所述M张样本图像以及所述M张样本图像中所述交通元素对应的描述真值,对所述机器学习模型进行训练。
  17. 根据权利要求16所述的方法,其特征在于,所述基于所述M张样本图像以及所述M张样本图像中所述交通元素对应的描述真值,对所述机器学习模型进行训练,包括:
    从所述M张样本图像中每一所述样本图像中截取所述目标像素区域;
    基于所述M张样本图像对应的目标像素区域以及所述M张样本图像中所述交通元素对应的描述真值,对所述机器学习模型进行训练。
  18. 根据权利要求16所述的方法,其特征在于,所述车辆设置有第一自动驾驶权限;所述方法还包括:
    在对所述机器学习模型进行训练之后,将所述车辆的自动驾驶权限设置为第二自动驾驶权限,所述第二自动驾驶权限高于所述第一自动驾驶权限。
  19. 根据权利要求16所述的方法,其特征在于,所述获取所述M张样本图像中所述交通元素对应的描述真值,包括:
    将所述M张样本图像中每一样本图像在标定界面进行展示,并标识所述样本图像中的所述目标像素区域;
    检测用户针对所述交通元素的标定操作,基于所述标定操作获取真值标定结果;
    将所述真值标定结果作为所述描述真值。
  20. 根据权利要求19所述的方法,其特征在于,所述基于所述标定操作获取真值标定结果,包括:
    在所述标定界面展示关联交通元素的预标定真值;
    若检测到用户对所述预标定真值的确认操作,将所述预标定真值确定为所述真值标定结果;
    和/或;
    在所述标定界面展示关联交通元素的预标定真值;
    若检测到用户对所述预标定真值的调整操作,获取调整后的标定结果;
    将调整后的标定结果确定为所述真值标定结果。
  21. 一种图像处理装置,其特征在于,所述装置包括处理器,所述处理器用于执行以下步骤:
    获取N张样本图像,所述样本图像为车辆在行驶过程中对周围环境采集的图像;
    在每一所述样本图像中确定目标像素区域,所述目标像素区域为所述周围环境中与所述车辆的自动驾驶决策相关联的交通元素的成像区域;
    获取所述N张样本图像中每一所述样本图像对应的所述目标像素区域的信息量;
    根据所述目标像素区域的信息量,在所述N张样本图像中选择M张样本图像,其中,M小于N,M和N均为正整数,所述M张样本图像用于训练与车辆的自动驾驶决策相关的机器学习模型。
  22. 根据权利要求21所述的装置,其特征在于,所述处理器具体用于:
    基于所述样本图像中各像素区域的特征确定目标像素区域。
  23. 根据权利要求22所述的装置,其特征在于,所述处理器具体用于:
    所述像素区域的特征包括所述像素区域的位置,所述目标像素区域为预设位置范围内的像素区域;
    所述像素区域的特征包括所述像素区域的深度,所述目标像素区域为预设深度范围内的像素区域;
    所述像素区域的特征包括所述像素区域的像素值,所述目标像素区域为包括预设像素值的像素点的像素区域;
    所述像素区域的特征包括所述像素区域的语义,所述目标像素区域为预设语义类别的像素区域。
  24. 根据权利要求21所述的装置,其特征在于,所述处理器具体用于:
    基于所述样本图像中包括的物体的特征确定目标像素区域。
  25. 根据权利要求24所述的装置,其特征在于,一个物体的特征包括所述物体的类别、移动速度、尺寸中的至少一者。
  26. 根据权利要求24所述的装置,其特征在于,所述样本图像包括视频中的多帧目标视频帧;所述处理器具体用于:
    从所述视频中的一帧参考视频帧中识别具有预设特征的目标物体;
    对所述目标物体进行跟踪,以确定每帧目标视频帧中包括所述目标物体的像素区域;
    将每帧目标视频帧中包括所述目标物体的像素区域确定为目标像素区域。
  27. 根据权利要求24所述的装置,其特征在于,所述处理器具体用于:
    从所述样本图像中识别具有预设特征的目标物体;
    将所述样本图像中所述目标物体所在的像素区域以及与所述目标物体类别相同的其他物体所在的像素区域确定为目标像素区域。
  28. 根据权利要求26或27所述的装置,其特征在于,所述预设特征基于所述物体所在的像素区域的语义类别确定。
  29. 根据权利要求21所述的装置,其特征在于,所述样本图像由所述车辆上的视觉传感器采集得到;所述处理器具体用于:
    基于所述视觉传感器的视角确定目标像素区域。
  30. 根据权利要求29所述的装置,其特征在于,所述目标像素区域为所述视觉传感器在预设视角范围内采集的图像。
  31. 根据权利要求21所述的装置,其特征在于,所述目标像素区域基于数据挖掘任务确定。
  32. 根据权利要求21所述的装置,其特征在于,所述处理器还用于:
    对所述车辆的行驶状态进行检测;
    获取检测到所述行驶状态异常的时刻之前和/或之后采集到的P张样本图像,P为正整数,所述M张样本图像和所述P张样本图像共同用于训练与车辆的自动驾驶决策相关的机器学习模型。
  33. 根据权利要求21所述的装置,其特征在于,所述处理器还用于:
    获取所述车辆的决策系统输出的决策结果,所述决策结果用于对所述车辆的行驶状态进行决策规划;
    将输出错误的决策结果的时刻之前和/或之后采集到的Q张样本图像,Q为正整数,所述M张样本图像和所述Q张样本图像共同用于训练与车辆的自动驾驶决策相关的机器学习模型。
  34. 根据权利要求21所述的装置,其特征在于,所述处理器具体用于:
    根据所述样本图像中所述目标像素区域的信息量,对所述样本图像进行评分,得到所述样本图像的评分值;
    根据所述N张样本图像的评分值,在所述N张样本图像中选择M张样本图像。
  35. 根据权利要求21所述的装置,其特征在于,所述处理器还用于:
    对所述M张样本图像进行人工筛选,得到K张样本图像,K为正整数,K小于或等于M,所述K张样本图像用于训练与车辆的自动驾驶决策相关的机器学习模型。
  36. 根据权利要求21所述的装置,其特征在于,所述处理器还用于:
    将所述M张样本图像输入真值标定系统,以获取所述M张样本图像中所述交通元素对应的描述真值;
    基于所述M张样本图像以及所述M张样本图像中所述交通元素对应的描述真值,对所述机器学习模型进行训练。
  37. 根据权利要求36所述的装置,其特征在于,所述处理器具体用于:
    从所述M张样本图像中每一所述样本图像中截取所述目标像素区域;
    基于所述M张样本图像对应的目标像素区域以及所述M张样本图像中所述交通元素对应的描述真值,对所述机器学习模型进行训练。
  38. 根据权利要求36所述的装置,其特征在于,所述车辆设置有第一自动驾驶权限;所述处理器还用于:
    在对所述机器学习模型进行训练之后,将所述车辆的自动驾驶权限设置为第二自动驾驶权限,所述第二自动驾驶权限高于所述第一自动驾驶权限。
  39. 根据权利要求36所述的装置,其特征在于,所述处理器具体用于:
    将所述M张样本图像中每一样本图像在标定界面进行展示,并标识所述样本图像中的所述目标像素区域;
    检测用户针对所述交通元素的标定操作,基于所述标定操作获取真值标定结果;
    将所述真值标定结果作为所述描述真值。
  40. 根据权利要求39所述的装置,其特征在于,所述处理器具体用于:
    在所述标定界面展示关联交通元素的预标定真值;
    若检测到用户对所述预标定真值的确认操作,将所述预标定真值确定为所述真值标定结果;
    和/或;
    在所述标定界面展示关联交通元素的预标定真值;
    若检测到用户对所述预标定真值的调整操作,获取调整后的标定结果;
    将调整后的标定结果确定为所述真值标定结果。
  41. 一种图像处理系统,其特征在于,所述系统包括:
    视觉传感器,部署在车辆上,用于在所述车辆行驶过程中对周围环境进行图像采 集,得到N张样本图像;
    处理器,用于在每一所述样本图像中确定目标像素区域,所述目标像素区域为所述周围环境中与所述车辆的自动驾驶决策相关联的交通元素的成像区域;获取所述N张样本图像中每一所述样本图像对应的所述目标像素区域的信息量;根据所述目标像素区域的信息量,在所述N张样本图像中选择M张样本图像,其中,M小于N,M和N均为正整数;
    服务器,用于基于所述M张样本图像对所述车辆的机器学习模型的副本进行训练,并将训练后的机器学习模型部署到所述车辆上。
  42. 一种可移动平台,其特征在于,所述可移动平台包括:
    视觉传感器,用于在所述可移动平台行驶过程中对周围环境进行图像采集,得到N张样本图像;
    电子控制单元,用于基于所述可移动平台上部署的机器学习模型的输出结果,对所述可移动平台进行自动驾驶决策,所述机器学习模型用于基于从所述N张样本图像中确定的M张样本图像训练得到,所述M张样本图像基于权利要求1至20任意一项所述的方法获取。
  43. 一种计算机可读存储介质,其特征在于,其上存储有计算机指令,该指令被处理器执行时实现权利要求1至20任意一项所述的方法。
PCT/CN2022/082257 2022-03-22 2022-03-22 图像处理方法、装置和系统、可移动平台 WO2023178510A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2022/082257 WO2023178510A1 (zh) 2022-03-22 2022-03-22 图像处理方法、装置和系统、可移动平台
CN202280057529.XA CN117882117A (zh) 2022-03-22 2022-03-22 图像处理方法、装置和系统、可移动平台

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/082257 WO2023178510A1 (zh) 2022-03-22 2022-03-22 图像处理方法、装置和系统、可移动平台

Publications (1)

Publication Number Publication Date
WO2023178510A1 true WO2023178510A1 (zh) 2023-09-28

Family

ID=88099543

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/082257 WO2023178510A1 (zh) 2022-03-22 2022-03-22 图像处理方法、装置和系统、可移动平台

Country Status (2)

Country Link
CN (1) CN117882117A (zh)
WO (1) WO2023178510A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599773A (zh) * 2016-10-31 2017-04-26 清华大学 用于智能驾驶的深度学习图像识别方法、系统及终端设备
CN110478911A (zh) * 2019-08-13 2019-11-22 苏州钛智智能科技有限公司 基于机器学习的智能游戏车无人驾驶方法及智能车、设备
CN112987707A (zh) * 2019-11-29 2021-06-18 北京京东乾石科技有限公司 一种车辆的自动驾驶控制方法及装置
US20220051058A1 (en) * 2018-09-12 2022-02-17 Beijing Sankuai Online Technology Co., Ltd Unmanned driving behavior decision-making and model training

Also Published As

Publication number Publication date
CN117882117A (zh) 2024-04-12

Similar Documents

Publication Publication Date Title
US20230290136A1 (en) Brake Light Detection
US11840239B2 (en) Multiple exposure event determination
CN108388834A (zh) 利用循环神经网络和级联特征映射的对象检测
KR102448358B1 (ko) 자율주행 차량들을 위한 카메라 평가 기술들
CN111932901B (zh) 道路车辆跟踪检测设备、方法及存储介质
JP2016027490A (ja) マルチキュー・オブジェクトの検出および分析のための方法、システム、製品、およびコンピュータ・プログラム(マルチキュー・オブジェクトの検出および分析)
Kim et al. Deep traffic light detection for self-driving cars from a large-scale dataset
US20210114627A1 (en) Neural networks for navigation of autonomous vehicles based upon predicted human intents
CN110415544B (zh) 一种灾害天气预警方法及汽车ar-hud系统
CN111094095B (zh) 自动地感知行驶信号的方法、装置及运载工具
GB2560625A (en) Detecting vehicles in low light conditions
CN104036253A (zh) 一种车道线追踪方法及系统
US11745749B2 (en) Vehicular system for testing performance of object detection algorithms
GB2562018A (en) A method and system for analyzing the movement of bodies in a traffic system
CN111967396A (zh) 障碍物检测的处理方法、装置、设备及存储介质
JP2019154027A (ja) ビデオ監視システムのパラメータ設定方法、装置及びビデオ監視システム
JP2021149863A (ja) 物体状態識別装置、物体状態識別方法及び物体状態識別用コンピュータプログラムならびに制御装置
JP7226368B2 (ja) 物体状態識別装置
Matsuda et al. A system for real-time on-street parking detection and visualization on an edge device
WO2023178510A1 (zh) 图像处理方法、装置和系统、可移动平台
EP4145398A1 (en) Systems and methods for vehicle camera obstruction detection
Kezebou et al. A deep neural network approach for detecting wrong-way driving incidents on highway roads
Koetsier et al. Trajectory extraction for analysis of unsafe driving behaviour
CN114972731A (zh) 交通灯检测识别方法及装置、移动工具、存储介质
DE102020202342A1 (de) Cloud-Plattform für automatisierte Mobilität und computerimplementiertes Verfahren zur Bereitstellung einer Cloud basierten Datenanreicherung an die automatisierte Mobilität

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22932579

Country of ref document: EP

Kind code of ref document: A1