CN115830399B - Classification model training method, device, equipment, storage medium and program product - Google Patents

Publication number
CN115830399B
Authority
CN
China
Prior art keywords
long
tail
image
traffic
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211722251.1A
Other languages
Chinese (zh)
Other versions
CN115830399A (en)
Inventor
王梦琪
李果
张璐
吴广力
方涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Woya Technology Co ltd
Original Assignee
Guangzhou Woya Technology Co ltd

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The application relates to a classification model training method, device, equipment, storage medium and program product. The method comprises: performing recognition processing, based on a long-tail recognition model, on target traffic images corresponding to a plurality of different traffic scenes to obtain recognition results; determining long-tail traffic images from the target traffic images based on the recognition results, a long-tail traffic image being an image in a long-tail traffic scene; and training an initial classification model based on the long-tail traffic images to obtain a target classification model used for classifying objects. Because the long-tail recognition model picks out the long-tail traffic images from the target traffic images of the different traffic scenes, and the target classification model is trained on those images, the target classification model can quickly classify objects in long-tail traffic scenes, which improves classification efficiency in such scenes.

Description

Classification model training method, device, equipment, storage medium and program product
Technical Field
The present application relates to the field of automatic driving technology, and in particular, to a classification model training method, apparatus, device, storage medium, and program product.
Background
With the development of automatic driving technology, the related art can already handle most situations that occur in daily driving. However, there remain long-tail traffic scenarios that are rarely encountered in everyday life, such as fragmented scenarios, extreme cases, and unpredictable human behavior. According to the long-tail effect, even though each of these scenarios is rare, their accumulated total poses a serious threat to the safety of automatic driving. Object classification is an important link in automatic driving: tasks such as vehicle behavior prediction and trajectory planning depend closely on its results. Its task is to accurately predict the object category, such as vehicle, pedestrian, or bicycle, of each object detected on the road. Therefore, how to solve the problem of target classification in long-tail traffic scenes is an important research direction for automatic driving at present.
In the traditional technology, different rule conditions are set for different long-tail traffic scenes, and target classification is performed separately for each. However, long-tail traffic scenes are numerous and varied. Although this approach can solve the target classification problem in a specific long-tail traffic scene, it is inefficient, so it is difficult to solve the target classification problem across many long-tail traffic scenes quickly.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a classification model training method, apparatus, device, storage medium, and program product that can improve classification efficiency in long-tail traffic scenarios.
In a first aspect, the present application provides a classification model training method. The method comprises the following steps: performing recognition processing on target traffic images corresponding to various different traffic scenes based on the long tail recognition model to obtain recognition results; determining a long-tail traffic image from the target traffic image based on the identification result, wherein the long-tail traffic image is an image in a long-tail traffic scene; training the initial classification model based on the long-tail traffic image to obtain a target classification model, wherein the target classification model is used for classifying objects.
In one embodiment, the long tail recognition model is used for recognizing an image in a long tail traffic scene with the object category as a target category; determining a long-tail traffic image from the target traffic images based on the recognition result, including: and acquiring a long-tail traffic image with the identification result indicating that the class of the object in the image is the target class from the target traffic image.
In one embodiment, training the initial classification model based on the long-tail traffic image to obtain the target classification model includes: performing target image processing on an original traffic image corresponding to the long-tail traffic image, wherein the target image processing is image processing related to an upstream task of the object classification task; training the initial classification model based on the image characteristics obtained by the target image processing to obtain a target classification model.
In one embodiment, before performing the target image processing on the original traffic image corresponding to the long-tail traffic image, the method further includes: splicing long-tail traffic images belonging to the same traffic scene to obtain at least one long-tail image sequence; for each long-tail image sequence, determining an upstream task according to a traffic scene to which the long-tail image sequence belongs, and determining an image processing category according to the upstream task; correspondingly, performing target image processing on an original traffic image corresponding to the long-tail traffic image, including: and for each long-tail image sequence, performing target image processing on the original traffic image corresponding to the long-tail image sequence based on the corresponding image processing category.
In one embodiment, training the initial classification model based on image features resulting from target image processing to obtain a target classification model includes: determining an object category label corresponding to the image feature based on the image feature and the recognition result; and training the initial classification model based on the image features and object class labels corresponding to the image features to obtain a target classification model.
In one embodiment, the image features include feature detection frames, which identify objects in the original traffic image, and the identification result includes identification detection frames, which identify objects belonging to the target category in the long-tail traffic image. Determining the object category label corresponding to the image features based on the image features and the identification result includes: performing matching processing on the feature detection frames and the identification detection frames; and, if a feature detection frame matches an identification detection frame, taking the target category as the object category label corresponding to that feature detection frame.
In one embodiment, the matching processing for the feature detection frame and the identification detection frame includes: mapping the feature detection frame to a coordinate system where the identification detection frame is located based on the coordinate conversion matrix; and carrying out matching processing according to the overlapping degree between the mapped feature detection frame and the identification detection frame.
In one embodiment, before performing recognition processing on the target traffic images corresponding to the various different traffic scenes based on the long tail recognition model to obtain the recognition result, the method further includes: acquiring long-tail sample data, where the long-tail sample data include long-tail sample images and sample detection frames, a long-tail sample image is a sample image in a long-tail traffic scene, the sample object in a long-tail sample image belongs to the target category, and a sample detection frame identifies the sample object in the long-tail sample image; and training an initial recognition model based on the long-tail sample data to obtain the long tail recognition model.
In a second aspect, the application further provides a classification model training device. The device comprises: the recognition module is used for recognizing and processing target traffic images corresponding to various different traffic scenes based on the long tail recognition model to obtain recognition results; the determining module is used for determining a long-tail traffic image from the target traffic image based on the identification result, wherein the long-tail traffic image is an image in a long-tail traffic scene; the training module is used for training the initial classification model based on the long-tail traffic image to obtain a target classification model, and the target classification model is used for classifying objects.
In one embodiment, the long tail recognition model is used for recognizing an image in a long tail traffic scene with the object category as a target category; the determining module is specifically configured to obtain, from the target traffic images, a long-tail traffic image for which the identification result indicates that the category of the object in the image is the target category.
In one embodiment, the training module is specifically configured to perform target image processing on an original traffic image corresponding to the long-tail traffic image, where the target image processing is image processing related to an upstream task of the object classification task; training the initial classification model based on the image characteristics obtained by the target image processing to obtain a target classification model.
In one embodiment, the apparatus further comprises: the splicing module is used for carrying out splicing treatment on long-tail traffic images belonging to the same traffic scene to obtain at least one long-tail image sequence; the image processing category determining module is used for determining an upstream task according to the traffic scene to which each long-tail image sequence belongs for each long-tail image sequence and determining an image processing category according to the upstream task; correspondingly, the training module is further used for carrying out target image processing on the original traffic image corresponding to the long-tail image sequences based on the corresponding image processing category for each long-tail image sequence.
In one embodiment, the training module is further configured to determine an object class label corresponding to the image feature based on the image feature and the recognition result; and training the initial classification model based on the image features and object class labels corresponding to the image features to obtain a target classification model.
In one embodiment, the image features include feature detection frames, which identify objects in the original traffic image, and the identification result includes identification detection frames, which identify objects belonging to the target category in the long-tail traffic image. The training module is further configured to perform matching processing on the feature detection frames and the identification detection frames and, if a feature detection frame matches an identification detection frame, to take the target category as the object category label corresponding to that feature detection frame.
In one embodiment, the training module is further configured to map the feature detection frame to a coordinate system in which the identification detection frame is located based on the coordinate transformation matrix; and carrying out matching processing according to the overlapping degree between the mapped feature detection frame and the identification detection frame.
In one embodiment, the apparatus further comprises: the long tail recognition model training module is used for acquiring long tail sample data, the long tail sample data comprises a long tail sample image and a sample detection frame, the long tail sample image is a sample image in a long tail traffic scene, a sample object in the long tail sample image is a target class, and the sample detection frame is used for identifying the sample object in the long tail sample image; training the initial recognition model based on long tail sample data to obtain a long tail recognition model.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of the method according to any of the first aspects above when the processor executes the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of any of the first aspects above.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprising a computer program which, when executed by a processor, implements the steps of the method according to any of the first aspects above.
According to the classification model training method, the device, the equipment, the storage medium and the program product, the recognition results are obtained by recognizing the target traffic images corresponding to various different traffic scenes based on the long-tail recognition model, the long-tail traffic images are determined from the target traffic images based on the recognition results, the long-tail traffic images are images in the long-tail traffic scenes, then the initial classification model is trained based on the long-tail traffic images to obtain the target classification model, and the target classification model is used for object classification.
Drawings
FIG. 1 is a flow chart of a classification model training method according to one embodiment;
FIG. 2 is a flow diagram of another classification model training method in one embodiment;
FIG. 3 is a block diagram of a classification model training apparatus in one embodiment;
FIG. 4 is a block diagram of another classification model training apparatus in accordance with another embodiment;
FIG. 5 is a block diagram of a classification model training apparatus according to yet another embodiment;
FIG. 6 is an internal block diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Target classification is a very important link in automatic driving, because tasks such as vehicle behavior prediction and trajectory planning depend closely on its results. Target classification for common traffic scenes is now mature, but target classification for long-tail traffic scenes still needs study. At present there are two approaches to long-tail traffic scenes. One sets different rule conditions for different long-tail traffic scenes and performs target classification separately for each; however, long-tail traffic scenes are numerous and varied, so although this approach can solve the target classification problem in a specific long-tail traffic scene, it is inefficient and cannot quickly solve the problem across many such scenes. The other addresses long-tail traffic scenes through vehicle-road cooperation, but vehicle-road cooperation places high demands on basic hardware facilities and has considerable latency, so it still faces certain difficulties. It is therefore necessary to provide a technical means of classifying targets in long-tail traffic scenes efficiently, effectively, and at low cost.
In one embodiment, as shown in fig. 1, a flow chart of a classification model training method is provided, and this embodiment is illustrated by applying the method to a server. The method comprises the following steps:
Step 101, performing recognition processing on target traffic images corresponding to various different traffic scenes based on the long tail recognition model to obtain a recognition result.
The long tail recognition model is a neural network classification model trained based on long-tail sample data, and may specifically be a trained LSTM (Long Short-Term Memory) network model. The long tail recognition model can accurately recognize images of a target category from among multiple images, where a target category is a category learned by the model during training, such as a pedestrian category, a vehicle category, or a tree category. A single long tail recognition model may recognize images of one category or of multiple categories. If each long tail recognition model can recognize only one category of image, multiple long tail recognition models may be used; if a long tail recognition model can recognize images of multiple categories, a single long tail recognition model may be used in the embodiments of the application.
Various traffic scenes include common traffic scenes and long-tail traffic scenes, wherein the long-tail traffic scenes are such as people walking with an umbrella, people moving boxes behind a vehicle, trees falling in the center of a road and the like; the target traffic image refers to any image frame in a video corresponding to a traffic scene, and one traffic scene corresponds to a plurality of target traffic images.
Optionally, each traffic scene corresponds to a unique scene ID (identifier). The server inputs a plurality of target traffic images corresponding to the various traffic scenes into the long tail recognition model, and for each target traffic image the model judges whether a learned object exists in the image. If so, the target traffic image is marked with an identification detection frame, and the model outputs the target traffic image with the identification detection frame together with the target category corresponding to the object in the frame; that is, the recognition result includes the target traffic image with the identification detection frame and the target category. If not, the target traffic image is not marked with an identification detection frame, and neither the target traffic image nor a target category is output.
An automatic driving vehicle saves a large amount of traffic scene data during road tests, but only a small number of long-tail traffic scenes can be marked and recorded manually; the remaining long-tail traffic scenes are ignored because of the heavy workload. Therefore, a long tail recognition model is trained on the long-tail sample data corresponding to the small number of manually marked long-tail traffic scenes, and this model is then used to recognize the target traffic images corresponding to the various different traffic scenes. In this way, the target traffic images corresponding to long-tail traffic scenes are mined from the target traffic images of all scenes, which increases the amount of data available for training the target classification model and can improve its accuracy.
Step 102, determining a long-tail traffic image from the target traffic image based on the identification result, wherein the long-tail traffic image is an image in a long-tail traffic scene.
Optionally, according to step 101, the recognition result includes the target traffic image with the identification detection frame and the target category. Because the long tail recognition model is trained on long-tail sample data, it recognizes only the long-tail traffic scenes among the various traffic scenes, so each target traffic image with an identification detection frame output by the model is a long-tail traffic image.
Step 103, training the initial classification model based on the long-tail traffic image to obtain a target classification model, wherein the target classification model is used for classifying objects.
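As an illustration of steps 101 and 102, the following minimal Python sketch shows one possible shape for a recognition result and the selection of long-tail traffic images from it. The `RecognitionResult` container, its field names, and the `(x_min, y_min, x_max, y_max)` box layout are assumptions made for illustration only; the application does not specify them.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels; this box layout is an assumption.
Box = Tuple[float, float, float, float]


@dataclass
class RecognitionResult:
    """Hypothetical container for one output of the long tail recognition model."""
    scene_id: str                                          # ID of the traffic scene
    timestamp: float                                       # time of the image frame
    boxes: List[Box] = field(default_factory=list)         # identification detection frames
    categories: List[str] = field(default_factory=list)    # target category per frame


def select_long_tail(results: List[RecognitionResult]) -> List[RecognitionResult]:
    """Keep only the images for which the model produced at least one detection frame."""
    return [r for r in results if r.boxes]


results = [
    RecognitionResult("scene-1", 0.0, [(0, 0, 10, 10)], ["pedestrian"]),
    RecognitionResult("scene-2", 0.0),  # no learned object found in this image
]
long_tail = select_long_tail(results)
```

Here an image without any detection frame is treated as a common traffic image and is simply dropped, mirroring the case in which the model outputs nothing for that image.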
The initial classification model is a neural network classification model, which may be an LSTM model, and correspondingly, the target classification model is a trained neural network classification model, which may be a trained LSTM model.
Optionally, the long-tail recognition model can recognize long-tail traffic images of target categories from target traffic images corresponding to various different traffic scenes, then the long-tail traffic images and the target categories corresponding to the long-tail traffic images are input into the initial classification model together for training, and the obtained target classification model can classify the long-tail traffic images corresponding to the long-tail traffic scenes.
Optionally, the initial classification model may be trained based on the long-tail traffic image and the common traffic image corresponding to the common traffic scene, so as to obtain a target classification model, where the target classification model is used for classifying the object. Specifically, the long-tail traffic image and the target category corresponding to the long-tail traffic image, and the common traffic image and the target category corresponding to the common traffic image are input into the initial classification model together for training, so that the obtained target classification model can classify the traffic images corresponding to various different traffic scenes. It should be noted that, the common traffic image and the target category corresponding to the common traffic image may be obtained from the historical traffic data repository.
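The joint training input described above can be sketched as follows. This is only an illustration: the `(image ID, category)` sample layout, the function name `build_training_set`, and the seeded shuffle are assumptions, and the application does not state how the two sources are mixed.

```python
import random
from typing import List, Tuple

# (image ID or path, object category label); this layout is an assumption.
Sample = Tuple[str, str]


def build_training_set(long_tail: List[Sample],
                       common: List[Sample],
                       seed: int = 0) -> List[Sample]:
    """Concatenate long-tail and common samples and shuffle them reproducibly."""
    combined = list(long_tail) + list(common)
    random.Random(seed).shuffle(combined)  # the shuffle is an illustrative choice
    return combined


long_tail_samples = [("lt-001", "pedestrian"), ("lt-002", "tree")]
common_samples = [("c-001", "vehicle"), ("c-002", "pedestrian"), ("c-003", "bicycle")]
training_set = build_training_set(long_tail_samples, common_samples)
```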
In addition, the target classification model may be deployed on an automatic driving vehicle to classify objects in front of the vehicle.
In summary, the recognition result is obtained by performing recognition processing on the target traffic images corresponding to various different traffic scenes based on the long-tail recognition model, the long-tail traffic image is determined from the target traffic images based on the recognition result, the long-tail traffic image is an image in the long-tail traffic scene, then the initial classification model is trained based on the long-tail traffic image to obtain the target classification model, and the target classification model is used for object classification.
In one embodiment, the long tail recognition model is used for recognizing an image in a long tail traffic scene with the object category as a target category; determining a long-tail traffic image from the target traffic images based on the recognition result, including: and acquiring a long-tail traffic image with the identification result indicating that the class of the object in the image is the target class from the target traffic image.
The object in the image refers to a target object in the image. In the training stage of the long tail recognition model, each long-tail traffic image input to the model carries a detection frame and the target category corresponding to the object in the detection frame. Therefore, the long tail recognition model can recognize images in long-tail traffic scenes in which the category of the object is the target category.
Optionally, after the target traffic images corresponding to the various different traffic scenes are recognized based on the long tail recognition model to obtain the category of each target traffic image, the target traffic images whose category is the target category are selected; the selected target traffic images are the long-tail traffic images.
In one embodiment, training the initial classification model based on the long-tail traffic image to obtain the target classification model includes: performing target image processing on an original traffic image corresponding to the long-tail traffic image, wherein the target image processing is image processing related to an upstream task of the object classification task; training the initial classification model based on the image characteristics obtained by the target image processing to obtain a target classification model.
The original traffic image refers to an image extracted from a historical traffic data storage according to a traffic scene ID corresponding to the long-tail traffic image. The image processing related to the upstream task refers to detection, segmentation, and the like of an image. The image features comprise a feature detection frame, a target speed, a target pose, a size, point cloud coordinates and the like, and each original traffic image corresponds to one image feature.
Optionally, according to the original traffic image, the map information, and the position of the vehicle, image processing related to upstream tasks, such as detection and segmentation, is performed on the sensor data to obtain image features. The long-tail traffic image obtained in step 102 and the image features of this embodiment are then input into the initial classification model for training to obtain the target classification model.
In the target classification task of an automatic driving system, the input does not come directly from raw scene data, such as radar and camera observations, but from the output of upstream tasks. In this embodiment, the original traffic image is therefore processed to simulate the image processing of the upstream tasks, so that the image features obtained through the target image processing match what the classifier receives in practice. A target classification model trained on these image features can classify objects in the image more accurately, which improves the accuracy of target classification in long-tail traffic scenes.
In one embodiment, before performing the target image processing on the original traffic image corresponding to the long-tail traffic image, the method further includes: splicing long-tail traffic images belonging to the same traffic scene to obtain at least one long-tail image sequence; for each long-tail image sequence, determining an upstream task according to a traffic scene to which the long-tail image sequence belongs, and determining an image processing category according to the upstream task; correspondingly, performing target image processing on an original traffic image corresponding to the long-tail traffic image, including: and for each long-tail image sequence, performing target image processing on the original traffic image corresponding to the long-tail image sequence based on the corresponding image processing category.
Optionally, in the training stage of the long-tail recognition model, the long-tail traffic image input to the long-tail recognition model is an image with a detection frame and a target category corresponding to an object in the detection frame, so that after the long-tail recognition model inputs the target traffic images corresponding to various traffic scenes into the long-tail recognition model, the long-tail recognition model can output the long-tail traffic image, and the recognition detection frame and the target category corresponding to the object in the long-tail traffic image, wherein the target traffic images all have IDs, and the IDs of the target traffic images belonging to the same traffic scene are the same. After the long-tail traffic image and the identification detection frame and the target class corresponding to the object in the long-tail traffic image are obtained, the long-tail traffic images belonging to the same ID are spliced according to the time stamp to obtain at least one initial long-tail image sequence, and each initial long-tail image sequence comprises the ID, the time stamp, the identification detection frame, the target class and the like. And then, carrying out manual auditing on the initial long-tail images, and removing the long-tail recognition models which mistakenly recognize the common traffic images as long-tail traffic images so as to splice the obtained initial long-tail image sequences, so as to obtain at least one long-tail image sequence, wherein each long-tail image sequence also comprises an ID, a timestamp, a recognition detection frame, a target category and the like.
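The splicing step can be sketched in Python as grouping long-tail traffic images by scene ID and ordering each group by timestamp. The `LongTailImage` record, its fields, and the `drop_rejected` helper that stands in for the outcome of manual review are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class LongTailImage(NamedTuple):
    """Hypothetical record for one long-tail traffic image."""
    scene_id: str     # images of the same traffic scene share this ID
    timestamp: float  # capture time of the frame
    category: str     # target category output by the long tail recognition model


def splice_sequences(images: List[LongTailImage]) -> Dict[str, List[LongTailImage]]:
    """Group images by scene ID and sort each group by timestamp."""
    groups: Dict[str, List[LongTailImage]] = defaultdict(list)
    for image in images:
        groups[image.scene_id].append(image)
    return {sid: sorted(frames, key=lambda f: f.timestamp)
            for sid, frames in groups.items()}


def drop_rejected(sequences: Dict[str, List[LongTailImage]],
                  rejected_ids: Set[str]) -> Dict[str, List[LongTailImage]]:
    """Remove the sequences that manual review flagged as misrecognized."""
    return {sid: seq for sid, seq in sequences.items() if sid not in rejected_ids}


images = [
    LongTailImage("a", 2.0, "pedestrian"),
    LongTailImage("b", 1.0, "tree"),
    LongTailImage("a", 1.0, "pedestrian"),
]
initial_sequences = splice_sequences(images)
final_sequences = drop_rejected(initial_sequences, {"b"})
```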
Optionally, after each long-tail image sequence is obtained, the original traffic images of the traffic scene to which the sequence belongs, together with the data record of the upstream task's processing of those original traffic images, are extracted from the historical traffic data storage library according to the ID of the sequence. The image processing category, which includes detection, segmentation and the like, is determined from the data record, so that target image processing can be performed on the original traffic images corresponding to each long-tail image sequence according to the image processing category, yielding the image features of the images as processed by the upstream task.
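As a rough illustration of determining the image processing category from the upstream-task record, the sketch below assumes the data records can be reduced to a mapping from scene ID to the name of the upstream task. The mapping, the enum and the function name are all hypothetical; only the two categories "detection" and "segmentation" are named in the text.

```python
from enum import Enum


class ProcessingCategory(Enum):
    DETECTION = "detection"
    SEGMENTATION = "segmentation"


def processing_category_for(scene_id, task_records):
    """Look up the upstream task recorded for a scene and map it to a category.

    task_records is an assumed dict: scene ID -> upstream task name.
    """
    if scene_id not in task_records:
        raise KeyError(f"no upstream-task record for scene {scene_id!r}")
    # Enum lookup by value raises ValueError for an unknown task name.
    return ProcessingCategory(task_records[scene_id])
```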
In one embodiment, training the initial classification model based on image features resulting from target image processing to obtain a target classification model includes: determining an object category label corresponding to the image feature based on the image feature and the recognition result; and training the initial classification model based on the image features and object class labels corresponding to the image features to obtain a target classification model.
In one embodiment, the image features include a feature detection frame used to identify objects in the original traffic image, and the recognition result includes a recognition detection frame used to identify objects belonging to the target category in the long-tail traffic image. Determining the object category label corresponding to the image features based on the image features and the recognition result includes: performing matching processing on the feature detection frame and the recognition detection frame; and if a feature detection frame and a recognition detection frame match each other, taking the target category as the object category label corresponding to that feature detection frame.
In one embodiment, performing matching processing on the feature detection frame and the recognition detection frame includes: mapping the feature detection frame to the coordinate system in which the recognition detection frame is located based on a coordinate conversion matrix; and performing matching processing according to the overlapping degree between the mapped feature detection frame and the recognition detection frame.
The overlapping degree (Intersection over Union, IoU) is a standard measure of how accurately a corresponding object is detected in a specific data set. Specifically, the overlapping degree between the mapped feature detection frame and the recognition detection frame is calculated as the area of the intersection of the two frames divided by the area of their union, multiplied by 100%.
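In its standard form, Intersection over Union for two axis-aligned boxes is the intersection area divided by the union area. A minimal sketch follows, assuming boxes are given as `(x1, y1, x2, y2)` corner tuples in the same image coordinate system; the box representation is an assumption, not something the application specifies.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

The function returns a ratio in the range 0 to 1; multiplying by 100 gives the percentage form used in the text.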
Optionally, the image features obtained by performing target image processing on the original traffic image include a feature detection frame, target speed, target pose, size, point cloud coordinates and the like; the recognition results obtained by performing recognition processing on the target traffic images corresponding to various traffic scenes based on the long-tail recognition model include long-tail traffic images, recognition detection frames, target categories corresponding to objects in the long-tail traffic images, and the like. The coordinate conversion matrix is calculated from vehicle information, including the pose of the vehicle and the like, and camera information, including the parameters of the camera, the position of the camera relative to the vehicle and the like. The coordinates of the feature detection frame are then converted from the global coordinate system to the image coordinate system according to the coordinate conversion matrix, so that the feature detection frame and the recognition detection frame are in the same coordinate system. The overlapping degree of the feature detection frame and the recognition detection frame is then calculated: if it is greater than or equal to a specific threshold, the two frames match; if it is smaller than the threshold, they do not match. For a matched feature detection frame and recognition detection frame, the target category is taken as the object category label corresponding to the feature detection frame. Finally, the image features, the object category labels corresponding to the image features, common image features corresponding to common traffic scenes, and the object category labels corresponding to the common image features are input into the initial classification model for training to obtain the target classification model.
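The conversion from the global coordinate system to the image coordinate system can be sketched as follows. The sketch assumes a pinhole camera and assumes the coordinate conversion matrix has already been composed into a single 3x4 projection matrix `P` (for example, camera intrinsics multiplied by the camera-from-vehicle and vehicle-from-global transforms); it also assumes the feature detection frame is available as a list of 3D corner points. None of these representational choices is stated in the application.

```python
def project_point(P, point):
    """Project a 3D global point to pixel coordinates with a 3x4 matrix P."""
    x, y, z = point
    hom = (x, y, z, 1.0)  # homogeneous coordinates
    u, v, w = (sum(row[i] * hom[i] for i in range(4)) for row in P)
    if w <= 0:
        raise ValueError("point is behind the camera")
    return u / w, v / w


def project_box(P, corners):
    """Map the 3D corners of a feature detection frame to a 2D image box."""
    pixels = [project_point(P, corner) for corner in corners]
    us = [p[0] for p in pixels]
    vs = [p[1] for p in pixels]
    # The enclosing axis-aligned rectangle, as (x1, y1, x2, y2).
    return (min(us), min(vs), max(us), max(vs))
```

Taking the enclosing rectangle of the projected corners is one simple way to obtain a 2D frame comparable with the recognition detection frame; other conventions are possible.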
The common image features corresponding to the common traffic scene and the object class labels corresponding to the common image features can be obtained from the historical traffic data storage library.
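A minimal sketch of assembling the training input from both sources is given below. It assumes each sample is a `(feature, label)` pair and that the classifier's training routine accepts a flat list of such pairs; the function name and the sample representation are hypothetical.

```python
from collections import Counter


def build_training_set(long_tail_samples, common_samples):
    """Merge (feature, label) pairs from both sources into one list.

    Also returns per-class counts so the class balance can be inspected
    before training.
    """
    samples = list(long_tail_samples) + list(common_samples)
    class_counts = Counter(label for _, label in samples)
    return samples, class_counts
```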
Optionally, performing the matching processing according to the overlapping degree between the mapped feature detection frame and the recognition detection frame includes: each long-tail traffic scene has at least one feature detection frame and at least one recognition detection frame; for each feature detection frame, the overlapping degree between it and each of the recognition detection frames is calculated; if at least one overlapping degree is greater than or equal to a specific threshold, the feature detection frame matches the corresponding recognition detection frame; if all the overlapping degrees are smaller than the specific threshold, the feature detection frame matches no recognition detection frame.
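The per-frame matching can be sketched as below. Two details are assumptions rather than part of the text: the threshold value of 0.5, and the tie-breaking rule that a feature detection frame takes the category of the recognition detection frame with the highest overlapping degree when several exceed the threshold. Boxes are assumed to be `(x1, y1, x2, y2)` tuples already mapped into the image coordinate system.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_labels(feature_boxes, recognitions, threshold=0.5):
    """Return {feature box index: target class} for matched feature boxes.

    recognitions is a list of (recognition box, target class) pairs.
    Unmatched feature boxes are left out of the result.
    """
    labels = {}
    for index, feature_box in enumerate(feature_boxes):
        best_score, best_class = 0.0, None
        for recognition_box, target_class in recognitions:
            score = iou(feature_box, recognition_box)
            if score >= threshold and (best_class is None or score > best_score):
                best_score, best_class = score, target_class
        if best_class is not None:
            labels[index] = best_class
    return labels
```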
Because the recognition results obtained by data mining with the long-tail recognition model are not manually labeled, they may contain errors. If the long-tail traffic images and their target categories were taken directly from the recognition results as the object category labels required for training the initial classification model, the accuracy of the trained target classification model would be low. Therefore, the feature detection frames and the recognition detection frames need to be matched to obtain the object category labels corresponding to the image features, and these labels are used as the object category labels required for training the initial classification model, which improves the accuracy of the target classification model.
In one embodiment, before performing recognition processing on the target traffic images corresponding to various different traffic scenes based on the long-tail recognition model to obtain the recognition result, the method further includes: acquiring long-tail sample data, where the long-tail sample data includes long-tail sample images and sample detection frames, a long-tail sample image is a sample image in a long-tail traffic scene, the sample objects in the long-tail sample images belong to the target category, and the sample detection frames are used to identify the sample objects in the long-tail sample images; and training the initial recognition model based on the long-tail sample data to obtain the long-tail recognition model.
The initial recognition model is a neural network classification model, which may be an LSTM model, and the long tail recognition model is a trained neural network classification model, which may be a trained LSTM model.
Optionally, the long-tail sample images are manually labeled and recorded in advance and can be obtained from the historical traffic data storage library. The sample detection frames are obtained through at least one of the following interactive retrieval modes: active learning, multi-element retrieval and manual labeling. The long-tail sample images, with their sample detection frames and the target categories of the sample objects in those frames, are used as the input of the initial recognition model for training to obtain the long-tail recognition model.
In summary, fig. 2 shows a flow chart of another classification model training method. First, long-tail sample data is obtained and a long-tail recognition model is trained on it; data mining is then performed with the long-tail recognition model to obtain long-tail traffic images, recognition detection frames, target categories and the like from the target traffic images corresponding to various traffic scenes. Next, the long-tail traffic images of the same traffic scene are spliced to obtain at least one long-tail image sequence. For each long-tail image sequence, an upstream task is determined according to the traffic scene to which the sequence belongs, an image processing category is determined according to the upstream task, and target image processing is performed on the original traffic images corresponding to the sequence based on that image processing category to obtain image features, which include a feature detection frame, target speed, target pose, size, point cloud coordinates and the like. The feature detection frame is then mapped to the coordinate system in which the recognition detection frame is located based on the coordinate conversion matrix, and matching processing is performed according to the overlapping degree between the mapped feature detection frame and the recognition detection frame. If a feature detection frame and a recognition detection frame match each other, the target category is taken as the object category label corresponding to that feature detection frame; that is, the target category of the long-tail traffic image obtained by data mining is associated with the feature detection frame obtained by target image processing.
Finally, model training is performed with the image features corresponding to the matched feature detection frames and the object category labels corresponding to those feature detection frames as the input of the initial classification model, to obtain a target classification model for object classification. To clearly distinguish them from common image features, the image features corresponding to the matched feature detection frames may be called long-tail image features. In this way, the amount and richness of the model training input are increased, so that the target classification model can classify objects in various traffic scenes and classification efficiency is improved. Compared with solving long-tail traffic scenes through vehicle-road cooperation, the method does not require the infrastructure hardware and labor costs that vehicle-road cooperation entails; it provides an effective, low-cost solution for long-tail traffic scenes and has higher feasibility and scalability.
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in the sequence indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the application also provides a classification model training device for implementing the classification model training method described above. The solution provided by the device is implemented in a manner similar to that described for the method, so for the specific limitations in the one or more classification model training device embodiments provided below, reference may be made to the limitations of the classification model training method above, which are not repeated here.
In one embodiment, as shown in fig. 3, there is provided a block diagram of a classification model training apparatus 300, comprising: a recognition module 301, a determination module 302, and a training module 303, wherein:
the recognition module 301 is configured to perform recognition processing on target traffic images corresponding to various traffic scenes based on the long tail recognition model, so as to obtain a recognition result.
The determining module 302 is configured to determine a long-tail traffic image from the target traffic images based on the recognition result, where the long-tail traffic image is an image in a long-tail traffic scene.
The training module 303 is configured to train the initial classification model based on the long-tail traffic image to obtain a target classification model, where the target classification model is used for classifying the object.
In one embodiment, the long tail recognition model is used for recognizing an image in a long tail traffic scene with the object category as a target category; the determining module 302 is specifically configured to obtain, from the target traffic image, a long-tail traffic image whose recognition result indicates that the class of the object in the image is the target class.
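The selection performed by the determining module can be sketched as a simple filter. The sketch assumes the recognition result is a mapping from image ID to a list of `(box, class name)` detections and that the target categories are given as a set; both representations and the function name are hypothetical.

```python
def select_long_tail_images(recognition_results, target_classes):
    """Keep images with at least one detection whose class is a target class.

    The returned mapping has the same shape as the input, restricted to
    target-class detections.
    """
    selected = {}
    for image_id, detections in recognition_results.items():
        hits = [(box, cls) for box, cls in detections if cls in target_classes]
        if hits:
            selected[image_id] = hits
    return selected
```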
In one embodiment, the training module 303 is specifically configured to perform target image processing on an original traffic image corresponding to the long-tail traffic image, where the target image processing is image processing related to an upstream task of the object classification task; training the initial classification model based on the image characteristics obtained by the target image processing to obtain a target classification model.
In one embodiment, as shown in fig. 4, there is provided a block diagram of another classification model training apparatus, and the classification model training apparatus 300 further includes: the stitching module 401 is configured to stitch long-tail traffic images belonging to the same traffic scene to obtain at least one long-tail image sequence; an image processing category determining module 402, configured to determine, for each long-tail image sequence, an upstream task according to a traffic scene to which the long-tail image sequence belongs, and determine an image processing category according to the upstream task; correspondingly, the training module 303 is further configured to perform, for each long-tail image sequence, target image processing on an original traffic image corresponding to the long-tail image sequence based on the corresponding image processing category.
In one embodiment, the training module 303 is further configured to determine an object class label corresponding to the image feature based on the image feature and the recognition result; and training the initial classification model based on the image features and object class labels corresponding to the image features to obtain a target classification model.
In one embodiment, the image features include a feature detection frame used to identify an object in the original traffic image, and the recognition result includes a recognition detection frame used to identify an object belonging to the target category in the long-tail traffic image. The training module 303 is further configured to perform matching processing on the feature detection frame and the recognition detection frame; and if a feature detection frame and a recognition detection frame match each other, to take the target category as the object category label corresponding to that feature detection frame.
In one embodiment, the training module 303 is further configured to map the feature detection frame to a coordinate system where the identification detection frame is located based on the coordinate transformation matrix; and carrying out matching processing according to the overlapping degree between the mapped feature detection frame and the identification detection frame.
In one embodiment, as shown in fig. 5, a block diagram of a classification model training apparatus is provided, and the classification model training apparatus 300 further includes: the long tail recognition model training module 501 is configured to obtain long tail sample data, where the long tail sample data includes a long tail sample image and a sample detection frame, the long tail sample image is a sample image in a long tail traffic scene, a sample object in the long tail sample image is a target class, and the sample detection frame is configured to identify the sample object in the long tail sample image; training the initial recognition model based on long tail sample data to obtain a long tail recognition model.
The modules in the classification model training apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 6. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing the original traffic image, long tail sample data and the like. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a classification model training method.
It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The foregoing examples represent only a few embodiments of the application. They are described in some detail, but this should not be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within the scope of the application. Accordingly, the scope of the application is defined by the appended claims.

Claims (9)

1. A method of training a classification model, the method comprising:
performing recognition processing on target traffic images corresponding to various different traffic scenes based on the long tail recognition model to obtain recognition results;
determining a long-tail traffic image from the target traffic image based on the identification result, wherein the long-tail traffic image is an image in a long-tail traffic scene;
training an initial classification model based on the long-tail traffic image to obtain a target classification model, wherein the target classification model is used for classifying objects;
wherein training the initial classification model based on the long-tail traffic image to obtain the target classification model comprises:
splicing the long-tail traffic images belonging to the same traffic scene to obtain at least one long-tail image sequence; for each long-tail image sequence, determining an upstream task of an object classification task according to a traffic scene to which the long-tail image sequence belongs, and determining an image processing category according to the upstream task;
for each long-tail image sequence, performing target image processing on an original traffic image corresponding to the long-tail image sequence based on the corresponding image processing category, wherein the target image processing is the image processing related to the upstream task;
training the initial classification model based on the image features obtained by the target image processing to obtain the target classification model.
2. The method of claim 1, wherein the long tail recognition model is used to recognize images in long tail traffic scenes in which the class of the object in the images is a target class; the determining the long tail traffic image from the target traffic images based on the identification result comprises the following steps:
acquiring, from the target traffic images, the long-tail traffic image whose recognition result indicates that the class of the object in the image is the target class.
3. The method of claim 1, wherein the training the initial classification model based on the image features resulting from the target image processing to result in the target classification model comprises:
determining an object category label corresponding to the image feature based on the image feature and the identification result;
and training the initial classification model based on the image features and object class labels corresponding to the image features to obtain the target classification model.
4. A method according to claim 3, wherein the image features include a feature detection box for identifying objects in the original traffic image, the recognition result includes a recognition detection box for identifying objects in the long-tail traffic image that belong to the target class, and the determining an object class label corresponding to the image features based on the image features and the recognition result includes:
performing matching processing on the characteristic detection frame and the identification detection frame;
and if a feature detection frame and an identification detection frame match each other, taking the target category as the object category label corresponding to that feature detection frame.
5. The method of claim 4, wherein the matching the feature detection box and the identification detection box comprises:
mapping the feature detection frame to a coordinate system where the identification detection frame is located based on a coordinate transformation matrix;
and carrying out matching processing according to the overlapping degree between the mapped feature detection frame and the identification detection frame.
6. The method according to any one of claims 1 to 5, wherein before performing recognition processing on the target traffic images corresponding to various different traffic scenes based on the long tail recognition model to obtain the recognition result, the method further comprises:
acquiring long-tail sample data, wherein the long-tail sample data comprises a long-tail sample image and a sample detection frame, the long-tail sample image is a sample image in a long-tail traffic scene, a sample object in the long-tail sample image is a target class, and the sample detection frame is used for identifying the sample object in the long-tail sample image;
training an initial recognition model based on the long tail sample data to obtain the long tail recognition model.
7. A classification model training apparatus, the apparatus comprising:
the recognition module is used for recognizing and processing target traffic images corresponding to various different traffic scenes based on the long tail recognition model to obtain recognition results;
the determining module is used for determining a long-tail traffic image from the target traffic image based on the identification result, wherein the long-tail traffic image is an image in a long-tail traffic scene;
the training module is used for training the initial classification model based on the long-tail traffic image to obtain a target classification model, and the target classification model is used for classifying objects;
the splicing module is used for carrying out splicing treatment on the long-tail traffic images belonging to the same traffic scene to obtain at least one long-tail image sequence; for each long-tail image sequence, determining an upstream task of an object classification task according to a traffic scene to which the long-tail image sequence belongs, and determining an image processing category according to the upstream task;
the training module is specifically configured to perform, for each long-tail image sequence, target image processing on an original traffic image corresponding to the long-tail image sequence based on the corresponding image processing category, where the target image processing is image processing related to the upstream task; training the initial classification model based on the image features obtained by the target image processing to obtain the target classification model.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN202211722251.1A 2022-12-30 2022-12-30 Classification model training method, device, equipment, storage medium and program product Active CN115830399B (en)


Publications (2)

Publication Number Publication Date
CN115830399A CN115830399A (en) 2023-03-21
CN115830399B true CN115830399B (en) 2023-09-12

Family

ID=85519623


Country Status (1)

Country Link
CN (1) CN115830399B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117058564B (en) * 2023-10-11 2023-12-22 光轮智能(北京)科技有限公司 Virtual perception data acquisition method and long tail scene data mining method
CN117612140B (en) * 2024-01-19 2024-04-19 福思(杭州)智能科技有限公司 Road scene identification method and device, storage medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291841A (en) * 2020-05-13 2020-06-16 腾讯科技(深圳)有限公司 Image recognition model training method and device, computer equipment and storage medium
US10970577B1 (en) * 2017-09-29 2021-04-06 Snap Inc. Machine learned single image icon identification
CN113611008A (en) * 2021-07-30 2021-11-05 广州文远知行科技有限公司 Vehicle driving scene acquisition method, device, equipment and medium
CN113688760A (en) * 2021-08-31 2021-11-23 广州文远知行科技有限公司 Automatic driving data identification method and device, computer equipment and storage medium
CN114692715A (en) * 2020-12-30 2022-07-01 华为技术有限公司 Sample labeling method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110879950A (en) * 2018-09-06 2020-03-13 北京市商汤科技开发有限公司 Multi-stage target classification and traffic sign detection method and device, equipment and medium
US11580851B2 (en) * 2020-11-17 2023-02-14 Uatc, Llc Systems and methods for simulating traffic scenes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Deep super-class learning for long-tail distributed image classification"; Yucan Zhou et al.; Pattern Recognition; Vol. 80; entire document *

Also Published As

Publication number Publication date
CN115830399A (en) 2023-03-21

Similar Documents

Publication Publication Date Title
CN115830399B (en) Classification model training method, device, equipment, storage medium and program product
CN109086811B (en) Multi-label image classification method and device and electronic equipment
US8620026B2 (en) Video-based detection of multiple object types under varying poses
CN111767878B (en) Deep learning-based traffic sign detection method and system in embedded device
CN111767927A (en) Lightweight license plate recognition method and system based on full convolution network
CN111274926B (en) Image data screening method, device, computer equipment and storage medium
CN112801236B (en) Image recognition model migration method, device, equipment and storage medium
CN113343461A (en) Simulation method and device for automatic driving vehicle, electronic equipment and storage medium
CN112541372B (en) Difficult sample screening method and device
CN109857878B (en) Article labeling method and device, electronic equipment and storage medium
US20210256738A1 (en) Computer-implemented method and system for generating a virtual vehicle environment
CN115115825B (en) Method, device, computer equipment and storage medium for detecting object in image
CN114157829A (en) Model training optimization method and device, computer equipment and storage medium
CN111898418A (en) Human body abnormal behavior detection method based on T-TINY-YOLO network
Isa et al. Real-time traffic sign detection and recognition using Raspberry Pi
CN112528058B (en) Fine-grained image classification method based on image attribute active learning
CN116964588A (en) Target detection method, target detection model training method and device
CN114596435A (en) Semantic segmentation label generation method, device, equipment and storage medium
CN113408356A (en) Pedestrian re-identification method, device and equipment based on deep learning and storage medium
CN112529116A (en) Scene element fusion processing method, device and equipment and computer storage medium
CN110659384B (en) Video structured analysis method and device
CN112418020A (en) Attention mechanism-based YOLOv3 illegal billboard intelligent detection method
Ogawa et al. Identifying Parking Lot Occupancy with YOLOv5
CN112131418A (en) Target labeling method, target labeling device and computer-readable storage medium
US20230094252A1 (en) Method and system for automatically annotating sensor data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant