CN116644296A - Data enhancement method and device - Google Patents

Data enhancement method and device

Info

Publication number
CN116644296A
Authority
CN
China
Prior art keywords
sample
data
sampling
class
category
Prior art date
Legal status
Granted
Application number
CN202310928849.4A
Other languages
Chinese (zh)
Other versions
CN116644296B (en)
Inventor
严海旭
兰晓松
刘羿
何贝
Current Assignee
Beijing Sinian Zhijia Technology Co ltd
Original Assignee
Beijing Sinian Zhijia Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sinian Zhijia Technology Co., Ltd.
Priority to CN202310928849.4A
Publication of CN116644296A
Application granted
Publication of CN116644296B
Legal status: Active (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/10 Pre-processing; Data cleansing
    • G06F 18/15 Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a data enhancement method and apparatus. The method includes: generating a sample data set based on at least one frame of point cloud data, where the sample data set includes sample data corresponding to a plurality of sample classes; performing statistical analysis on the sample data set to determine sampling parameters for each sample class, where the sampling parameters include at least one of a sampling condition and a sampling number; for each sample class, screening target sample data that meets the sampling parameters of the class from the sample data corresponding to the class; sampling each sample class based on its target sample data; and generating a training data set based on the sample data sampled for each class. In this way, a training data set of higher quality can be generated through screening before model training, balancing the training data samples while avoiding the problem of model overfitting.

Description

Data enhancement method and device
Technical Field
The present application relates to the field of machine learning technologies, and in particular, to a data enhancement method and apparatus.
Background
The effectiveness of machine learning depends heavily on the training data. Taking 3D object detection in the autonomous driving field as an example: because manually labeling data is costly, only common objects can be collected in quantity, while it is difficult to collect enough data for unusual objects. The resulting data sample imbalance causes the model to favor the majority classes during training and neglect the minority classes, degrading its generalization ability. For example, in a real autonomous driving scenario, objects of some categories appear very infrequently; pedestrians, for instance, appear far less often than vehicles and road signs. Yet if the model's recognition accuracy for pedestrians is low, the autonomous vehicle may fail to avoid pedestrians in time, which can cause accidents. Solving the data sample imbalance problem is therefore very important for machine learning.
To address this problem, the common industry approach today is resampling, which includes upsampling and downsampling. However, downsampling discards a large amount of high-quality data samples, while upsampling happens randomly during model training, offers no control over the quality of the sampled data, and cannot ensure that the generated data are reasonable, which can harm training; in some cases it backfires and the model overfits instead. Neither existing scheme therefore meets the data enhancement requirements.
Disclosure of Invention
In view of the above, the present application is directed to a data enhancement method and apparatus, which determines sampling parameters of each sample class by performing statistical analysis on a sample dataset; sample data of each sample category is screened in advance according to the sampling parameters, and sampling is carried out in the screened sample data to generate a training data set. Therefore, a training data set with higher data quality can be generated before model training is carried out, and the problem of model overfitting is avoided while the training data sample is balanced.
The embodiment of the application provides a data enhancement method, which comprises the following steps:
generating a sample data set based on at least one frame of point cloud data, wherein the sample data set comprises sample data corresponding to a plurality of sample categories;
carrying out statistical analysis on the sample data set, and determining sampling parameters of each sample category, wherein the sampling parameters comprise at least one of sampling conditions and sampling quantity;
for each sample class, screening target sample data conforming to the sampling parameters of the class from the sample data corresponding to that class;
sampling, for each sample class, the sample class based on the target sample data;
a training data set is generated based on the sample data sampled for each sample class.
Further, performing statistical analysis on the sample data set to determine sampling parameters of each sample class, including:
carrying out statistical analysis on the sample data set, and determining the sample number of each sample class and/or the characteristic distribution of each sample class on a data characteristic item;
determining the sampling number of each sample category according to the sample number of each sample category; and/or determining the sampling condition of each sample category according to the characteristic distribution of each sample category on the data characteristic item.
Further, determining the number of samples for each sample class based on the number of samples for each sample class includes:
determining class balance reference quantity according to the number of the sample classes and the number of samples of each sample class;
determining the number grade of each sample class according to the sample number of each sample class and the class balance reference quantity;
and determining the sampling number of each sample class according to its sample count and the sampling rule corresponding to its number level.
Further, the number levels at least include a high frequency level, a medium frequency level and a low frequency level, and each number level corresponds to a sample number interval without overlapping ranges;
the sampling rule is as follows:
the sampling number corresponding to the high frequency level is equal to 0;
the sampling number corresponding to the intermediate frequency level enables the sampled sample number to reach the lower limit value of the sample number interval corresponding to the high frequency level;
and the sampling number corresponding to the low-frequency level enables the sampled sample number to reach the lower limit value of the sample number interval corresponding to the intermediate-frequency level.
Further, the data characteristic items comprise the number of point clouds corresponding to each sample and/or the distance distribution of each sample relative to the target position;
determining sampling conditions of each sample category according to the characteristic distribution of each sample category on the data characteristic item, wherein the sampling conditions comprise:
according to the characteristic distribution of each sample category on the data characteristic item, determining the distance distribution range and/or the point cloud quantity threshold value of the samples in each sample category;
and determining the sampling condition of each sample category according to the distance distribution range and/or the point cloud quantity threshold.
Further, for each sample class, sampling the sample class based on the target sample data, including:
determining the corresponding frame sampling quantity of the sample class in each frame of point cloud data according to the frame number of the at least one frame of point cloud data;
and, for each frame of point cloud data, sampling from the target sample data according to the frame sampling number corresponding to the sample class to obtain the sample data sampled for that class.
Further, generating a training data set based on the sample data sampled for each sample class includes:
and adding the sample data obtained after sampling each sample type into the frame point cloud data to obtain training frame data, wherein the training data set comprises at least one training frame data.
The embodiment of the application also provides a data enhancement device, which comprises:
the first generation module is used for generating a sample data set based on at least one frame of point cloud data, wherein the sample data set comprises sample data corresponding to a plurality of sample categories;
the analysis module is used for carrying out statistical analysis on the sample data set and determining sampling parameters of each sample category, wherein the sampling parameters comprise at least one of sampling conditions and sampling quantity;
the screening module is used for screening, for each sample class, target sample data which accords with the sampling parameters of the class from the sample data corresponding to that class;
the sampling module is used for sampling each sample category based on the target sample data;
and the second generation module is used for generating a training data set based on the sample data obtained by sampling for each sample category.
The embodiment of the application also provides electronic equipment, which comprises: a processor, a memory and a bus, said memory storing machine readable instructions executable by said processor, said processor and said memory communicating over the bus when the electronic device is running, said machine readable instructions when executed by said processor performing the steps of a data enhancement method as described above.
The embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of a data enhancement method as described above.
According to the data enhancement method and device provided by the embodiment of the application, the sampling parameters of each sample type are determined by carrying out statistical analysis on the sample data set; sample data of each sample category is screened in advance according to the sampling parameters, and sampling is carried out in the screened sample data to generate a training data set. Therefore, a training data set with higher data quality can be generated before model training is carried out, and the problem of model overfitting is avoided while the training data sample is balanced.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a data enhancement method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a data enhancement device according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. Based on the embodiments of the present application, every other embodiment obtained by a person skilled in the art without making any inventive effort falls within the scope of protection of the present application.
It was found that the effect of machine learning is greatly dependent on training data. The following description will take 3D object detection in the field of automatic driving as an example.
3D object detection is an important link in the autonomous driving field. Specifically, 3D object detection aims to identify roads, vehicles, pedestrians, and other objects encountered while driving from the three-dimensional point cloud data produced by a lidar, giving the autonomous vehicle environment-sensing capability and improving its safety and intelligence. However, because manually labeling data is costly, only common objects can be collected in quantity, while it is difficult to collect enough data for unusual objects. The resulting data sample imbalance causes the model to favor the majority classes during training and neglect the minority classes, degrading its generalization ability. For example, in a real autonomous driving scenario, objects of some categories appear very infrequently; pedestrians, for instance, appear far less often than vehicles and road signs. Yet if the model's recognition accuracy for pedestrians is low, the autonomous vehicle may fail to avoid pedestrians in time during deployment, which can cause accidents. Solving the data sample imbalance problem is therefore very important for 3D object detection.
To address this problem, the common industry approach today is resampling, which includes upsampling and downsampling. However, downsampling discards a large amount of high-quality data samples, while upsampling happens randomly during model training, offers no control over the quality of the sampled data, and cannot ensure that the generated data are reasonable, which can harm training; in some cases it backfires and the model overfits instead. Neither existing scheme therefore meets the data enhancement requirements in 3D object detection.
Based on the above, the embodiment of the application provides a data enhancement method and a data enhancement device, which determine sampling parameters of each sample category by carrying out statistical analysis on a sample data set; sample data of each sample category is screened in advance according to the sampling parameters, and sampling is carried out in the screened sample data to generate a training data set. Therefore, a training data set with higher data quality can be generated before model training is performed through pre-screening, so that the problem of model overfitting is avoided while the training data sample is balanced.
Referring to fig. 1, fig. 1 is a flowchart of a data enhancement method according to an embodiment of the present application. As shown in fig. 1, a method provided by an embodiment of the present application includes:
s101, generating a sample data set based on at least one frame of point cloud data.
Here, at least one frame of point cloud data of the surrounding environment may be collected by the lidar in the perception system of the autonomous vehicle to form an initial data set. Each frame of point cloud data includes point cloud coordinate data and corresponding point cloud annotation data; the annotation data contains at least one annotation box, and each annotation box together with the points inside it is regarded as one sample, so each frame of point cloud data contains sample data for at least one sample. Different samples may belong to different sample classes, such as pedestrians, vehicles, and other traffic categories.
In this step, the samples contained in each frame of point cloud data are grouped according to their sample class, generating a sample data set organized at the level of individual samples. The sample data set includes sample data corresponding to a plurality of sample classes.
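For concreteness, the following Python sketch shows one way such a per-sample, per-class data set could be organized from annotated point cloud frames. The Sample structure and the frame dictionary keys ("boxes", "labels", "box_points") are assumptions made for illustration, not the patented implementation.

```python
# Illustrative sketch only: one possible way to organize annotated point cloud
# frames into a per-sample data set grouped by sample class. All structures and
# field names here are assumptions, not the patent's implementation.
from dataclasses import dataclass
from collections import defaultdict
from typing import Dict, List
import numpy as np


@dataclass
class Sample:
    frame_id: int       # index of the frame the sample comes from
    category: str       # annotated sample class, e.g. "pedestrian"
    box: np.ndarray     # 3D annotation box parameters (x, y, z, l, w, h, yaw)
    points: np.ndarray  # (N, 3) points cropped from inside the annotation box


def build_sample_dataset(frames: List[dict]) -> Dict[str, List[Sample]]:
    """Group every annotated box (plus its in-box points) by sample class."""
    dataset: Dict[str, List[Sample]] = defaultdict(list)
    for frame_id, frame in enumerate(frames):
        for box, category, points in zip(frame["boxes"], frame["labels"], frame["box_points"]):
            dataset[category].append(Sample(frame_id, category, box, points))
    return dataset
```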
S102, carrying out statistical analysis on the sample data set, and determining sampling parameters of each sample type.
By statistically analyzing the sample data set, the sampling parameters of each sample class can be determined, so that the sample data can later be filtered according to those parameters and sampled in a way that balances the sample data across the different classes. The sampling parameters include at least one of a sampling condition and a sampling number.
In one possible implementation, step S102 may include:
s1021, carrying out statistical analysis on the sample data set, and determining the sample number of each sample type and/or the characteristic distribution of each sample type on the data characteristic item.
The data feature items include the number of point cloud points corresponding to each sample and/or the distance distribution of each sample relative to a target position. In a specific implementation, the statistical analysis may cover the number of samples in each sample class, the number of points inside each sample box, the distribution of distances from the point cloud to the target position, and the range of distance variation. The statistics over the sample data set are then aggregated per sample class to determine the sample count of each class and/or the feature distribution of each class over the data feature items.
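As a minimal illustration of what such per-class statistics might look like, the following Python sketch reuses the hypothetical Sample layout from the previous sketch and assumes the target position is the ego-vehicle origin; it is a sketch under those assumptions, not the patented procedure.

```python
# Illustrative sketch only: per-class statistics that could feed the sampling
# parameters. Assumes the target position is the ego-vehicle origin (0, 0, 0)
# and the Sample layout from the previous sketch.
import numpy as np


def analyze_dataset(dataset):
    """Return per-class sample counts, point counts, and distance statistics."""
    stats = {}
    for category, samples in dataset.items():
        point_counts = np.array([len(s.points) for s in samples])
        # distance of each sample's box center (x, y) to the assumed ego origin
        distances = np.array([float(np.linalg.norm(s.box[:2])) for s in samples])
        stats[category] = {
            "num_samples": len(samples),
            "point_counts": point_counts,
            "distances": distances,
            "distance_range": (distances.min(), distances.max()),
        }
    return stats
```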
S1022, determining the sampling number of each sample category according to the sample number of each sample category; and/or determining the sampling condition of each sample category according to the characteristic distribution of each sample category on the data characteristic item.
In a first possible implementation, determining the number of samples for each sample class in S1022 may include:
step 1, determining class balance reference quantity according to the number of sample classes and the number of samples of each sample class.
Specifically, the class balance reference is B = (n_1 + n_2 + ... + n_C) / C, where the numerator is the total number of samples in the sample data set, obtained by summing the sample count n_i of each sample class, and C is the number of sample classes.
Step 2: determine the number level of each sample class according to its sample count and the class balance reference.
Specifically, grading thresholds can be set; the number level of each sample class is then determined from its sample count, the class balance reference, and the grading thresholds. The number levels include at least a high-frequency level (HF), a medium-frequency level (MF), and a low-frequency level (LF), and each number level corresponds to a sample count interval whose ranges do not overlap. For example, with grading thresholds t_low < t_high applied to the class balance reference B, the formula can be written as:
level_i = HF, if n_i >= t_high * B;
level_i = MF, if t_low * B <= n_i < t_high * B;
level_i = LF, if n_i < t_low * B;
where level_i denotes the number level of the i-th sample class and n_i denotes the number of samples of the i-th sample class.
Step 3: determine the sampling number of each sample class according to its sample count and the sampling rule corresponding to its number level. The sampling rule is:
the sampling number corresponding to the high-frequency level is 0; that is, high-frequency classes require no additional sampling;
the sampling number corresponding to the medium-frequency level brings the post-sampling sample count up to the lower limit of the sample count interval of the high-frequency level; that is, the number of additional samples for a medium-frequency class i is Δn_i = L_HF - n_i, where L_HF is the lower limit of the high-frequency interval;
the sampling number corresponding to the low-frequency level brings the post-sampling sample count up to the lower limit of the sample count interval of the medium-frequency level; that is, the number of additional samples for a low-frequency class i is Δn_i = L_MF - n_i, where L_MF is the lower limit of the medium-frequency interval.
In this way, for sample classes with few samples, determining the sampling number as above and sampling accordingly expands the sample count, avoids the sample imbalance problem, and indirectly improves the generalization ability of the model. At the same time, classes at different number levels still keep their relative ordering after sampling, so the model can allocate its learning correctly during training.
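The three steps above can be summarized in a short Python sketch. The grading thresholds and the B-scaled interval bounds follow the example formula given earlier and are assumptions; the sampling rule itself (top up medium-frequency classes to the high-frequency lower limit and low-frequency classes to the medium-frequency lower limit) follows the description.

```python
# Illustrative sketch of steps 1-3: compute the class balance reference B, grade
# each class as high/medium/low frequency, and derive its sampling number.
# The thresholds t_low, t_high and the B-scaled bounds are assumptions.


def sampling_numbers(class_counts, t_low=0.5, t_high=2.0):
    C = len(class_counts)
    B = sum(class_counts.values()) / C   # class balance reference
    hf_lower = t_high * B                # lower limit of the high-frequency interval
    mf_lower = t_low * B                 # lower limit of the medium-frequency interval

    numbers = {}
    for category, n in class_counts.items():
        if n >= hf_lower:                # high frequency: no extra sampling
            numbers[category] = 0
        elif n >= mf_lower:              # medium frequency: top up to the HF lower limit
            numbers[category] = int(hf_lower - n)
        else:                            # low frequency: top up to the MF lower limit
            numbers[category] = int(mf_lower - n)
    return numbers


# Example with made-up counts: B = 3900, HF lower limit 7800, MF lower limit 1950
# -> {'vehicle': 0, 'road_sign': 4800, 'pedestrian': 1750}
print(sampling_numbers({"vehicle": 10000, "road_sign": 3000, "pedestrian": 200}))
```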
In a second possible implementation, the data feature items include the number of point cloud points corresponding to each sample and/or the distance distribution of each sample relative to a target position. In general, the target position may be the ego position of the autonomous vehicle. Taking traffic cone samples in a port sample data set as an example, statistical analysis shows that the cones are generally distributed on both sides of the vehicle, are most concentrated about 15 to 20 meters from the vehicle, extend out to at most about 100 meters, and have a relatively uniform distribution of point counts.
Determining the sampling condition of each sample class in S1022 may include:
determining, from the feature distribution of each sample class over the data feature items, the distance distribution range and/or the point-count threshold of the samples in that class; and determining the sampling condition of each class according to the distance distribution range and/or the point-count threshold.
For example, the sampling condition of a given sample class may be that the distance from the center of the sample box to the ego vehicle lies within a preset distance distribution range and that the number of points inside the sample box exceeds the point-count threshold.
S103, for each sample class, screening target sample data that meets the sampling parameters of the class from the sample data corresponding to that class.
In this step, the target sample data that meets each class's sampling parameters can be screened in advance from that class's sample data, so that target samples of high and reasonable data quality are retained while erroneous or noisy samples picked up during collection are filtered out.
S104, sampling each sample category based on the target sample data.
In one possible implementation, step S104 may include:
determining the frame sampling number of the sample class in each frame of point cloud data according to the number of frames of the at least one frame of point cloud data; and, for each frame of point cloud data, sampling from the target sample data according to the frame sampling number of that class to obtain the sample data sampled for that class.
Specifically, for any sample class i, the frame sampling number, i.e., the number of samples of class i that need to be added to each frame of point cloud data, can be expressed as m_i = Δn_i / F, where Δn_i is the sampling number of class i and F is the number of frames. Thereafter, m_i samples are drawn for each frame from the target sample data corresponding to sample class i.
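A minimal sketch of this per-frame sampling step is given below; sampling with replacement and the even split across frames are assumptions (the application only states that the frame sampling number is derived from the number of frames).

```python
# Illustrative sketch of S104: split a class's total sampling number evenly over
# the F frames and draw that many target samples per frame. Sampling with
# replacement is an assumption, since a screened class may hold fewer target
# samples than are requested.
import random


def sample_per_frame(target_samples, total_to_add, num_frames, seed=0):
    """Return {frame_id: [sampled Sample, ...]} for one sample class."""
    rng = random.Random(seed)
    if total_to_add <= 0 or not target_samples:
        return {frame_id: [] for frame_id in range(num_frames)}
    per_frame = max(1, total_to_add // num_frames)  # frame sampling number m_i = Δn_i / F
    return {
        frame_id: rng.choices(target_samples, k=per_frame)  # sample with replacement
        for frame_id in range(num_frames)
    }
```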
S105, generating a training data set based on sample data obtained by sampling for each sample type.
In one possible implementation, step S105 may include: adding the sample data sampled for each sample class into the corresponding frame point cloud data to obtain training frame data, where the training data set includes at least one item of training frame data. The training data set can then be used in the model training stage to train a 3D object detection model.
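As an illustration of how sampled samples might be added back into frame point cloud data, the following Python sketch concatenates each sampled sample's points into the frame and appends its box and label. The frame dictionary layout is the same assumed structure as in the earlier sketches, not the patent's exact representation.

```python
# Illustrative sketch of S105: paste the sampled samples back into each frame's
# point cloud and annotation list to form training frame data. The frame keys
# ("points", "boxes", "labels") are assumptions carried over from earlier sketches.
import numpy as np


def build_training_frames(frames, sampled_per_class):
    """sampled_per_class: {category: {frame_id: [Sample, ...]}} as produced above."""
    training_frames = []
    for frame_id, frame in enumerate(frames):
        points = [frame["points"]]
        boxes = list(frame["boxes"])
        labels = list(frame["labels"])
        for category, per_frame in sampled_per_class.items():
            for sample in per_frame.get(frame_id, []):
                points.append(sample.points)  # add the sampled sample's point cloud
                boxes.append(sample.box)      # add its annotation box
                labels.append(category)
        training_frames.append(
            {"points": np.concatenate(points, axis=0), "boxes": boxes, "labels": labels}
        )
    return training_frames
```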
In this way, the pre-sampling scheme used in the embodiment of the application generates the training data set in the data preparation stage, before model training; the model is trained on the regenerated training data set, and no additional sampling-based data enhancement is needed during training. Moving the sampling process ahead of training increases training speed, reduces randomness, and improves the stability of model performance.
The embodiment of the application provides a data enhancement method including: generating a sample data set based on at least one frame of point cloud data, where the sample data set includes sample data corresponding to a plurality of sample classes; performing statistical analysis on the sample data set to determine sampling parameters for each sample class, where the sampling parameters include at least one of a sampling condition and a sampling number; for each sample class, screening target sample data that meets the sampling parameters of the class from the sample data corresponding to the class; sampling each class based on its target sample data; and generating a training data set based on the sample data sampled for each class.
The sampling parameters of each sample class are determined by statistically analyzing the sample data set; the sample data of each class are screened in advance according to those parameters, and sampling is performed within the screened data to generate the training data set. A training data set of higher quality can therefore be generated through screening before model training, balancing the training samples while avoiding the problem of model overfitting. In addition, because the random sampling process is moved ahead of training, the multiple rounds of random sampling performed during model training in the prior art are replaced by a single, rule-constrained round of random sampling before training, which speeds up model training and reduces randomness.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a data enhancement device according to an embodiment of the application. As shown in fig. 2, the apparatus 300 includes:
a first generation module 310, configured to generate a sample data set based on at least one frame of point cloud data, where the sample data set includes sample data corresponding to a plurality of sample categories;
an analysis module 320, configured to perform statistical analysis on the sample data set, and determine a sampling parameter of each sample class, where the sampling parameter includes at least one of a sampling condition and a sampling number;
a screening module 330, configured to screen, for each sample class, target sample data that accords with the sampling parameters of the class from the sample data corresponding to that class;
a sampling module 340, configured to sample, for each sample class, the sample class based on the target sample data;
the second generating module 350 is configured to generate a training data set based on the sample data obtained by sampling for each sample class.
Further, when performing statistical analysis on the sample data set to determine the sampling parameters of each sample class, the analysis module 320 is configured to:
carrying out statistical analysis on the sample data set, and determining the sample number of each sample class and/or the characteristic distribution of each sample class on a data characteristic item;
determining the sampling number of each sample category according to the sample number of each sample category; and/or determining the sampling condition of each sample category according to the characteristic distribution of each sample category on the data characteristic item.
Further, when determining the sampling number of each sample class according to its sample count, the analysis module 320 is configured to:
determining class balance reference quantity according to the number of the sample classes and the number of samples of each sample class;
determining the number grade of each sample class according to the sample number of each sample class and the class balance reference quantity;
and determining the sampling number of each sample class according to its sample count and the sampling rule corresponding to its number level.
Further, the number levels at least include a high frequency level, a medium frequency level and a low frequency level, and each number level corresponds to a sample number interval without overlapping ranges;
the sampling rule is as follows:
the sampling number corresponding to the high frequency level is equal to 0;
the sampling number corresponding to the intermediate frequency level enables the sampled sample number to reach the lower limit value of the sample number interval corresponding to the high frequency level;
and the sampling number corresponding to the low-frequency level enables the sampled sample number to reach the lower limit value of the sample number interval corresponding to the intermediate-frequency level.
Further, the data characteristic items comprise the number of point clouds corresponding to each sample and/or the distance distribution of each sample relative to the target position;
when determining the sampling condition of each sample class according to its feature distribution over the data feature item, the analysis module 320 is configured to:
according to the characteristic distribution of each sample category on the data characteristic item, determining the distance distribution range and/or the point cloud quantity threshold value of the samples in each sample category;
and determining the sampling condition of each sample category according to the distance distribution range and/or the point cloud quantity threshold.
Further, when sampling each sample class based on the target sample data, the sampling module 340 is configured to:
determining the corresponding frame sampling quantity of the sample class in each frame of point cloud data according to the frame number of the at least one frame of point cloud data;
and, for each frame of point cloud data, sampling from the target sample data according to the frame sampling number corresponding to the sample class to obtain the sample data sampled for that class.
Further, when generating a training data set based on the sample data sampled for each sample class, the second generating module 350 is configured to:
and adding the sample data obtained after sampling each sample type into the frame point cloud data to obtain training frame data, wherein the training data set comprises at least one training frame data.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the application. As shown in fig. 3, the electronic device 400 includes a processor 410, a memory 420, and a bus 430.
The memory 420 stores machine-readable instructions executable by the processor 410, and when the electronic device 400 is running, the processor 410 communicates with the memory 420 through the bus 430, and when the machine-readable instructions are executed by the processor 410, a step of a data enhancement method in the method embodiment shown in fig. 1 may be executed, and a specific implementation may refer to the method embodiment and will not be described herein.
The embodiment of the present application further provides a computer readable storage medium, where a computer program is stored, where the computer program may execute the steps of a data enhancement method in the method embodiment shown in fig. 1 when the computer program is executed by a processor, and the specific implementation manner may refer to the method embodiment and will not be described herein.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Finally, it should be noted that: the above examples are only specific embodiments of the present application, and are not intended to limit the scope of the present application, but it should be understood by those skilled in the art that the present application is not limited thereto, and that the present application is described in detail with reference to the foregoing examples: any person skilled in the art may modify or easily conceive of the technical solution described in the foregoing embodiments, or perform equivalent substitution of some of the technical features, while remaining within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (10)

1. A method of data enhancement, the method comprising:
generating a sample data set based on at least one frame of point cloud data, wherein the sample data set comprises sample data corresponding to a plurality of sample categories;
carrying out statistical analysis on the sample data set, and determining sampling parameters of each sample category, wherein the sampling parameters comprise at least one of sampling conditions and sampling quantity;
for each sample class, screening target sample data conforming to the sampling parameters of the class from the sample data corresponding to that class;
sampling, for each sample class, the sample class based on the target sample data;
a training data set is generated based on the sample data sampled for each sample class.
2. The method of claim 1, wherein statistically analyzing the sample dataset to determine sampling parameters for each sample class comprises:
carrying out statistical analysis on the sample data set, and determining the sample number of each sample class and/or the characteristic distribution of each sample class on a data characteristic item;
determining the sampling number of each sample category according to the sample number of each sample category; and/or determining the sampling condition of each sample category according to the characteristic distribution of each sample category on the data characteristic item.
3. The method of claim 2, wherein determining the number of samples for each sample class based on the number of samples for each sample class comprises:
determining class balance reference quantity according to the number of the sample classes and the number of samples of each sample class;
determining the number grade of each sample class according to the sample number of each sample class and the class balance reference quantity;
and determining the sampling number of each sample class according to its sample count and the sampling rule corresponding to its number level.
4. The method of claim 3, wherein
the number levels at least comprise a high frequency level, a medium frequency level and a low frequency level, and each number level corresponds to a sample number interval without overlapping range;
the sampling rule is as follows:
the sampling number corresponding to the high frequency level is equal to 0;
the sampling number corresponding to the intermediate frequency level enables the sampled sample number to reach the lower limit value of the sample number interval corresponding to the high frequency level;
and the sampling number corresponding to the low-frequency level enables the sampled sample number to reach the lower limit value of the sample number interval corresponding to the intermediate-frequency level.
5. The method of claim 2, wherein
the data characteristic items comprise the number of point clouds corresponding to each sample and/or the distance distribution of each sample relative to the target position;
determining sampling conditions of each sample category according to the characteristic distribution of each sample category on the data characteristic item, wherein the sampling conditions comprise:
according to the characteristic distribution of each sample category on the data characteristic item, determining the distance distribution range and/or the point cloud quantity threshold value of the samples in each sample category;
and determining the sampling condition of each sample category according to the distance distribution range and/or the point cloud quantity threshold.
6. The method of claim 1, wherein for each sample class, sampling that sample class based on the target sample data comprises:
determining the corresponding frame sampling quantity of the sample class in each frame of point cloud data according to the frame number of the at least one frame of point cloud data;
and, for each frame of point cloud data, sampling from the target sample data according to the frame sampling number corresponding to the sample class to obtain the sample data sampled for that class.
7. The method of claim 6, wherein generating a training data set based on the sampled sample data for each sample class comprises:
and adding the sample data obtained after sampling each sample type into the frame point cloud data to obtain training frame data, wherein the training data set comprises at least one training frame data.
8. A data enhancement device, the device comprising:
the first generation module is used for generating a sample data set based on at least one frame of point cloud data, wherein the sample data set comprises sample data corresponding to a plurality of sample categories;
the analysis module is used for carrying out statistical analysis on the sample data set and determining sampling parameters of each sample category, wherein the sampling parameters comprise at least one of sampling conditions and sampling quantity;
the screening module is used for screening, for each sample class, target sample data which accords with the sampling parameters of the class from the sample data corresponding to that class;
the sampling module is used for sampling each sample category based on the target sample data;
and the second generation module is used for generating a training data set based on the sample data obtained by sampling for each sample category.
9. An electronic device, comprising: a processor, a memory and a bus, said memory storing machine readable instructions executable by said processor, said processor and said memory communicating via said bus when the electronic device is running, said machine readable instructions when executed by said processor performing the steps of a data enhancement method according to any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of a data enhancement method according to any of claims 1 to 7.
CN202310928849.4A 2023-07-27 2023-07-27 Data enhancement method and device Active CN116644296B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310928849.4A CN116644296B (en) 2023-07-27 2023-07-27 Data enhancement method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310928849.4A CN116644296B (en) 2023-07-27 2023-07-27 Data enhancement method and device

Publications (2)

Publication Number Publication Date
CN116644296A true CN116644296A (en) 2023-08-25
CN116644296B CN116644296B (en) 2023-10-03

Family

ID=87643760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310928849.4A Active CN116644296B (en) 2023-07-27 2023-07-27 Data enhancement method and device

Country Status (1)

Country Link
CN (1) CN116644296B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220114805A1 (en) * 2021-12-22 2022-04-14 Julio Fernando Jarquin Arroyo Autonomous vehicle perception multimodal sensor data management
CN114419018A (en) * 2022-01-25 2022-04-29 重庆紫光华山智安科技有限公司 Image sampling method, system, device and medium
CN114529778A (en) * 2021-12-22 2022-05-24 武汉万集光电技术有限公司 Data enhancement method, device, equipment and storage medium
CN114881096A (en) * 2021-02-05 2022-08-09 华为技术有限公司 Multi-label class balancing method and device
CN115222858A (en) * 2022-07-27 2022-10-21 上海硬通网络科技有限公司 Method and equipment for training animation reconstruction network and image reconstruction and video reconstruction thereof
CN115346041A (en) * 2022-09-05 2022-11-15 北京云迹科技股份有限公司 Point position marking method, device and equipment based on deep learning and storage medium
CN115512391A (en) * 2022-09-29 2022-12-23 珠海视熙科技有限公司 Target detection model training method, device and equipment for data adaptive resampling

Also Published As

Publication number Publication date
CN116644296B (en) 2023-10-03

Similar Documents

Publication Publication Date Title
CN107169768B (en) Method and device for acquiring abnormal transaction data
CN113688042B (en) Determination method and device of test scene, electronic equipment and readable storage medium
CN111460312A (en) Method and device for identifying empty-shell enterprise and computer equipment
CN107483451B (en) Method and system for processing network security data based on serial-parallel structure and social network
CN114387591A (en) License plate recognition method, system, equipment and storage medium
CN111723815A (en) Model training method, image processing method, device, computer system, and medium
US20220358747A1 (en) Method and Generator for Generating Disturbed Input Data for a Neural Network
CN113313479A (en) Payment service big data processing method and system based on artificial intelligence
CN112884121A (en) Traffic identification method based on generation of confrontation deep convolutional network
CN111753592A (en) Traffic sign recognition method, traffic sign recognition device, computer equipment and storage medium
CN103258123A (en) Steganalysis method based on blindness of steganalysis systems
CN113297939B (en) Obstacle detection method, obstacle detection system, terminal device and storage medium
CN111178153A (en) Traffic sign detection method and system
CN116644296B (en) Data enhancement method and device
CN114024761A (en) Network threat data detection method and device, storage medium and electronic equipment
CN113765850B (en) Internet of things abnormality detection method and device, computing equipment and computer storage medium
Hashemi et al. Runtime monitoring for out-of-distribution detection in object detection neural networks
CN114520775B (en) Application control method and device, electronic equipment and storage medium
CN114970694B (en) Network security situation assessment method and model training method thereof
CN115689946A (en) Image restoration method, electronic device and computer program product
CN117391214A (en) Model training method and device and related equipment
CN114707566A (en) Intelligent networking automobile abnormity intelligent detection method and device and readable storage medium
CN115037790A (en) Abnormal registration identification method, device, equipment and storage medium
CN113902999A (en) Tracking method, device, equipment and medium
CN113553953A (en) Vehicle parabolic detection method and device, electronic device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant