CN109726195A - Data enhancement method and device - Google Patents
- Publication number
- CN109726195A CN201811419516.4A
- Authority
- CN
- China
- Prior art keywords
- data
- feature dimensions
- master
- characteristic
- related coefficient
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Complex Calculations (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Embodiments of the present invention provide a data enhancement method and device. The method comprises: preprocessing multi-dimensional time-series data under different labels to obtain feature data; performing feature analysis on the feature data to obtain correlation coefficients between the feature dimensions in the feature data and the labels, wherein a correlation coefficient reflects the relationship between a feature dimension and a label; according to the magnitude of the correlation coefficients, taking a preset number of feature dimensions as main feature dimensions, the remaining feature dimensions being non-main feature dimensions; and performing weighted fusion processing or noise-addition processing on the non-main feature dimension data, and generating enhanced data in combination with the main feature dimension data. Applied to the enhancement of multi-dimensional feature data, the method and device weight the non-main features while keeping the main features of the original data unchanged, thereby enhancing the data, and can improve accuracy and generalization ability when handling small-sample data sets.
Description
Technical field
The present invention relates to the field of terminal operation and maintenance (O&M) applications, and in particular to a data enhancement method and device.
Background
With the growth of computer storage capacity and the development of complex algorithms, data volumes have grown exponentially in recent years. Data from networks, smartphones, sensors, cameras and other sources has produced enormous commercial value. Large enterprises use big-data analysis to understand industry trends and user needs and preferences, and to change existing business models. Against this background, the source of data is often the main obstacle to research: lacking a sufficiently large data set strongly affects research results, and when researchers collect data themselves, the data volume often falls short of what they need. In such cases, data enhancement technology becomes particularly important.
Most existing data enhancement methods are applied in the image domain, mainly to reduce network overfitting: transforming the data yields a network with stronger generalization ability that adapts better to the application scenario. In the image domain, the commonly used data enhancement methods include the following:
Rotation/reflection transformation: randomly rotate the image by a certain angle, changing the orientation of the image content.
Flip transformation: flip the image along the horizontal or vertical direction.
Zoom transformation: enlarge or shrink the image by a certain ratio.
Translation transformation: shift the image within the image plane in a certain manner.
Scale transformation: enlarge or shrink the image by a specified scale factor; or, following the idea of SIFT feature extraction, filter the image with specified scale factors to construct a scale space, changing the size or degree of blur of the image content.
Contrast transformation: in the HSV color space of the image, change the saturation S and brightness V components while keeping the hue H unchanged; apply an exponential operation to the S and V components of each pixel to increase illumination variation.
Noise disturbance: randomly perturb the RGB values of each pixel of the image; a commonly used noise is Gaussian noise.
Color change: add random perturbations to the image channels.
Random cropping: crop and scale the image using a random image-difference approach.
For multi-dimensional feature data, by contrast, existing data enhancement methods are rare and have certain defects: because enhancement transforms the data to some degree, it may destroy the features of the data, making it difficult for a neural network to extract accurate features and thereby reducing recognition accuracy.
Summary of the invention
To overcome the above technical defects, embodiments of the present invention provide a data enhancement method and device.
In a first aspect, an embodiment of the present invention provides a data enhancement method, comprising:
preprocessing multi-dimensional time-series data under different labels to obtain feature data;
performing feature analysis on the feature data to obtain correlation coefficients between the feature dimensions in the feature data and the labels, wherein the correlation coefficient reflects the relationship between a feature dimension and the label;
according to the magnitude of the correlation coefficients, taking a preset number of feature dimensions as main feature dimensions, the remaining feature dimensions being non-main feature dimensions;
performing weighted fusion processing or noise-addition processing on the non-main feature dimension data, and generating enhanced data in combination with the main feature dimension data.
In a second aspect, an embodiment of the present invention provides a data enhancement device, comprising:
a preprocessing module, configured to preprocess multi-dimensional time-series data under different labels to obtain feature data;
a feature analysis module, configured to perform feature analysis on the feature data to obtain correlation coefficients between the feature dimensions in the feature data and the labels, wherein the correlation coefficient reflects the relationship between a feature dimension and the label;
a processing module, configured to take, according to the magnitude of the correlation coefficients, a preset number of feature dimensions as main feature dimensions, the remaining feature dimensions being non-main feature dimensions;
a data generation module, configured to perform weighted fusion processing or noise-addition processing on the non-main feature dimension data, and to generate enhanced data in combination with the main feature dimension data.
In a third aspect, an embodiment of the present invention provides an electronic device comprising a memory and a processor, wherein the processor and the memory communicate with each other via a bus; the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the data enhancement method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the data enhancement method of the first aspect.
The data enhancement method and device provided by the embodiments of the present invention propose a weighted fusion algorithm for non-main feature dimensions. Applied to the enhancement of multi-dimensional feature data, the algorithm weights the non-main features while keeping the main features of the original data unchanged, thereby enhancing the data, and can improve accuracy and generalization ability when handling small-sample data sets.
Brief description of the drawings
Fig. 1 is a flow diagram of a data enhancement method provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of a data enhancement device provided by an embodiment of the present invention;
Fig. 3 is a diagram of the physical structure of an electronic device provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Fig. 1 is a flow diagram of a data enhancement method provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step 11: preprocess multi-dimensional time-series data under different labels to obtain feature data;
Step 12: perform feature analysis on the feature data to obtain correlation coefficients between the feature dimensions in the feature data and the labels, wherein the correlation coefficient reflects the relationship between a feature dimension and the label;
Step 13: according to the magnitude of the correlation coefficients, take a preset number of feature dimensions as main feature dimensions, the remaining feature dimensions being non-main feature dimensions;
Step 14: perform weighted fusion processing or noise-addition processing on the non-main feature dimension data, and generate enhanced data in combination with the main feature dimension data.
Time-series data is data that changes over time and whose degree of change can be expressed numerically. It has two key indicators: the monitoring time and the monitored value. Multi-dimensional time-series data under different labels means time-series data recorded under several different labels. One behavior corresponds to one group of data, and that behavior is the label of the group. Taking the driving-behavior data of a vehicle as an example, the state "turning left" is a label, and the group of data recorded under this behavior, including acceleration, angular velocity and so on, is all data under that label.
The multi-dimensional time-series data under different labels is first preprocessed to obtain feature data, which facilitates subsequent processing.
After the feature data is obtained, the main and non-main feature dimensions are determined from the relationship between the feature dimensions and the labels. Specifically, correlation analysis is performed on each feature dimension in the feature data to compute the correlation coefficient between that feature dimension and the label; the correlation coefficient reflects the relationship between the feature dimension and the label. Taking six-axis sensor data of vehicle driving behavior as an example, in the "turning left" state the acceleration y-axis and the angular velocity z-axis are closely related to the left-turn behavior and change considerably, so the correlation coefficients between the label and the feature dimensions corresponding to the acceleration y-axis and the angular velocity z-axis are larger. Acceleration x, acceleration z, angular velocity x and angular velocity y are comparatively weakly related to the left-turn behavior, so their correlation coefficients are smaller.
The main and non-main feature dimensions are determined from the correlation coefficients: according to the magnitude of the correlation coefficients, a preset number of feature dimensions are taken as main feature dimensions, and the remaining feature dimensions are non-main feature dimensions.
After the main and non-main feature dimensions are determined, the data is enhanced. There are two enhancement methods: one keeps the main feature dimensions unchanged and fuses several non-main feature dimension data by superimposing them and taking their mean; the other keeps the main feature dimension data unchanged and adds noise to the non-main feature dimensions, thereby enhancing and extending the data. Taking the first method as an example, the specific operation is to keep the main feature dimension data unchanged, weight the non-main feature dimension data in proportion, and then splice the weighted non-main feature dimension data with the main feature dimension data to generate enhanced data, which can be used as new data to train and test a machine learning model.
For example, take six-axis sensor data reflecting the driving behavior of a vehicle. In the "turning left" state, the correlation matrix shows that the main feature dimensions of this state are the acceleration y-axis and the angular velocity z-axis. The data of the main feature dimensions (acceleration y-axis and angular velocity z-axis) is kept unchanged, and the non-main feature dimensions acceleration x and acceleration z, and angular velocity x and angular velocity y, are weighted and fused, that is:
data_acc = acc_x*w1 + acc_z*w2,
data_gyr = gyr_x*w1 + gyr_y*w2,
where w1 + w2 = 1; acc_x denotes the data of the non-main feature dimension acceleration x, acc_z the data of the non-main feature dimension acceleration z, gyr_x the data of the non-main feature dimension angular velocity x, and gyr_y the data of the non-main feature dimension angular velocity y; w1 and w2 are the weighting coefficients; data_acc is the weighted fusion of the acceleration x and acceleration z data, and data_gyr is the weighted fusion of the angular velocity x and angular velocity y data. Splicing data_acc and data_gyr with the main feature dimension data (acceleration y-axis and angular velocity z-axis) produces the enhanced data. The enhanced data obtained in this way can be used to train a machine learning model. In this embodiment, the enhanced data is randomly divided into a training set, a validation set and a test set; the training set is used to train a convolutional neural network model, and the validation and test sets are used to test the generalization ability of the trained model.
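The fusion formulas above can be sketched in Python with NumPy. This is a minimal illustration, not the patented implementation: the function names, the dictionary layout of a sample, and the row order of the output are assumptions made for the example.

```python
import numpy as np

def fuse_non_main(a, b, w1):
    """Weighted fusion of two non-main channels with w1 + w2 = 1."""
    w2 = 1.0 - w1
    return w1 * np.asarray(a, dtype=float) + w2 * np.asarray(b, dtype=float)

def augment_left_turn(sample, w1):
    """Build one enhanced sample for the 'turning left' label.

    `sample` is assumed to be a dict of equal-length 1-D arrays keyed
    acc_x, acc_y, acc_z, gyr_x, gyr_y, gyr_z (hypothetical layout).
    """
    data_acc = fuse_non_main(sample["acc_x"], sample["acc_z"], w1)
    data_gyr = fuse_non_main(sample["gyr_x"], sample["gyr_y"], w1)
    # The main dimensions (acc_y, gyr_z) are copied unchanged and spliced
    # with the two fused non-main channels.
    return np.stack([sample["acc_y"], sample["gyr_z"], data_acc, data_gyr])

# Usage: drawing several weights gives several enhanced samples per original.
rng = np.random.default_rng(0)
sample = {k: rng.normal(size=300) for k in
          ["acc_x", "acc_y", "acc_z", "gyr_x", "gyr_y", "gyr_z"]}
enhanced = [augment_left_turn(sample, w1) for w1 in (0.3, 0.5, 0.7)]
```

Varying w1 is one plausible way to obtain multiple enhanced samples from a single recording; the source does not state how the weights are chosen.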
Taking the six-axis sensor data as an example, with a convolutional neural network as the basic network framework, the enhanced data is randomly divided into a training set, a validation set and a test set, where the test set accounts for 10% of the total data set, the validation set accounts for 2.7%, and the remainder is used as the training set.
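A random split with these proportions might look like the following sketch. The function name, the rounding rule and the fixed seed are assumptions; the source only gives the 10% and 2.7% fractions.

```python
import numpy as np

def split_indices(n_samples, test_frac=0.10, val_frac=0.027, seed=0):
    """Randomly partition sample indices into train / validation / test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_frac))
    n_val = int(round(n_samples * val_frac))
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]  # everything that is left
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_indices(1000)
```

Splitting indices rather than the arrays themselves keeps the features and labels aligned when both are indexed with the same index arrays.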
In the present invention, the original data set and the enhanced data set are each fed into the convolutional neural network for training; the results are shown in the following table:
Table 1: Comparative result
Models | Loss | Acc | Val-Loss | Val-Acc | Precision | Recall | F1-score |
CNN | 1.2118 | 0.9800 | 101816 | 0.9286 | 0.92 | 0.90 | 0.91 |
CNN+DA | 0.3112 | 0.9656 | 0.3619 | 0.9400 | 0.96 | 0.95 | 0.95 |
CNN denotes the original data set and CNN+DA denotes the enhanced data set. It can be observed that the loss on the validation set and test set is significantly reduced when the data enhancement scheme is used. Data enhancement also reduces the accuracy gap between the validation set and the test set. This means that the data enhancement scheme improves the generalization ability of the deep neural network on small data sets.
The data enhancement method provided by this embodiment of the present invention proposes a weighted fusion algorithm for non-main feature dimensions. Applied to the enhancement of multi-dimensional feature data, the algorithm weights the non-main features while keeping the main features of the original data unchanged, thereby enhancing the data, and can improve accuracy and generalization ability when handling small-sample data sets.
On the basis of the above embodiments, preprocessing the multi-dimensional time-series data under different labels specifically includes:
performing data interpolation processing and/or standardization on the multi-dimensional time-series data under the different labels.
In data mining, raw data may contain large amounts of incomplete, inconsistent, abnormal or outlying data. Such problem data at best reduces the efficiency of data mining and at worst distorts its results, so data preprocessing is indispensable. In this embodiment, the multi-dimensional time-series data under different labels is preprocessed by data interpolation or standardization to obtain the feature data.
A number of discrete data points can be obtained by methods such as sampling and experiment. Finding, from these data, a continuous function (that is, a curve) or a denser set of discrete values that matches the known data is called fitting. Obtaining the values at unknown points from the fitted function is called interpolation. Interpolation processing includes polynomial interpolation, linear interpolation, Lagrange interpolation for length equalization, and so on. Lagrange interpolation is a polynomial interpolation method: for example, when a physical quantity is observed in practice and observations are obtained at several different locations, Lagrange interpolation finds a polynomial that takes exactly the observed value at each observation point. Such a polynomial is called a Lagrange interpolation polynomial. The data interpolation processing in this embodiment uses Lagrange interpolation for length equalization; for example, the collected multi-dimensional time-series data under different labels is uniformly interpolated to a length of 300 data points, so that the data of every feature dimension has the same length.
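As an illustration of equal-length interpolation, the sketch below resamples one channel to 300 points with Lagrange polynomials. The source does not say whether a single global polynomial or local polynomials are used; fitting a 4-point polynomial around each target position is an assumption made here because a single high-degree polynomial over a long series tends to oscillate. The function names and the `window` parameter are hypothetical.

```python
import numpy as np

def lagrange_eval(xs, ys, t):
    """Evaluate the Lagrange polynomial through the points (xs, ys) at t."""
    total = 0.0
    for i in range(len(xs)):
        term = ys[i]
        for j in range(len(xs)):
            if j != i:
                term = term * (t - xs[j]) / (xs[i] - xs[j])
        total += term
    return total

def resample_lagrange(series, target_len=300, window=4):
    """Resample a 1-D series to target_len points by local Lagrange interpolation."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    if n < window:
        raise ValueError("series is shorter than the interpolation window")
    positions = np.linspace(0.0, n - 1, target_len)
    out = np.empty(target_len)
    for k, t in enumerate(positions):
        # Take `window` consecutive samples around t, clipped to the series bounds.
        start = int(np.floor(t)) - (window // 2 - 1)
        start = min(max(start, 0), n - window)
        xs = np.arange(start, start + window, dtype=float)
        out[k] = lagrange_eval(xs, series[start:start + window], t)
    return out
```

Applying `resample_lagrange` to every channel of every recording gives all feature dimensions the same length, whatever the original recording length.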
Before data analysis, the data usually needs to be standardized first, and the analysis is then carried out on the standardized data. The standardization in this embodiment specifically includes mean removal, normalization and whitening.
Mean removal means subtracting from each dimension of the data the mean of that dimension, so that every dimension of the input data is centered at 0. The reason for removing the mean is to avoid the data being easily overfitted, which would degrade the data-processing results.
Normalization includes min-max normalization, for example normalizing the maximum to 1 and the minimum to -1, or normalizing the maximum to 1 and the minimum to 0. Min-max normalization is suitable for data that is already distributed within a limited range. Another form is mean-variance normalization, which usually normalizes the mean to 0 and the variance to 1; it is suitable when the distribution has no obvious bounds. The purpose of normalization is to bring the scale of every feature into the same range, which makes it easier to find the optimal solution and improves the efficiency of data processing.
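Mean removal and the two normalization variants can be sketched as follows. The array layout (rows are time steps, columns are feature dimensions) and the function names are assumptions for this example; constant columns are left unscaled to avoid division by zero.

```python
import numpy as np

def remove_mean(x):
    """Center every feature dimension (column) at 0."""
    return x - x.mean(axis=0)

def min_max_scale(x, low=0.0, high=1.0):
    """Map each column's minimum to `low` and maximum to `high`."""
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return low + (x - x_min) / span * (high - low)

def z_score(x):
    """Mean-variance normalization: mean 0 and variance 1 per column."""
    std = x.std(axis=0)
    return (x - x.mean(axis=0)) / np.where(std > 0, std, 1.0)

# Usage on a (time steps, feature dimensions) array.
rng = np.random.default_rng(0)
x = rng.normal(size=(300, 6)) * 5.0 + 2.0
scaled_01 = min_max_scale(x)             # range [0, 1]
scaled_pm1 = min_max_scale(x, -1.0, 1.0) # range [-1, 1]
standardized = z_score(x)
```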
Whitening reduces the dimensionality of the data by discarding dimensions that carry little information and retaining the main feature information; its purpose is to remove the correlation between the data and to make the variances uniform.
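The source does not name a specific whitening algorithm. One common choice consistent with the description (decorrelation, uniform variance, optional removal of low-information dimensions) is PCA whitening, sketched below under that assumption; `keep` and `eps` are hypothetical parameters.

```python
import numpy as np

def pca_whiten(x, keep=None, eps=1e-8):
    """PCA whitening of a (samples, dimensions) array.

    Projects the centered data onto the principal axes and rescales each
    component to unit variance. If `keep` is given, only the `keep`
    highest-variance components are retained (dimensionality reduction).
    """
    x = x - x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # sort descending by variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if keep is not None:
        eigvals, eigvecs = eigvals[:keep], eigvecs[:, :keep]
    return (x @ eigvecs) / np.sqrt(eigvals + eps)
```

After whitening, the sample covariance of the output is approximately the identity matrix, which is the "correlation removed, variance uniform" property described above.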
By using techniques such as mean removal, normalization and whitening, standardization removes redundancy between features and improves data-processing efficiency. After this processing, the feature data is obtained; the main and non-main feature dimensions are determined from the relationship between the feature dimensions and the labels, the non-main feature dimension data is fused by weighting, and the enhanced data is generated in combination with the main feature dimension data.
On the basis of the above embodiments, obtaining the correlation coefficients between the feature dimensions in the feature data and the labels specifically includes:
obtaining the corresponding numerical characteristics from the feature dimension data;
obtaining the correlation coefficients from the numerical characteristics and the labels.
Numerical characteristics are obtained from the feature dimension data; they include, but are not limited to, data energy. Taking data energy as an example: for a given behavior, the data energy of each feature dimension differs. Feature analysis is first performed on the feature data. Because the preprocessed feature data contains negative values, the feature data is squared to obtain the data energy corresponding to each feature data item, that is:
Q = a^2,
where a is the data of any feature dimension under the label and Q is the corresponding data energy.
Correlation analysis is performed between the data energy and the label: the correlation coefficient between each feature dimension and the label is computed using the DataFrame function in the pandas package, and the one or two most significant feature dimensions are chosen as main feature dimensions. That is, the correlation coefficients between the data energy of the feature dimensions and the label are obtained from the analysis results, and the preset number of main feature dimensions is then determined according to the magnitude of the correlation coefficients; the remaining feature dimensions are non-main feature dimensions. The preset number is one or more and can be set according to the actual situation.
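A possible realization of this energy-and-correlation step with pandas is sketched below. The source does not specify how the label is encoded; a 0/1 indicator per time step ("is this the target behavior?") is an assumption, as are the function name, the use of the absolute Pearson correlation for ranking, and the synthetic data.

```python
import numpy as np
import pandas as pd

def rank_feature_dims(features, is_target, n_main=2):
    """Rank feature dimensions by correlation between data energy and label.

    features:  DataFrame, one column per feature dimension (no 'label' column).
    is_target: 0/1 indicator per row (assumed label encoding).
    """
    energy = features.pow(2)                       # Q = a^2
    table = energy.assign(label=np.asarray(is_target, dtype=float))
    corr = table.corr()["label"].drop("label")     # Pearson correlation
    ranked = corr.abs().sort_values(ascending=False)
    main = list(ranked.index[:n_main])
    non_main = [c for c in features.columns if c not in main]
    return corr, main, non_main

# Synthetic usage: acc_y and gyr_z swing harder while the label is active.
rng = np.random.default_rng(0)
cols = ["acc_x", "acc_y", "acc_z", "gyr_x", "gyr_y", "gyr_z"]
label = np.repeat([0, 1], 200)
features = pd.DataFrame(rng.normal(size=(400, 6)), columns=cols)
for c in ("acc_y", "gyr_z"):
    features[c] = features[c] * (1 + 4 * label)
corr, main, non_main = rank_feature_dims(features, label)
```

Here `n_main` plays the role of the preset number of main feature dimensions.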
On the basis of the above embodiments, performing weighted fusion processing on the non-main feature dimension data specifically includes:
keeping the main feature dimension data unchanged, and processing the non-main feature dimension data according to weight values to obtain weighted data;
adding the weighted data to the non-main feature dimension data to obtain new non-main feature dimension data.
After the main and non-main feature dimensions are determined, the data is enhanced. The enhancement method adopted in this embodiment keeps the main feature dimension data unchanged and fuses several non-main feature dimension data by superimposing them and taking their mean. Specifically, the non-main feature dimension data is first processed according to the weight values to obtain the weighted data, where the weight values of the non-main feature dimensions sum to 1. The weighted data is then combined with the original non-main feature dimension data to form new non-main feature dimension data, and the enhanced data is generated by combining the main feature dimension data with the new non-main feature dimension data.
Taking six-axis sensor data of vehicle driving behavior as an example, in the "turning left" state the correlation matrix shows that the main feature dimensions of this state are the acceleration y-axis and the angular velocity z-axis. The data of the main feature dimensions (acceleration y-axis and angular velocity z-axis) is kept unchanged, and the non-main feature dimensions acceleration x and acceleration z, and angular velocity x and angular velocity y, are weighted and fused, that is:
data_acc = acc_x*w1 + acc_z*w2,
data_gyr = gyr_x*w1 + gyr_y*w2,
where w1 and w2 are the weight values, which can be chosen arbitrarily subject to w1 > 0, w2 > 0 and w1 + w2 = 1; acc_x denotes the data of the non-main feature dimension acceleration x, acc_z the data of the non-main feature dimension acceleration z, gyr_x the data of the non-main feature dimension angular velocity x, and gyr_y the data of the non-main feature dimension angular velocity y; data_acc is the weighted fusion of the acceleration x and acceleration z data, and data_gyr is the weighted fusion of the angular velocity x and angular velocity y data. data_acc and data_gyr are added to the original non-main feature dimension data (acceleration x, acceleration z, angular velocity x and angular velocity y), forming the new non-main feature dimension data: data_acc, data_gyr, acceleration x, acceleration z, angular velocity x and angular velocity y.
Splicing the new non-main feature dimension data with the main feature dimension data (acceleration y-axis and angular velocity z-axis) generates the enhanced data.
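In this variant the fused channels are appended to the original non-main channels instead of replacing them. A minimal sketch, assuming the same hypothetical dictionary layout of a sample (six equal-length 1-D arrays) and an arbitrary row order for the output:

```python
import numpy as np

def augment_with_appended_fusion(sample, w1):
    """Keep main channels, append fused channels to the original non-main ones."""
    if not 0.0 < w1 < 1.0:
        raise ValueError("w1 must satisfy 0 < w1 < 1 so that w2 = 1 - w1 > 0")
    w2 = 1.0 - w1
    data_acc = w1 * sample["acc_x"] + w2 * sample["acc_z"]
    data_gyr = w1 * sample["gyr_x"] + w2 * sample["gyr_y"]
    main = [sample["acc_y"], sample["gyr_z"]]            # unchanged
    new_non_main = [data_acc, data_gyr,
                    sample["acc_x"], sample["acc_z"],
                    sample["gyr_x"], sample["gyr_y"]]
    return np.stack(main + new_non_main)                 # shape (8, length)
```

The result has eight rows: the two main dimensions followed by the six new non-main dimensions listed in the text.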
In addition to the above method, the data enhancement in this embodiment may also be carried out by keeping the main feature dimension data unchanged and adding noise to the non-main feature dimensions, thereby enhancing and extending the data; this is not described in further detail here.
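The noise-based variant can be sketched as follows. The source does not specify the noise distribution or its strength; zero-mean Gaussian noise with standard deviation `sigma` is an assumption, as are the function name and the (dimensions, length) array layout.

```python
import numpy as np

def add_noise_to_non_main(data, main_idx, sigma=0.05, seed=None):
    """Add Gaussian noise to every row except the main feature dimensions.

    data:     array of shape (feature dimensions, length).
    main_idx: row indices of the main feature dimensions (left unchanged).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    out = data.copy()
    non_main = np.ones(data.shape[0], dtype=bool)
    non_main[list(main_idx)] = False
    out[non_main] += rng.normal(0.0, sigma, size=out[non_main].shape)
    return out

# Usage: rows ordered acc_x, acc_y, acc_z, gyr_x, gyr_y, gyr_z, so the
# main dimensions acc_y and gyr_z are rows 1 and 5.
sample = np.ones((6, 300))
noisy = add_noise_to_non_main(sample, main_idx=[1, 5], seed=0)
```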
Fig. 2 is a structural diagram of a data enhancement device provided by an embodiment of the present invention. As shown in Fig. 2, the device comprises a preprocessing module 21, a feature analysis module 22, a processing module 23 and a data generation module 24, wherein:
the preprocessing module 21 is configured to preprocess multi-dimensional time-series data under different labels to obtain feature data;
the feature analysis module 22 is configured to perform feature analysis on the feature data to obtain correlation coefficients between the feature dimensions in the feature data and the labels, wherein the correlation coefficient reflects the relationship between a feature dimension and the label;
the processing module 23 is configured to take, according to the magnitude of the correlation coefficients, a preset number of feature dimensions as main feature dimensions, the remaining feature dimensions being non-main feature dimensions;
the data generation module 24 is configured to perform weighted fusion processing or noise-addition processing on the non-main feature dimension data, and to generate enhanced data in combination with the main feature dimension data.
The preprocessing module 21 first preprocesses the multi-dimensional time-series data under different labels to obtain feature data, which facilitates subsequent processing.
After the feature data is obtained, the main and non-main feature dimensions are determined from the relationship between the feature dimensions and the labels. Specifically, the feature analysis module 22 performs correlation analysis on each feature dimension in the feature data and computes the correlation coefficient between that feature dimension and the label; the correlation coefficient reflects the relationship between the feature dimension and the label. Taking six-axis sensor data of vehicle driving behavior as an example, in the "turning left" state the acceleration y-axis and the angular velocity z-axis are closely related to the left-turn behavior and change considerably, so the correlation coefficients between the label and the feature dimensions corresponding to the acceleration y-axis and the angular velocity z-axis are larger. Acceleration x, acceleration z, angular velocity x and angular velocity y are comparatively weakly related to the left-turn behavior, so their correlation coefficients are smaller.
The processing module 23 determines the main and non-main feature dimensions from the correlation coefficients: according to the magnitude of the correlation coefficients, a preset number of feature dimensions are taken as main feature dimensions, and the remaining feature dimensions are non-main feature dimensions.
After the main and non-main feature dimensions are determined, the data generation module 24 keeps the main feature dimension data unchanged, weights the non-main feature dimension data in proportion, and then splices the weighted non-main feature dimension data with the main feature dimension data to generate enhanced data, which can be used as new data to train and test a machine learning model.
The device provided by this embodiment of the present invention is used to carry out the method embodiments described above; for the specific procedure and details, refer to those method embodiments, which are not repeated here.
The data enhancement device provided by this embodiment of the present invention proposes a weighted fusion algorithm for non-main feature dimensions. Applied to the enhancement of multi-dimensional feature data, the algorithm weights the non-main features while keeping the main features of the original data unchanged, thereby enhancing the data, and can improve accuracy and generalization ability when handling small-sample data sets.
Fig. 3 is a schematic diagram of the physical structure of an electronic device provided by an embodiment of the present invention. As shown in Fig. 3, the electronic
device may include a processor 310, a communications interface 320, a memory 330, and a bus 340, where the processor 310, the communications interface 320, and the memory 330 communicate with one another through the bus 340. The bus 340 can be used to transfer information between the electronic device and sensors. The processor 310 can call
logic instructions in the memory 330 to execute the following method: preprocessing multi-dimensional time-series data under different labels to obtain feature data;
performing feature analysis on the feature data to obtain the correlation coefficients between the feature dimensions and the labels in the feature data,
where the correlation coefficient reflects the relationship between a feature dimension and the label; according to the magnitude of the correlation coefficients,
taking a preset number of feature dimensions as main feature dimensions and the remaining feature dimensions as non-main feature dimensions; and applying
weighted-fusion processing or noise-addition processing to the non-main feature dimension data, then combining it with the main feature dimension data to generate enhanced data.
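The noise-addition alternative named in the method can be sketched in the same style. The text does not specify the noise distribution, so zero-mean Gaussian noise is an assumption here, as are the function name `add_noise_to_non_main` and the parameters `sigma` and `seed`.

```python
import random
from typing import Dict, List, Optional, Sequence


def add_noise_to_non_main(
    sample: Dict[str, Sequence[float]],
    main_dims: Sequence[str],
    sigma: float,
    seed: Optional[int] = None,
) -> Dict[str, List[float]]:
    """Return a new sample with noise added only to non-main dimensions.

    Assumption: the noise is zero-mean Gaussian with standard deviation
    `sigma`. Main feature dimension data is copied unchanged.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    main = set(main_dims)
    enhanced: Dict[str, List[float]] = {}
    for name, series in sample.items():
        if name in main:
            enhanced[name] = list(series)
        else:
            enhanced[name] = [value + rng.gauss(0.0, sigma) for value in series]
    return enhanced
```

Passing a fixed `seed` makes a generated sample repeatable, which can help when comparing models trained on the same enhanced data.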
In addition, when the logic instructions in the memory 330 are implemented in the form of software functional units and sold or used as
an independent product, they can be stored in a computer-readable storage medium. Based on this understanding,
the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied
in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that
cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute
all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media
that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present invention provides a non-transitory computer-readable storage medium that
stores computer instructions. The computer instructions cause a computer to execute the data enhancement method provided by the above embodiments,
for example: preprocessing multi-dimensional time-series data under different labels to obtain feature data; performing feature analysis
on the feature data to obtain the correlation coefficients between the feature dimensions and the labels in the feature data, where the correlation coefficient reflects
the relationship between a feature dimension and the label; according to the magnitude of the correlation coefficients, taking a preset number of feature dimensions as main feature
dimensions and the remaining feature dimensions as non-main feature dimensions; and applying weighted-fusion processing or noise-addition processing to the non-main feature dimension data, then combining it
with the main feature dimension data to generate enhanced data.
The above is only a preferred embodiment of the present invention and is not intended to limit the invention. Those skilled in the art
to which the invention belongs can make various modifications or additions to the described embodiments without departing from the
spirit of the invention or exceeding the scope defined by the appended claims.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although
the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they can still
modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features, and that
such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope
of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A data enhancement method, characterized by comprising:
preprocessing multi-dimensional time-series data under different labels to obtain feature data;
performing feature analysis on the feature data to obtain the correlation coefficients between the feature dimensions and the labels in the feature data,
wherein the correlation coefficient reflects the relationship between a feature dimension and the label;
according to the magnitude of the correlation coefficients, taking a preset number of feature dimensions as main feature dimensions, the remaining feature dimensions being non-main feature
dimensions;
applying weighted-fusion processing or noise-addition processing to the non-main feature dimension data, and combining it with the main feature dimension data to generate enhanced
data.
2. The method according to claim 1, characterized in that preprocessing the multi-dimensional time-series data under different labels
specifically comprises:
performing data interpolation processing and/or standardization on the multi-dimensional time-series data under the different labels.
3. The method according to claim 2, characterized in that the data interpolation processing specifically comprises equal-length processing
by Lagrange interpolation.
4. The method according to claim 2, characterized in that the standardization specifically comprises: mean removal,
normalization, and whitening.
5. The method according to claim 2, characterized in that obtaining the correlation coefficients between the feature dimensions and the labels
in the feature data specifically comprises:
obtaining the numerical features corresponding to the feature dimension data;
obtaining the correlation coefficients according to the numerical features and the labels.
6. The method according to claim 5, characterized in that the preset number is one or more.
7. The method according to any one of claims 1 to 6, characterized in that applying weighted-fusion processing to the non-main feature dimension data
specifically comprises:
keeping the main feature dimension data unchanged, and processing the non-main feature dimension data according to a weight value to obtain weighted data;
adding the weighted data to the non-main feature dimension data to obtain new non-main feature dimension data.
8. A data enhancement device, characterized by comprising:
a preprocessing module, configured to preprocess multi-dimensional time-series data under different labels to obtain feature data;
a feature analysis module, configured to perform feature analysis on the feature data to obtain the correlation coefficients between the feature dimensions and the labels
in the feature data, wherein the correlation coefficient reflects the relationship between a feature dimension and the label;
a processing module, configured to take, according to the magnitude of the correlation coefficients, a preset number of feature dimensions as main feature dimensions, the remaining feature
dimensions being non-main feature dimensions;
a data generation module, configured to apply weighted-fusion processing or noise-addition processing to the non-main feature dimension data and combine it with the main feature
dimension data to generate enhanced data.
9. An electronic device, characterized by comprising a memory and a processor, the processor and the memory communicating with each other through a
bus; the memory stores program instructions executable by the processor, and by calling
the program instructions the processor is able to execute the data enhancement method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium on which a computer program is stored, characterized in that the computer
program, when executed by a processor, implements the data enhancement method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811419516.4A CN109726195B (en) | 2018-11-26 | 2018-11-26 | Data enhancement method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109726195A (en) | 2019-05-07 |
CN109726195B (en) | 2020-09-11 |
Family
ID=66295149
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811419516.4A Active CN109726195B (en) | 2018-11-26 | 2018-11-26 | Data enhancement method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109726195B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101159062A (en) * | 2007-08-21 | 2008-04-09 | 北京航空航天大学 | Self-adapting image enhancement method based on related coefficient |
GB2550716A (en) * | 2014-12-29 | 2017-11-29 | Flir Systems | Sonar data enhancement systems and methods |
CN105868779A (en) * | 2016-03-28 | 2016-08-17 | 浙江工业大学 | Method for identifying behavior based on feature enhancement and decision fusion |
CN108537100A (en) * | 2017-11-17 | 2018-09-14 | 吉林大学 | A kind of electrocardiosignal personal identification method and system based on PCA and LDA analyses |
CN108319909A (en) * | 2018-01-29 | 2018-07-24 | 清华大学 | A kind of driving behavior analysis method and system |
CN108717548A (en) * | 2018-04-10 | 2018-10-30 | 中国科学院计算技术研究所 | A kind of increased Activity recognition model update method of facing sensing device dynamic and system |
Non-Patent Citations (2)
Title |
---|
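CHEN, YING-CONG et al.: 《Person Re-Identification by Camera Correlation Aware Feature Augmentation》, 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGEN》 * |
卫震: 《Research and Implementation of a Fall Detection Algorithm Based on the Android Platform》, 《China Master's Theses Full-text Database, Information Science and Technology》 * |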
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110309854A (en) * | 2019-05-21 | 2019-10-08 | 北京邮电大学 | A kind of signal modulation mode recognition methods and device |
CN110163174A (en) * | 2019-05-27 | 2019-08-23 | 成都科睿埃科技有限公司 | A kind of living body faces detection method based on monocular cam |
CN113112410A (en) * | 2020-01-10 | 2021-07-13 | 华为技术有限公司 | Data enhancement method and device, computing equipment, chip and computer storage medium |
CN111579446A (en) * | 2020-05-19 | 2020-08-25 | 中煤科工集团重庆研究院有限公司 | Dust concentration detection method based on optimal fusion algorithm |
CN111638428A (en) * | 2020-06-08 | 2020-09-08 | 国网山东省电力公司电力科学研究院 | GIS-based ultrahigh frequency partial discharge data processing method and system |
CN111638428B (en) * | 2020-06-08 | 2022-09-20 | 国网山东省电力公司电力科学研究院 | GIS-based ultrahigh frequency partial discharge data processing method and system |
CN111738007A (en) * | 2020-07-03 | 2020-10-02 | 北京邮电大学 | Chinese named entity identification data enhancement algorithm based on sequence generation countermeasure network |
Also Published As
Publication number | Publication date |
---|---|
CN109726195B (en) | 2020-09-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||