CN109726195A - Data enhancement method and device - Google Patents

Info

Publication number
CN109726195A
Authority
CN
China
Prior art keywords
data
feature dimensions
master
characteristic
related coefficient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811419516.4A
Other languages
Chinese (zh)
Other versions
CN109726195B (en)
Inventor
张勇
郭达
滕颖蕾
魏翼飞
宋梅
李俊杰
马腾滕
郭耀华
鲍捷
康灿平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201811419516.4A
Publication of CN109726195A
Application granted
Publication of CN109726195B
Legal status: Active
Anticipated expiration

Landscapes

  • Complex Calculations (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the present invention provide a data enhancement method and device. The method comprises: preprocessing multi-dimensional time-series data under different labels to obtain feature data; performing feature analysis on the feature data to obtain correlation coefficients between the feature dimensions in the feature data and the labels, where a correlation coefficient reflects the relationship between a feature dimension and a label; according to the magnitudes of the correlation coefficients, taking a preset number of feature dimensions as main feature dimensions, the remaining feature dimensions being non-main feature dimensions; and performing weighted fusion processing or noise-addition processing on the non-main feature dimension data and generating enhanced data in combination with the main feature dimension data. Applied to the enhancement of multi-dimensional feature data, the method and device weight the non-main features while keeping the main features of the original data unchanged, and can thereby improve accuracy and generalization ability when handling small-sample data sets.

Description

Data enhancement method and device
Technical field
The present invention relates to the field of terminal operation and maintenance (O&M) applications, and in particular to a data enhancement method and device.
Background art
With the improvement of computer storage capacity and the development of sophisticated algorithms, data volumes have grown exponentially in recent years. Data from networks, smartphones, sensors, cameras and other sources generates enormous commercial value. Large enterprises use big-data analysis to understand industry trends and user needs and preferences, changing existing business models. In the big-data context, the source of data is often the main obstacle to research: without a sufficiently large data set, research results are strongly affected, and when researchers collect data themselves, the volume often falls short of what they need. In such cases data enhancement techniques become particularly important.
Most existing data enhancement methods are applied in the image domain, mainly to reduce network overfitting: transforming the data yields a network with stronger generalization ability that adapts better to the application scenario. In the image domain, the more common existing data enhancement methods are the following:
Rotation/reflection transformation: randomly rotate the image by a certain angle, changing the orientation of the image content.
Flip transformation: flip the image along the horizontal or vertical direction.
Zoom transformation: enlarge or shrink the image by a certain ratio.
Translation transformation: shift the image within the image plane in a certain manner.
Scale transformation: enlarge or shrink the image by a specified scale factor; or, following the idea of SIFT feature extraction, filter the image with a specified scale factor to construct a scale space, changing the size or degree of blur of the image content.
Contrast transformation: in the HSV color space of the image, change the saturation S and value V components while keeping the hue H unchanged; apply an exponential operation to the S and V components of each pixel to increase illumination variation.
Noise perturbation: randomly perturb the RGB values of each pixel of the image; a commonly used noise is Gaussian noise.
Color transformation: add random perturbations to the image channels.
Random cropping: crop and scale the image using a random image difference approach.
For multi-dimensional feature data, by contrast, existing data enhancement methods are rare and have certain defects: because the data is transformed to some degree during enhancement, its features may be destroyed, making it difficult for a neural network to extract accurate features and thus reducing recognition accuracy.
Summary of the invention
Embodiments of the present invention provide a data enhancement method and device to overcome the above technical defects.
In a first aspect, an embodiment of the present invention provides a data enhancement method, comprising:
preprocessing multi-dimensional time-series data under different labels to obtain feature data;
performing feature analysis on the feature data to obtain correlation coefficients between the feature dimensions in the feature data and the labels, wherein a correlation coefficient reflects the relationship between a feature dimension and a label;
according to the magnitudes of the correlation coefficients, taking a preset number of feature dimensions as main feature dimensions, the remaining feature dimensions being non-main feature dimensions;
performing weighted fusion processing or noise-addition processing on the non-main feature dimension data, and generating enhanced data in combination with the main feature dimension data.
In a second aspect, an embodiment of the present invention provides a data enhancement device, comprising:
a preprocessing module, configured to preprocess multi-dimensional time-series data under different labels to obtain feature data;
a feature analysis module, configured to perform feature analysis on the feature data to obtain correlation coefficients between the feature dimensions in the feature data and the labels, wherein a correlation coefficient reflects the relationship between a feature dimension and a label;
a processing module, configured to take, according to the magnitudes of the correlation coefficients, a preset number of feature dimensions as main feature dimensions, the remaining feature dimensions being non-main feature dimensions;
a data generation module, configured to perform weighted fusion processing or noise-addition processing on the non-main feature dimension data and to generate enhanced data in combination with the main feature dimension data.
In a third aspect, an embodiment of the present invention provides an electronic device comprising a memory and a processor that communicate with each other over a bus; the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the data enhancement method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the data enhancement method of the first aspect.
The data enhancement method and device provided by the embodiments of the present invention propose a weighted fusion algorithm for non-main feature dimensions, applied to the enhancement of multi-dimensional feature data. The algorithm weights the non-main features while keeping the main features of the original data unchanged, thereby enhancing the data, and can improve accuracy and generalization ability when handling small-sample data sets.
Brief description of the drawings
Fig. 1 is a flow diagram of a data enhancement method provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of a data enhancement device provided by an embodiment of the present invention;
Fig. 3 is a diagram of the physical structure of an electronic device provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the invention are described clearly and completely below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flow diagram of a data enhancement method provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step 11: preprocessing multi-dimensional time-series data under different labels to obtain feature data;
Step 12: performing feature analysis on the feature data to obtain correlation coefficients between the feature dimensions in the feature data and the labels, wherein a correlation coefficient reflects the relationship between a feature dimension and a label;
Step 13: according to the magnitudes of the correlation coefficients, taking a preset number of feature dimensions as main feature dimensions, the remaining feature dimensions being non-main feature dimensions;
Step 14: performing weighted fusion processing or noise-addition processing on the non-main feature dimension data, and generating enhanced data in combination with the main feature dimension data.
Time-series data is data that changes over time and whose degree of change can be reflected numerically; it has two key indicators: the monitoring time and the monitored value. Multi-dimensional time-series data under different labels refers to multiple time series under several different labels. Each behavior corresponds to one group of data, and the behavior is the label of that group. Taking the driving-behavior data of a vehicle as an example, the left-turn behavior is a label, and the group of data recorded during a left turn, including acceleration and angular velocity, is the data under that label.
The multi-dimensional time-series data under different labels is first preprocessed to obtain feature data, which facilitates subsequent processing.
After the feature data is obtained, the main and non-main feature dimensions are determined from the relationship between the feature dimensions and the labels. Specifically, a correlation analysis is performed on each feature dimension of the feature data, and the correlation coefficient between each feature dimension and the label is calculated; the correlation coefficient reflects the relationship between the feature dimension and the label. Taking six-axis sensor data of vehicle driving behavior as an example, in the left-turn state the acceleration y-axis and the angular velocity z-axis are closely related to the left-turn behavior and vary strongly, so the correlation coefficients between the corresponding feature dimensions and the label are large, whereas acceleration x, acceleration z, angular velocity x and angular velocity y are relatively weakly related to the left-turn behavior and have small correlation coefficients.
The main and non-main feature dimensions are determined by the correlation coefficients: according to their magnitudes, a preset number of feature dimensions are taken as main feature dimensions, and the remaining feature dimensions are non-main feature dimensions.
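The selection in step 13 can be illustrated with a minimal sketch. It assumes the correlation coefficients are already available as one number per feature dimension and ranks them by absolute value; the function name `split_feature_dims`, the ranking by absolute value, and the example coefficients are illustrative assumptions, not part of the patent text.

```python
import numpy as np

def split_feature_dims(corr, k):
    """Return (main, non_main) index lists from per-dimension correlation coefficients."""
    corr = np.asarray(corr, dtype=float)
    # Rank dimensions by |coefficient|, largest first (using the absolute value is an assumption).
    order = np.argsort(-np.abs(corr), kind="stable")
    main = sorted(order[:k].tolist())
    non_main = sorted(order[k:].tolist())
    return main, non_main

# Hypothetical coefficients for the six axes under the "turn left" label.
names = ["acc_x", "acc_y", "acc_z", "gyr_x", "gyr_y", "gyr_z"]
corr = [0.12, 0.81, 0.05, -0.20, 0.15, 0.77]
main, non_main = split_feature_dims(corr, k=2)
print([names[i] for i in main])      # ['acc_y', 'gyr_z']
print([names[i] for i in non_main])  # ['acc_x', 'acc_z', 'gyr_x', 'gyr_y']
```

Here `k` plays the role of the preset number of main feature dimensions.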
After the main and non-main feature dimensions have been determined, the data is enhanced. There are two enhancement methods: one keeps the main feature dimensions unchanged and fuses several non-main feature dimensions by superposing them and taking a weighted mean; the other keeps the main feature dimension data unchanged and adds noise to the non-main feature dimensions, thereby enhancing and extending the data. Taking the first method as an example, the main feature dimension data is retained unchanged, the non-main feature dimension data is weighted proportionally, and the weighted non-main feature dimension data is then spliced with the main feature dimension data to generate enhanced data, which can be used as new data to train and test a machine learning model.
For example, take six-axis sensor data reflecting the driving behavior of a vehicle. In the left-turn state, the correlation coefficient matrix shows that the main feature dimensions of this state are the acceleration y-axis and the angular velocity z-axis. The data of these two main feature dimensions is therefore kept unchanged, and the non-main feature dimensions acceleration x and acceleration z, and angular velocity x and angular velocity y, are weighted and fused, i.e.:
data_acc = acc_x*w1 + acc_z*w2,
data_gyr = gyr_x*w1 + gyr_y*w2,
Wherein w1+w2=1, acc_x represent non-master feature dimensions acceleration x data, and acc_z represents non-master feature dimensions acceleration Z data, gyr_x represent non-master feature dimensions angular speed x data, and gyr_y represents non-master feature dimensions angular speed y data, and w1 and w2 are Weighting coefficient, data_acc are the fused data of data weighting of acceleration x, acceleration z, and data_gyr is angular speed x, angle The fused data of the data weighting of speed y.It, which is carried out split with main feature dimensions acceleration y-axis and angular speed z-axis data, is Produce enhancing data.After obtaining enhancing data according to the above method, enhancing data can be used for machine learning algorithm model Training.The embodiment of the present invention will enhance data and be randomly divided into training set, verifying collection and test set, and training set is for training convolutional mind Through network model, the generalization ability of verifying collection and test set for the convolutional neural networks model after testing training.
Taking six-axis sensor data as an example, with a convolutional neural network as the base network architecture, the enhanced data is randomly divided into a training set, a validation set and a test set, where the test set accounts for 10% of the total data set, the validation set accounts for 2.7%, and the remainder serves as the training set.
In the present invention, the original data set and the enhanced data set are each fed to the convolutional neural network for training; the results are shown in the following table:
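A random split with the stated proportions might look like the sketch below. The function name `random_split`, the rounding rule, and the fixed seed are assumptions; the text only gives the fractions.

```python
import numpy as np

def random_split(n_samples, test_frac=0.10, val_frac=0.027, seed=0):
    """Randomly partition sample indices into train, validation and test index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_frac))
    n_val = int(round(n_samples * val_frac))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]  # everything left over is training data
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = random_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 873 27 100
```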
Table 1: Comparison of results
Model    Loss    Acc     Val-Loss  Val-Acc  Precision  Recall  F1-score
CNN      1.2118  0.9800  101816    0.9286   0.92       0.90    0.91
CNN+DA   0.3112  0.9656  0.3619    0.9400   0.96       0.95    0.95
CNN denotes the original data set and CNN+DA denotes the enhanced data set. It can be observed that the loss on the validation and test sets is significantly reduced when the data enhancement scheme is used. Data enhancement also reduces the accuracy gap between the validation and test sets. This means that the data enhancement scheme improves the generalization ability of a deep neural network on small data sets.
The data enhancement method provided by the embodiments of the present invention proposes a weighted fusion algorithm for non-main feature dimensions, applied to the enhancement of multi-dimensional feature data. It weights the non-main features while keeping the main features of the original data unchanged, thereby enhancing the data, and can improve accuracy and generalization ability when handling small-sample data sets.
On the basis of the above embodiment, preprocessing the multi-dimensional time-series data under different labels specifically comprises:
performing data interpolation processing and/or standardization processing on the multi-dimensional time-series data under the different labels.
In data mining, raw data may contain a large amount of incomplete, inconsistent or anomalous data and outliers. Such problematic data reduces the efficiency of data mining at best and distorts its results at worst, so data preprocessing is indispensable. In this embodiment, the multi-dimensional time-series data under different labels is preprocessed by data interpolation or standardization to obtain the feature data.
A number of discrete data points can be obtained by methods such as sampling and experiment. From these data one often wishes to obtain a continuous function (i.e. a curve), or a denser set of discrete points, that agrees with the known data; this process is called fitting. Obtaining the data at unknown points from the fitted function is called interpolation. Interpolation processing includes polynomial interpolation, linear interpolation, Lagrange interpolation equal-length processing, and so on. Lagrange interpolation is a polynomial interpolation method: if a physical quantity is observed at several different points and corresponding observations are obtained, Lagrange interpolation finds a polynomial that takes exactly the observed value at each observation point. Such a polynomial is called the Lagrange interpolation polynomial. The data interpolation processing of this embodiment uses Lagrange interpolation equal-length processing; for example, the collected multi-dimensional time-series data under different labels is uniformly interpolated to a length of 300 data points, so that all feature dimension data has equal length.
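The equal-length step can be sketched as below. A single Lagrange polynomial through every point of a long series is numerically unstable, so this sketch evaluates a local Lagrange polynomial through a small window of neighbouring samples; the window size of 4, the uniform sample spacing, and the function names are assumptions, since the text does not give these details.

```python
import numpy as np

def lagrange_eval(xs, ys, x):
    """Evaluate at x the Lagrange polynomial passing through the points (xs[i], ys[i])."""
    total = 0.0
    for i in range(len(xs)):
        term = ys[i]
        for j in range(len(xs)):
            if j != i:
                term *= (x - xs[j]) / (xs[i] - xs[j])
        total += term
    return total

def resample_lagrange(series, target_len=300, window=4):
    """Resample a 1-D series to target_len points using local Lagrange interpolation."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    window = min(window, n)
    src = np.arange(n, dtype=float)           # assumed uniform sample positions
    dst = np.linspace(0.0, n - 1.0, target_len)
    out = np.empty(target_len)
    for k, x in enumerate(dst):
        # Pick `window` consecutive nodes around x, clamped to the series bounds.
        start = int(np.floor(x)) - (window // 2 - 1)
        start = min(max(start, 0), n - window)
        out[k] = lagrange_eval(src[start:start + window], series[start:start + window], x)
    return out

# Series of different lengths all become 300 points long.
short = resample_lagrange(np.sin(np.linspace(0, 3, 120)))
long_ = resample_lagrange(np.sin(np.linspace(0, 3, 450)))
print(short.shape, long_.shape)  # (300,) (300,)
```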
Before data analysis, the data usually needs to be standardized, and the analysis is carried out on the standardized data. The standardization processing of this embodiment specifically includes mean removal, normalization and whitening.
Mean removal subtracts from each dimension of the data the mean of that dimension, so that every dimension of the input data is centered at 0. The reason for mean removal is to keep the data from being overfitted easily, which would degrade the results of data processing.
Normalization includes min-max normalization, for example normalizing the maximum value to 1 and the minimum value to -1, or the maximum to 1 and the minimum to 0; min-max normalization suits data that is already distributed within a bounded range. Another kind is mean-variance normalization, which usually normalizes the mean to 0 and the variance to 1; it suits distributions without clear bounds. The purpose of normalization is to bring the scale of every feature into the same range, which makes the optimal solution easier to find and improves data processing efficiency.
Whitening reduces the dimensionality of the data by discarding the dimensions that carry little information and retaining the main feature information; its purpose is to remove the correlation between data and make the variances uniform.
By using mean removal, normalization and whitening, standardization removes redundancy between features and improves data processing efficiency. After processing, the feature data is obtained; the main and non-main feature dimensions are determined from the relationship between the feature dimensions and the labels, the non-main feature dimension data is fused by weighting, and enhanced data is generated in combination with the main feature dimension data.
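The three standardization operations described above can be sketched with NumPy as follows. The data layout (rows are time steps, columns are feature dimensions), the function names, the small `eps` stabilizers, and the use of PCA whitening specifically are assumptions for illustration.

```python
import numpy as np

def remove_mean(X):
    """Center every column (feature dimension) at 0."""
    return X - X.mean(axis=0)

def min_max_normalize(X, low=0.0, high=1.0):
    """Map each column's minimum to `low` and maximum to `high`."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    span = np.where(mx > mn, mx - mn, 1.0)  # avoid division by zero on constant columns
    return low + (X - mn) / span * (high - low)

def mean_variance_normalize(X, eps=1e-8):
    """Give each column zero mean and (approximately) unit variance."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)

def pca_whiten(X, n_components=None, eps=1e-5):
    """Decorrelate the columns and equalize their variances, optionally dropping weak components."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]        # strongest components first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if n_components is not None:
        eigvals, eigvecs = eigvals[:n_components], eigvecs[:, :n_components]
    return (Xc @ eigvecs) / np.sqrt(eigvals + eps)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.3, 0.5]])
W = pca_whiten(X)
print(np.round(W.T @ W / W.shape[0], 2))  # close to the identity matrix
```

Calling `min_max_normalize(X, low=-1.0, high=1.0)` gives the [-1, 1] variant mentioned in the text.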
On the basis of the above embodiment, obtaining the correlation coefficients between the feature dimensions in the feature data and the labels specifically comprises:
obtaining the corresponding numerical features from the feature dimension data;
obtaining the correlation coefficients from the numerical features and the labels.
The corresponding numerical features are obtained from the feature dimension data; they include, but are not limited to, data energy. Taking data energy as an example: for a specific behavior, the data energy of each feature dimension differs. Feature analysis is performed on the feature data; because the preprocessed feature data contains negative values, the feature data is squared to obtain the data energy corresponding to each feature dimension, i.e.:
Q = a^2,
where a is the data of any feature dimension under the label and Q is the corresponding data energy.
A correlation analysis is performed between the data energy and the labels: the correlation coefficient between each feature dimension and the label is calculated using the DataFrame functionality of the pandas package, and the one or two most significant feature dimensions are chosen as main feature dimensions. That is, the correlation coefficients between the data energy of the feature dimensions and the labels are obtained from the analysis, and the preset number of main feature dimensions is then determined according to their magnitudes, the remaining feature dimensions being non-main feature dimensions; the preset number is one or more and can be set according to the actual situation.
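The energy-based correlation analysis can be sketched as below. The text uses pandas; this sketch uses `numpy.corrcoef`, which computes the same Pearson coefficient that pandas' `DataFrame.corr` uses by default. Several details are assumptions not stated in the text: the label is encoded as a 0/1 indicator of one behavior, the squared values are summed over time to obtain one energy value per sample and dimension, and the function name is hypothetical.

```python
import numpy as np

def energy_label_correlation(samples, is_target):
    """samples: (n_samples, n_dims, length); is_target: (n_samples,) 0/1 label indicator.

    Returns the Pearson correlation between each dimension's energy and the label.
    """
    samples = np.asarray(samples, dtype=float)
    y = np.asarray(is_target, dtype=float)
    # Q = a^2, summed over time so each sample has one energy per dimension (assumption).
    energy = (samples ** 2).sum(axis=2)
    corr = np.empty(energy.shape[1])
    for d in range(energy.shape[1]):
        corr[d] = np.corrcoef(energy[:, d], y)[0, 1]
    return corr

# Synthetic data: dimension 0 grows in amplitude under the target label, dimension 1 is noise.
rng = np.random.default_rng(0)
y = np.tile([0, 1], 100)
t = np.linspace(0.0, 1.0, 50)
dim0 = (1 + 2 * y)[:, None] * np.sin(2 * np.pi * t)[None, :]
dim1 = rng.normal(size=(200, 50))
corr = energy_label_correlation(np.stack([dim0, dim1], axis=1), y)
print(np.round(corr, 2))
```

The dimension whose energy tracks the label receives a coefficient near 1 and would be selected as a main feature dimension.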
On the basis of the above embodiment, performing weighted fusion processing on the non-main feature dimension data specifically comprises:
keeping the main feature dimension data unchanged, and processing the non-main feature dimension data according to weight values to obtain weighted data;
adding the weighted data to the non-main feature dimension data to obtain new non-main feature dimension data.
After the main and non-main feature dimensions have been determined, data enhancement is performed. The enhancement method adopted in this embodiment keeps the main feature dimension data unchanged and fuses several non-main feature dimensions by superposing them and taking a weighted mean. Specifically, the non-main feature dimension data is first processed according to the weight values to obtain the weighted data, the weight values of the non-main feature dimensions summing to 1. The weighted data and the original non-main feature dimension data together form the new non-main feature dimension data, and the enhanced data is then generated by combining the main feature dimension data with the new non-main feature dimension data.
Taking six-axis sensor data of vehicle driving behavior as an example, in the left-turn state the correlation coefficient matrix shows that the main feature dimensions of this state are the acceleration y-axis and the angular velocity z-axis. The data of these two main feature dimensions is therefore kept unchanged, and the non-main feature dimensions acceleration x and acceleration z, and angular velocity x and angular velocity y, are weighted and fused, i.e.:
data_acc = acc_x*w1 + acc_z*w2,
data_gyr = gyr_x*w1 + gyr_y*w2,
where w1 and w2 are weight values that may be chosen arbitrarily subject to w1 > 0, w2 > 0 and w1 + w2 = 1; acc_x denotes the data of non-main feature dimension acceleration x, acc_z the data of non-main feature dimension acceleration z, gyr_x the data of non-main feature dimension angular velocity x, and gyr_y the data of non-main feature dimension angular velocity y; data_acc is the weighted fusion of the acceleration x and acceleration z data, and data_gyr is the weighted fusion of the angular velocity x and angular velocity y data. data_acc and data_gyr are added to the original non-main feature dimension data (acceleration x, acceleration z, angular velocity x and angular velocity y), forming the new non-main feature dimension data: data_acc, data_gyr, acceleration x, acceleration z, angular velocity x and angular velocity y.
Splicing the new non-main feature dimension data with the data of the main feature dimensions acceleration y-axis and angular velocity z-axis generates the enhanced data.
Besides the above method, the data enhancement of this embodiment can also be achieved by keeping the main feature dimension data unchanged and adding noise to the non-main feature dimensions, thereby enhancing and extending the data; details are not repeated here.
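The variant described in this embodiment, where the fused channels are added alongside the original non-main channels, can be sketched as follows. The array layout (six rows ordered acc_x, acc_y, acc_z, gyr_x, gyr_y, gyr_z), the output channel order, and the function name `enhance_append` are assumptions for illustration.

```python
import numpy as np

def enhance_append(sample, w1):
    """sample: (6, length) array ordered acc_x, acc_y, acc_z, gyr_x, gyr_y, gyr_z."""
    if not 0.0 < w1 < 1.0:
        raise ValueError("w1 must lie strictly between 0 and 1 so that w2 = 1 - w1 > 0")
    w2 = 1.0 - w1
    acc_x, acc_y, acc_z, gyr_x, gyr_y, gyr_z = np.asarray(sample, dtype=float)
    data_acc = acc_x * w1 + acc_z * w2
    data_gyr = gyr_x * w1 + gyr_y * w2
    # Main dimensions first, then the new non-main set: fused channels plus the originals.
    return np.stack([acc_y, gyr_z, data_acc, data_gyr, acc_x, acc_z, gyr_x, gyr_y])

sample = np.arange(12, dtype=float).reshape(6, 2)
enhanced = enhance_append(sample, w1=0.5)
print(enhanced.shape)  # (8, 2)
```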
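The noise-based alternative is not detailed in the text; a minimal sketch under stated assumptions is given below. Gaussian noise, the standard deviation `sigma`, and the function name are all assumptions chosen for the example.

```python
import numpy as np

def add_noise_to_non_main(sample, non_main, sigma=0.05, seed=None):
    """Return a copy of sample (n_dims, length) with Gaussian noise added to the non-main rows."""
    rng = np.random.default_rng(seed)
    out = np.array(sample, dtype=float, copy=True)
    noise = rng.normal(0.0, sigma, size=(len(non_main), out.shape[1]))
    out[non_main] += noise  # main rows are left untouched
    return out

# Rows ordered acc_x, acc_y, acc_z, gyr_x, gyr_y, gyr_z; rows 1 and 5 are the main dimensions.
sample = np.ones((6, 4))
noisy = add_noise_to_non_main(sample, non_main=[0, 2, 3, 4], seed=0)
print(np.array_equal(noisy[[1, 5]], sample[[1, 5]]))  # True
```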
Fig. 2 is a structural diagram of a data enhancement device provided by an embodiment of the present invention. As shown in Fig. 2, the device includes a preprocessing module 21, a feature analysis module 22, a processing module 23 and a data generation module 24, wherein:
the preprocessing module 21 is configured to preprocess multi-dimensional time-series data under different labels to obtain feature data;
the feature analysis module 22 is configured to perform feature analysis on the feature data to obtain correlation coefficients between the feature dimensions in the feature data and the labels, wherein a correlation coefficient reflects the relationship between a feature dimension and a label;
the processing module 23 is configured to take, according to the magnitudes of the correlation coefficients, a preset number of feature dimensions as main feature dimensions, the remaining feature dimensions being non-main feature dimensions;
the data generation module 24 is configured to perform weighted fusion processing or noise-addition processing on the non-main feature dimension data and to generate enhanced data in combination with the main feature dimension data.
The preprocessing module 21 first preprocesses the multi-dimensional time-series data under different labels to obtain feature data, which facilitates subsequent processing.
After the feature data is obtained, the main and non-main feature dimensions are determined from the relationship between the feature dimensions and the labels. Specifically, the feature analysis module 22 performs a correlation analysis on each feature dimension of the feature data and calculates the correlation coefficient between each feature dimension and the label; the correlation coefficient reflects the relationship between the feature dimension and the label. Taking six-axis sensor data of vehicle driving behavior as an example, in the left-turn state the acceleration y-axis and the angular velocity z-axis are closely related to the left-turn behavior and vary strongly, so the correlation coefficients between the corresponding feature dimensions and the label are large, whereas acceleration x, acceleration z, angular velocity x and angular velocity y are relatively weakly related to the left-turn behavior and have small correlation coefficients.
The processing module 23 determines the main and non-main feature dimensions by the correlation coefficients: according to their magnitudes, a preset number of feature dimensions are taken as main feature dimensions, and the remaining feature dimensions are non-main feature dimensions.
After the main and non-main feature dimensions have been determined, the data generation module 24 retains the main feature dimension data unchanged, weights the non-main feature dimension data proportionally, and then splices the weighted non-main feature dimension data with the main feature dimension data to generate enhanced data, which can be used as new data to train and test a machine learning model.
The device provided by this embodiment is used to carry out the above method embodiments; for the specific procedures and details, refer to those method embodiments, which are not repeated here.
The data enhancement device provided by the embodiments of the present invention proposes a weighted fusion algorithm for non-main feature dimensions, applied to the enhancement of multi-dimensional feature data. It weights the non-main features while keeping the main features of the original data unchanged, thereby enhancing the data, and can improve accuracy and generalization ability when handling small-sample data sets.
Fig. 3 is a schematic diagram of the physical structure of an electronic device provided by an embodiment of the present invention. As shown in Fig. 3, the electronic device may include a processor 310, a communications interface 320, a memory 330 and a bus 340, where the processor 310, the communications interface 320 and the memory 330 communicate with one another through the bus 340. The bus 340 can be used for information transmission between the electronic device and a sensor. The processor 310 can call logic instructions in the memory 330 to execute the following method: pre-processing multi-dimensional time series data under different labels to obtain feature data; performing feature analysis on the feature data to obtain correlation coefficients between the feature dimensions in the feature data and the labels, where a correlation coefficient reflects the relationship between a feature dimension and the label; according to the magnitude of the correlation coefficients, taking a preset number of feature dimensions as main feature dimensions, the remaining feature dimensions being non-main feature dimensions; and performing weighted fusion processing or noise-addition processing on the non-main feature dimension data and combining the result with the main feature dimension data to generate enhanced data.
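The pre-processing step that opens this method is detailed in the claims as interpolation and/or standardization. The sketch below illustrates only the mean-removal and normalization parts of standardization for a single feature dimension; whitening is omitted, and dividing by the population standard deviation is an assumed choice of normalization. The name `standardize` is hypothetical.

```python
from statistics import mean, pstdev


def standardize(series):
    """Remove the mean of one time series and scale it to unit deviation.

    Assumption: normalization divides by the population standard deviation.
    A constant series is only centered, to avoid division by zero.
    """
    mu = mean(series)
    sigma = pstdev(series)
    centered = [x - mu for x in series]
    if sigma == 0:
        return centered
    return [c / sigma for c in centered]
```

Applying this per dimension puts sensor axes with different units and ranges on a comparable scale before the correlation analysis.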
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
An embodiment of the present invention provides a non-transitory computer-readable storage medium that stores computer instructions. The computer instructions cause a computer to execute the data enhancement method provided by the above embodiments, for example: pre-processing multi-dimensional time series data under different labels to obtain feature data; performing feature analysis on the feature data to obtain correlation coefficients between the feature dimensions in the feature data and the labels, where a correlation coefficient reflects the relationship between a feature dimension and the label; according to the magnitude of the correlation coefficients, taking a preset number of feature dimensions as main feature dimensions, the remaining feature dimensions being non-main feature dimensions; and performing weighted fusion processing or noise-addition processing on the non-main feature dimension data and combining the result with the main feature dimension data to generate enhanced data.
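The noise-addition alternative named in this method can be sketched in the same way as weighted fusion. The text does not specify the noise distribution, so zero-mean Gaussian noise with standard deviation `sigma` is an assumption here, and the name `add_noise` is hypothetical.

```python
import random


def add_noise(sample, non_main, sigma, rng=None):
    """Return an enhanced copy of one sample with noisy non-main dimensions.

    sample[d] is the time series of dimension d.
    Assumption: zero-mean Gaussian noise; main dimensions stay unchanged.
    """
    if rng is None:
        rng = random.Random()
    non_main = set(non_main)
    enhanced = []
    for d, series in enumerate(sample):
        if d in non_main:
            enhanced.append([x + rng.gauss(0.0, sigma) for x in series])
        else:
            enhanced.append(list(series))
    return enhanced
```

Passing a seeded `random.Random` instance makes the generated data reproducible, which helps when comparing models trained on enhanced data.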
The above description is only a preferred embodiment of the present invention and is not intended to limit the invention. Those skilled in the technical field of the invention may make various modifications or additions to the described embodiments without departing from the spirit of the invention or exceeding the scope defined by the appended claims.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A data enhancement method, characterized by comprising:
pre-processing multi-dimensional time series data under different labels to obtain feature data;
performing feature analysis on the feature data to obtain correlation coefficients between the feature dimensions in the feature data and the labels, wherein the correlation coefficient reflects the relationship between a feature dimension and the label;
according to the magnitude of the correlation coefficients, taking a preset number of feature dimensions as main feature dimensions, the remaining feature dimensions being non-main feature dimensions;
performing weighted fusion processing or noise-addition processing on the non-main feature dimension data, and combining the result with the main feature dimension data to generate enhanced data.
2. The method according to claim 1, characterized in that the pre-processing of the multi-dimensional time series data under different labels specifically comprises:
performing data interpolation processing and/or standardization processing on the multi-dimensional time series data under the different labels.
3. The method according to claim 2, characterized in that the data interpolation processing specifically comprises Lagrange interpolation equal-length processing.
4. The method according to claim 2, characterized in that the standardization processing specifically comprises: mean removal, normalization and whitening.
5. The method according to claim 2, characterized in that obtaining the correlation coefficients between the feature dimensions in the feature data and the labels specifically comprises:
obtaining the corresponding numerical characteristics from the feature dimension data;
obtaining the correlation coefficients from the numerical characteristics and the labels.
6. The method according to claim 5, characterized in that the preset number is one or more.
7. The method according to any one of claims 1 to 6, characterized in that performing weighted fusion processing on the non-main feature dimension data specifically comprises:
keeping the main feature dimension data unchanged, and processing the non-main feature dimension data according to a weight value to obtain weighted data;
adding the weighted data to the non-main feature dimension data to obtain new non-main feature dimension data.
8. A data enhancement device, characterized by comprising:
a pre-processing module, configured to pre-process multi-dimensional time series data under different labels to obtain feature data;
a feature analysis module, configured to perform feature analysis on the feature data to obtain correlation coefficients between the feature dimensions in the feature data and the labels, wherein the correlation coefficient reflects the relationship between a feature dimension and the label;
a processing module, configured to take, according to the magnitude of the correlation coefficients, a preset number of feature dimensions as main feature dimensions, the remaining feature dimensions being non-main feature dimensions;
a data generation module, configured to perform weighted fusion processing or noise-addition processing on the non-main feature dimension data and to combine the result with the main feature dimension data to generate enhanced data.
9. An electronic device, characterized by comprising a memory and a processor, wherein the processor and the memory communicate with each other through a bus; the memory stores program instructions executable by the processor, and the processor calls the program instructions to execute the data enhancement method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the data enhancement method according to any one of claims 1 to 7.
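The Lagrange interpolation equal-length processing named in claim 3 can be illustrated with a short Python sketch that resamples one time series to a target length. The claims do not fix the details, so the local four-point window and the function names `lagrange_eval` and `resample_to_length` are assumptions made for illustration.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange polynomial through points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def resample_to_length(series, target_len, window=4):
    """Resample a series to target_len points by local Lagrange interpolation.

    Assumption: each output point uses a window of neighbouring samples
    rather than one high-degree polynomial over the whole series.
    """
    n = len(series)
    if n == 1 or target_len == 1:
        return [float(series[0])] * target_len
    window = min(window, n)
    out = []
    for k in range(target_len):
        pos = k * (n - 1) / (target_len - 1)  # position on the original index axis
        start = min(max(int(pos) - window // 2 + 1, 0), n - window)
        xs = list(range(start, start + window))
        ys = series[start:start + window]
        out.append(lagrange_eval(xs, ys, pos))
    return out
```

Resampling every sample to the same length lets recordings of different durations be stacked into one feature array before standardization.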
CN201811419516.4A 2018-11-26 2018-11-26 Data enhancement method and device Active CN109726195B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811419516.4A CN109726195B (en) 2018-11-26 2018-11-26 Data enhancement method and device

Publications (2)

Publication Number Publication Date
CN109726195A true CN109726195A (en) 2019-05-07
CN109726195B CN109726195B (en) 2020-09-11

Family

ID=66295149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811419516.4A Active CN109726195B (en) 2018-11-26 2018-11-26 Data enhancement method and device

Country Status (1)

Country Link
CN (1) CN109726195B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101159062A (en) * 2007-08-21 2008-04-09 北京航空航天大学 Self-adapting image enhancement method based on related coefficient
CN105868779A (en) * 2016-03-28 2016-08-17 浙江工业大学 Method for identifying behavior based on feature enhancement and decision fusion
GB2550716A (en) * 2014-12-29 2017-11-29 Flir Systems Sonar data enhancement systems and methods
CN108319909A (en) * 2018-01-29 2018-07-24 清华大学 A kind of driving behavior analysis method and system
CN108537100A (en) * 2017-11-17 2018-09-14 吉林大学 A kind of electrocardiosignal personal identification method and system based on PCA and LDA analyses
CN108717548A (en) * 2018-04-10 2018-10-30 中国科学院计算技术研究所 A kind of increased Activity recognition model update method of facing sensing device dynamic and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHEN, YING-CONG等: "《Person Re-Identification by Camera Correlation Aware Feature Augmentation》", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGEN》 *
卫震: "《基于Android平台的跌倒检测算法研究及实现》", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309854A (en) * 2019-05-21 2019-10-08 北京邮电大学 A kind of signal modulation mode recognition methods and device
CN110163174A (en) * 2019-05-27 2019-08-23 成都科睿埃科技有限公司 A kind of living body faces detection method based on monocular cam
CN113112410A (en) * 2020-01-10 2021-07-13 华为技术有限公司 Data enhancement method and device, computing equipment, chip and computer storage medium
CN111579446A (en) * 2020-05-19 2020-08-25 中煤科工集团重庆研究院有限公司 Dust concentration detection method based on optimal fusion algorithm
CN111638428A (en) * 2020-06-08 2020-09-08 国网山东省电力公司电力科学研究院 GIS-based ultrahigh frequency partial discharge data processing method and system
CN111638428B (en) * 2020-06-08 2022-09-20 国网山东省电力公司电力科学研究院 GIS-based ultrahigh frequency partial discharge data processing method and system
CN111738007A (en) * 2020-07-03 2020-10-02 北京邮电大学 Chinese named entity identification data enhancement algorithm based on sequence generation countermeasure network

Also Published As

Publication number Publication date
CN109726195B (en) 2020-09-11

Similar Documents

Publication Publication Date Title
CN109726195A (en) A kind of data enhancement methods and device
CN110378235B (en) Fuzzy face image recognition method and device and terminal equipment
CN109117831B (en) Training method and device of object detection network
CN109840477B (en) Method and device for recognizing shielded face based on feature transformation
CN110287125A (en) Software routine test method and device based on image recognition
CN112926595B (en) Training device of deep learning neural network model, target detection system and method
CN113689436A (en) Image semantic segmentation method, device, equipment and storage medium
CN109165654B (en) Training method of target positioning model and target positioning method and device
CN110738238A (en) certificate information classification positioning method and device
CN111275051A (en) Character recognition method, character recognition device, computer equipment and computer-readable storage medium
CN114266901A (en) Document contour extraction model construction method, device, equipment and readable storage medium
CN116309612B (en) Semiconductor silicon wafer detection method, device and medium based on frequency decoupling supervision
CN115393868B (en) Text detection method, device, electronic equipment and storage medium
CN116452802A (en) Vehicle loss detection method, device, equipment and storage medium
CN116128044A (en) Model pruning method, image processing method and related devices
CN112288748B (en) Semantic segmentation network training and image semantic segmentation method and device
CN116912518B (en) Image multi-scale feature processing method and device
CN117132767B (en) Small target detection method, device, equipment and readable storage medium
CN117576109B (en) Defect detection method, device, equipment and storage medium
CN116912634B (en) Training method and device for target tracking model
CN116912889B (en) Pedestrian re-identification method and device
CN116051860A (en) Vehicle key point detection method and system based on contrast learning
CN116883816A (en) Data processing method, apparatus, device, readable storage medium, and program product
CN118470032A (en) Training method and device for super-resolution image segmentation model
CN116110027A (en) Traffic sign recognition method, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant