CN107644102A - Data characteristics building method and device, storage medium, electronic equipment - Google Patents
- Publication number: CN107644102A (application number CN201710954269.7A)
- Authority: CN (China)
- Prior art keywords: feature, data, data characteristics, initial data, value
- Legal status: Granted (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The disclosure relates to a data feature construction method and device in the technical field of data processing. The method includes: acquiring multiple raw data features and cleaning each raw data feature to obtain multiple first target data features; and fusing the first target data features to obtain multiple second target data features. Fusing the first target data features uncovers information hidden in each data feature, from which the associations between data features can be further mined, so that a model trained and used for prediction on these features achieves higher accuracy.
Description
Technical field
This disclosure relates to the technical field of data processing, and in particular to a data feature construction method, a data feature construction device, a computer-readable storage medium and an electronic device.
Background
In data mining projects, some rather sparse user data features need to be extracted for feature mining. Sparse features may include, for example, time-window features such as a user's browsing, add-to-cart, purchase and search activity over the last 1-5 days.
However, after statistical aggregation most values of such features turn out to be zero. In general, the values of a data feature should satisfy three requirements: Informative (rich in information), Discriminative (able to distinguish between samples) and Independent. When a data feature is too sparse, its values cannot satisfy these three requirements.
In general, only carefully selected, strongly discriminative features make a model perform well; yet arbitrarily removing the sparse user features above discards much deeper, hidden but important information and introduces large errors into the data mining results.
Therefore, a new data feature construction method and device are needed.
It should be noted that the information disclosed in this Background section is only intended to enhance understanding of the background of the disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary
The purpose of the disclosure is to provide a data feature construction method, a data feature construction device, a computer-readable storage medium and an electronic device, so as to overcome, at least to some extent, one or more problems caused by the limitations and defects of the related art.
According to one aspect of the disclosure, a data feature construction method is provided, including:
acquiring multiple raw data features and cleaning each raw data feature to obtain multiple first target data features;
fusing the first target data features to obtain multiple second target data features.
In an exemplary embodiment of the disclosure, cleaning each raw data feature to obtain multiple first target data features includes:
rejecting the abnormal features in each raw data feature to obtain the multiple first target data features.
In an exemplary embodiment of the disclosure, rejecting the feature values in each raw data feature includes:
judging whether the feature value of each raw data feature is less than a first preset value;
when the feature value of a raw data feature is judged to be less than the first preset value, rejecting the raw data feature corresponding to that feature value; and
when the feature value of a raw data feature is judged to be not less than the first preset value, judging whether it is greater than a second preset value;
when the feature value of a raw data feature is judged to be greater than the second preset value, rejecting the raw data feature corresponding to that feature value.
In an exemplary embodiment of the disclosure, fusing the first target data features to obtain multiple second target data features includes:
constructing multiple feature indicators;
calculating each feature indicator for each first target data feature;
fusing the feature indicators of each first target data feature to obtain the multiple second target data features.
In an exemplary embodiment of the disclosure, the feature indicators include several of the mean, median, upper quartile, standard deviation and maximum.
In an exemplary embodiment of the disclosure, the raw data features include several of purchase features, browsing features, add-to-cart features, search features and coupon features.
According to one aspect of the disclosure, a data feature construction device is provided, including:
a data feature cleaning module, configured to acquire multiple raw data features and clean each raw data feature to obtain multiple first target data features;
a data feature fusion module, configured to fuse the first target data features to obtain multiple second target data features.
In an exemplary embodiment of the disclosure, cleaning each raw data feature to obtain multiple first target data features includes:
rejecting the abnormal features in each raw data feature to obtain the multiple first target data features.
In an exemplary embodiment of the disclosure, rejecting the feature values in each raw data feature includes:
judging whether the feature value of each raw data feature is less than a first preset value;
when the feature value of a raw data feature is judged to be less than the first preset value, rejecting the raw data feature corresponding to that feature value; and
when the feature value of a raw data feature is judged to be not less than the first preset value, judging whether it is greater than a second preset value;
when the feature value of a raw data feature is judged to be greater than the second preset value, rejecting the raw data feature corresponding to that feature value.
In an exemplary embodiment of the disclosure, fusing the first target data features to obtain multiple second target data features includes:
constructing multiple feature indicators;
calculating each feature indicator for each first target data feature;
fusing the feature indicators of each first target data feature to obtain the multiple second target data features.
In an exemplary embodiment of the disclosure, the feature indicators include several of the mean, median, upper quartile, standard deviation and maximum.
In an exemplary embodiment of the disclosure, the raw data features include several of purchase features, browsing features, add-to-cart features, search features and coupon features.
According to one aspect of the disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements any one of the data feature construction methods described above.
According to one aspect of the disclosure, an electronic device is provided, including:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform any one of the data feature construction methods described above by executing the executable instructions.
In the data feature construction method and device of the disclosure, multiple raw data features are acquired and each raw data feature is cleaned to obtain multiple first target data features; the first target data features are then fused to obtain multiple second target data features. On the one hand, cleaning each raw data feature removes the abnormal data from the raw data features, so that the constructed data features can satisfy the requirements of feature selection (informative, discriminative and independent), which improves the accuracy of the data features. On the other hand, fusing the first target data features to obtain multiple second target data features uncovers information hidden in each data feature and, from that information, further mines the associations between data features, so that a model trained and used for prediction on these features has higher accuracy.
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, show embodiments consistent with the disclosure and, together with the specification, serve to explain its principles. Evidently, the drawings described below are only some embodiments of the disclosure; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 schematically shows a flowchart of a data feature construction method.
Fig. 2 schematically shows a flowchart of a method for rejecting feature values in raw data features.
Fig. 3 schematically shows an example box plot.
Fig. 4 schematically shows a flowchart of a method for fusing first target data features.
Fig. 5 schematically shows a block diagram of a data feature construction device.
Fig. 6 schematically shows an electronic device for implementing the above data feature construction method.
Fig. 7 schematically shows a computer-readable storage medium for implementing the above data feature construction method.
Detailed description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in many forms and should not be construed as limited to the examples set forth here; rather, these embodiments are provided so that the disclosure will be more thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures or characteristics may be combined in one or more embodiments in any suitable manner. In the following description, many specific details are provided to give a thorough understanding of the embodiments of the disclosure. However, those skilled in the art will appreciate that the technical solutions of the disclosure can be practiced while omitting one or more of the specific details, or by using other methods, components, devices, steps and so on. In other cases, well-known technical solutions are not shown or described in detail, to avoid obscuring aspects of the disclosure.
In addition, the drawings are only schematic illustrations of the disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so their repeated description is omitted. Some block diagrams in the drawings are functional entities that do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
This example embodiment first provides a data feature construction method. As shown in Fig. 1, the method may include the following steps:
Step S110: acquire multiple raw data features and clean each raw data feature to obtain multiple first target data features.
Step S120: fuse the first target data features to obtain multiple second target data features.
In the above method, on the one hand, cleaning each raw data feature removes the abnormal data from the raw data features, so that the constructed data features can satisfy the requirements of feature selection (informative, discriminative and independent), which improves the accuracy of the data features. On the other hand, fusing the first target data features to obtain multiple second target data features uncovers information hidden in each data feature and, from that information, further mines the associations between data features, so that a model trained and used for prediction on these features has higher accuracy.
Each step of the above method in this example embodiment is explained in detail below.
In step S110, multiple raw data features are acquired and each raw data feature is cleaned to obtain multiple first target data features.
First, the raw data features are explained. Raw data features may include purchase features, browsing features, add-to-cart features, search features, coupon features and so on; they may also include other data features, such as touch features, which this example does not specifically limit. Where:
Purchase features may be the number of SKUs the user purchased in the last year, the per-customer transaction value, the order count and order amount within the last month, the order count within the last three months, the order count from three to six months ago, and so on; other features, such as the order count within the last year, may also be included, which this example does not specifically limit;
Browsing features may be the number of SKUs the user browsed and the number of visits in the last 1, 2, 3, 4 and 5 days, and the number of visits in the last 7 days, 7-15 days, 15-21 days and 21-28 days; other features, such as the number of SKUs browsed within 7 days, may also be included, which this example does not specifically limit;
Add-to-cart features may be the number of SKUs the user added to the cart in the last 1-3, 1-5, 1-7, 7-15 and 15-30 days, the number of third-level categories added to the cart, the number of brands added to the cart, and so on; other features, such as the number of SKUs, third-level categories and brands added to the cart within 2 months, may also be included, which this example does not specifically limit;
Search features may be the number of keywords, brands, first-level categories and third-level categories the user searched in the last 10, 20 and 30 days, or other features, such as the number of keywords, brands, first-level categories and third-level categories searched within 2 months, which this example does not specifically limit;
Coupon features may be the number of coupons the user used and the number of coupons claimed in the last 3 months, 3-6 months, 6-9 months and 9-12 months, the order amount paid with coupons, the discount amount, the third-level categories purchased with coupons, and so on; other features, such as the number of SKUs purchased with coupons, may also be included, which this example does not specifically limit.
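The windowed counts listed above can be derived from a per-user event log. The sketch below is a minimal illustration under stated assumptions, not the patent's implementation: the `(user_id, event_type, days_ago)` record layout, event names such as `"browse"`, the half-open `[start, end)` window boundaries and the function name `window_counts` are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical event record: (user_id, event_type, days_ago).
Event = Tuple[str, str, int]


def window_counts(
    events: Iterable[Event],
    event_type: str,
    windows: List[Tuple[int, int]],
) -> Dict[str, Dict[str, int]]:
    """Count one event type per user inside each [start, end) day window.

    Every user seen in the log gets a row, so users without matching
    events keep explicit zeros (this is where sparsity shows up).
    """
    names = [f"{event_type}_{start}_{end}d" for start, end in windows]
    table: Dict[str, Dict[str, int]] = {}
    for user, etype, days_ago in events:
        # Create an all-zero row the first time a user appears.
        row = table.setdefault(user, dict.fromkeys(names, 0))
        if etype != event_type:
            continue
        for (start, end), name in zip(windows, names):
            if start <= days_ago < end:
                row[name] += 1
    return table


log = [("u1", "browse", 0), ("u1", "browse", 3), ("u1", "cart", 2),
       ("u2", "browse", 10), ("u3", "cart", 1)]
print(window_counts(log, "browse", [(0, 7), (7, 15)]))
```

Here `u3` only has add-to-cart events, so its browse row is all zeros, which is the kind of sparse value the Background section describes.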
Next, step S110 is further explained on the basis of the raw data features above. First, the purchase, browsing, add-to-cart, search and coupon features of all users are acquired; then the raw data features are cleaned, which may specifically include rejecting the abnormal features in each raw data feature to obtain multiple first target data features. It should be added that, before the abnormal features are rejected, risk users and order-brushing users may also be removed from the raw data features, so as to obtain raw data features that reflect normal user behavior. Further, as shown in Fig. 2, rejecting feature values in the raw data features may include steps S210 to S240, where:
In step S210, it is judged whether the feature value of each raw data feature is less than the first preset value.
First, the box plot is explained. A box plot provides a standard criterion for identifying outliers, and it is built on quartiles, which are briefly introduced first. As shown in Fig. 3, all values are sorted in ascending order and divided into four equal parts; the values at the three cut points are called quartiles. The first quartile (Q1), also called the lower quartile, is the value at the 25% position after all values in the sample are sorted in ascending order; the second quartile (Q2), also called the median, is the value at the 50% position; the third quartile (Q3), also called the upper quartile, is the value at the 75% position. The difference between the third and first quartiles is called the interquartile range, IQR = Q3 - Q1. In a batch of data, an outlier can then be defined as a value less than Q1 - 1.5 × IQR or greater than Q3 + 1.5 × IQR.
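The quartile and fence definitions above translate directly into code. A minimal sketch using Python's standard `statistics` module; the function name `iqr_fences` and the sample data are illustrative. Quartile values depend on the interpolation rule: `method="inclusive"` interpolates linearly between order statistics (the same rule as NumPy's default percentile method), while other tools may give slightly different Q1 and Q3 on small samples.

```python
import statistics
from typing import List, Tuple


def iqr_fences(values: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Return the box-plot outlier fences (Q1 - k*IQR, Q3 + k*IQR)."""
    # quantiles(n=4) returns the three cut points [Q1, Q2, Q3].
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr


sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
lower, upper = iqr_fences(sample)
print(lower, upper)                                   # -3.0 13.0
print([x for x in sample if x < lower or x > upper])  # [100]
```

Here Q1 = 3 and Q3 = 7, so IQR = 4 and the fences are 3 - 6 and 7 + 6; only the value 100 falls outside them.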
Next, step S210 is explained on the basis of these quartiles: it is first judged whether the feature value of each raw data feature is less than the first preset value. For example:
If, among the raw data features, the value of the purchase feature is 20 and the first preset value Q1 - 1.5 × IQR is 35, the value of this raw data feature is judged to be less than the first preset value. It should be added that, because a box plot is used here to identify outliers, the first preset value is determined by the box plot; when another method is used to identify outliers, the rule for setting the first preset value changes accordingly, which this example does not specifically limit.
In step S220, when the feature value of a raw data feature is judged to be less than the first preset value, the raw data feature corresponding to that value is rejected. Specifically:
Because the value of the purchase feature is 20 and the first preset value Q1 - 1.5 × IQR is 35, the value is less than the first preset value, so the data feature corresponding to it can be deleted. It should be added that, because each raw data feature comprises multiple subfeatures, the value of each subfeature of each raw data feature may be judged separately to improve the accuracy of the calculation, and the subfeatures whose values are below the first preset value are then deleted; this example does not specifically limit this.
In step S230, when the feature value of a raw data feature is judged to be not less than the first preset value, it is judged whether the value is greater than the second preset value. Specifically:
When the feature value of a raw data feature is judged to be not less than the first preset value (Q1 - 1.5 × IQR), it must further be judged whether it is greater than the second preset value (Q3 + 1.5 × IQR). For example, if among the raw data features the value of the browsing feature is 2000 and the second preset value Q3 + 1.5 × IQR is 1800, the value of this raw data feature is judged to be greater than the second preset value. It should be added that, because a box plot is used here to identify outliers, the second preset value is likewise determined by the box plot; when another method is used to identify outliers, the rule for setting the second preset value changes accordingly, which this example does not specifically limit.
In step S240, when the feature value of a raw data feature is judged to be greater than the second preset value, the raw data feature corresponding to that value is rejected. Specifically:
Because the value of the browsing feature is 2000 and the second preset value Q3 + 1.5 × IQR is 1800, the value is greater than the second preset value, so the data feature corresponding to it can be deleted. It should be added that, because each raw data feature comprises multiple subfeatures, the value of each subfeature of each raw data feature may be judged separately to improve the accuracy of the calculation, and the subfeatures whose values exceed the second preset value are then deleted; this example does not specifically limit this.
In step S120, the first target data features are fused to obtain multiple second target data features. As shown in Fig. 4, fusing the first target data features may include steps S1202 to S1206, where:
In step S1202, multiple feature indicators are constructed.
In this example embodiment, the feature indicators may include several of the mean, median, upper quartile, standard deviation and maximum; other indicators, such as the first quartile, may also be included, which this example does not specifically limit. Where:
Mean: the mean represents the central tendency of a data set; it is the sum of all values in a group divided by the number of values in the group.
Median: for a finite data set, all observations are sorted and the middle one is taken as the median (for an even number of observations, the mean of the two middle values).
Upper quartile: quartiles are used in descriptive statistics to describe the dispersion of skewed data. All data are sorted in ascending order; the value at the 25% position is the lower quartile (first quartile), and the value at the 75% position is the upper quartile (third quartile).
Standard deviation: the square root of the arithmetic mean of the squared deviations from the mean, i.e. the arithmetic square root of the variance; it reflects the dispersion of a data set.
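The indicators of step S1202 can be computed with the standard library. A minimal sketch; the function name and sample values are illustrative. `statistics.pstdev` is used because the definition above (square root of the mean squared deviation) is the population standard deviation; the sample standard deviation, `statistics.stdev`, would divide by n - 1 instead.

```python
import statistics
from typing import Dict, List


def feature_indicators(values: List[float]) -> Dict[str, float]:
    """Mean, median, upper quartile, standard deviation and maximum of one subfeature."""
    _q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "upper_quartile": q3,
        "std": statistics.pstdev(values),  # population standard deviation
        "max": max(values),
    }


print(feature_indicators([2, 4, 4, 4, 5, 5, 7, 9]))
# {'mean': 5.0, 'median': 4.5, 'upper_quartile': 5.5, 'std': 2.0, 'max': 9}
```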
In step S1204, each feature indicator of each first target data feature is calculated. For example:
Calculate the mean, median, upper quartile, standard deviation and maximum of the number of SKUs that all site users added to the cart in the last 5 days. It should be added that this example embodiment takes the add-to-cart SKU subfeature of the add-to-cart feature as an illustration; other subfeatures are handled in the same way and are not described again here.
In step S1206, the feature indicators of each first target data feature are fused to obtain multiple second target data features. Specifically:
After the above calculation is completed, the feature indicators are fused as follows:
A. each user's add-to-cart SKU count over the last 5 days minus the mean of all users' 5-day add-to-cart SKU counts;
B. each user's 5-day add-to-cart SKU count minus the median of all users' 5-day add-to-cart SKU counts;
C. each user's 5-day add-to-cart SKU count minus the upper quartile of all users' 5-day add-to-cart SKU counts;
D. (each user's 5-day add-to-cart SKU count minus the standard deviation of all users' 5-day add-to-cart SKU counts) divided by the mean of all users' 5-day add-to-cart SKU counts;
E. each user's 5-day add-to-cart SKU count divided by the maximum of all users' 5-day add-to-cart SKU counts.
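Steps S1204 and S1206 for one subfeature can be sketched as follows. This is a minimal illustration, not the patent's code: it computes the all-user indicators once and then derives A to E for each user. Because the machine-translated formulas are garbled, the sign of the differences in A to C (user value minus indicator) and the direction of the ratio in E (user value over maximum) are assumptions, while D follows the text as written, (x - std) / mean. The zero-division guards for all-zero (sparse) columns are also an added assumption.

```python
import statistics
from typing import Dict, List


def fuse_indicators(values: List[float]) -> List[Dict[str, float]]:
    """Derive the fused features A-E for every user's value of one subfeature."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    q3 = statistics.quantiles(values, n=4, method="inclusive")[2]
    std = statistics.pstdev(values)
    vmax = max(values)
    fused = []
    for x in values:
        fused.append({
            "A": x - mean,                           # deviation from the mean
            "B": x - median,                         # deviation from the median
            "C": x - q3,                             # distance to the upper quartile
            "D": (x - std) / mean if mean else 0.0,  # as written in the text
            "E": x / vmax if vmax else 0.0,          # share of the maximum
        })
    return fused


cart_sku_5d = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical per-user counts
print(fuse_indicators(cart_sku_5d)[-1])
# {'A': 4.0, 'B': 4.5, 'C': 3.5, 'D': 1.4, 'E': 1.0}
```

Each user thus gets five derived values from a single sparse count; in the text, these A to E columns are what is passed on to GBDT training.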
After the feature indicators are fused, the results A to E are used as the second target data features and input into a GBDT (Gradient Boosting Decision Tree) algorithm for training. Comparing these training results with those obtained from the raw data features shows that the second target data features yield more accurate training results than the raw data features and largely avoid feature sparsity.
This example embodiment also provides a data feature construction device. As shown in Fig. 5, the device may include a data feature cleaning module 510 and a data feature fusion module 520, where:
The data feature cleaning module 510 may be used to acquire multiple raw data features and clean each raw data feature to obtain multiple first target data features.
The data feature fusion module 520 may be used to fuse the first target data features to obtain multiple second target data features.
In this example embodiment, cleaning each raw data feature to obtain multiple first target data features includes: rejecting the abnormal features in each raw data feature to obtain the multiple first target data features.
In this example embodiment, rejecting the feature values in each raw data feature includes: judging whether the feature value of each raw data feature is less than a first preset value; when the feature value is judged to be less than the first preset value, rejecting the raw data feature corresponding to that value; when the feature value is judged to be not less than the first preset value, judging whether it is greater than a second preset value; and when the feature value is judged to be greater than the second preset value, rejecting the raw data feature corresponding to that value.
In this example embodiment, fusing the first target data features to obtain multiple second target data features includes: constructing multiple feature indicators; calculating each feature indicator for each first target data feature; and fusing the feature indicators of each first target data feature to obtain the multiple second target data features.
In this example embodiment, the feature indicators include several of the mean, median, upper quartile, standard deviation and maximum.
In this example embodiment, the raw data features include several of purchase features, browsing features, add-to-cart features, search features and coupon features.
The specific details of each module of the above device have already been described in detail in the corresponding data feature construction method, so they are not repeated here.
It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the disclosure, the features and functions of two or more modules or units described above may be embodied in a single module or unit; conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.
In addition, although the steps of the method in the disclosure are described in a particular order in the drawings, this does not require or imply that the steps must be performed in that order, or that all the steps shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, several steps may be merged into one, and/or one step may be split into several.
From the description of the embodiments above, those skilled in the art will readily understand that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the disclosure may be embodied as a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive or removable hard disk) or on a network, and which includes instructions that cause a computing device (such as a personal computer, server, mobile terminal or network device) to perform the method according to the embodiments of the disclosure.
In an exemplary embodiment of the disclosure, a kind of electronic equipment that can realize the above method is additionally provided.
Those skilled in the art will understand that various aspects of the present invention may be implemented as a system, a method, or a program product. Accordingly, various aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may be collectively referred to herein as a "circuit", "module", or "system".
An electronic device 600 according to this embodiment of the present invention is described below with reference to FIG. 6. The electronic device 600 shown in FIG. 6 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in FIG. 6, the electronic device 600 takes the form of a general-purpose computing device. Components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, and a bus 630 connecting the different system components (including the storage unit 620 and the processing unit 610).
The storage unit stores program code that can be executed by the processing unit 610, causing the processing unit 610 to perform the steps of the various exemplary embodiments of the present invention described in the "Exemplary Methods" section of this specification. For example, the processing unit 610 may perform step S110 shown in FIG. 1: obtaining multiple initial data features and cleaning each initial data feature to obtain multiple first target data features; and step S120: fusing the first target data features to obtain multiple second target data features.
The storage unit 620 may include a readable medium in the form of volatile storage, such as a random access memory unit (RAM) 6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The storage unit 620 may also include a program/utility 6204 having a set of (at least one) program modules 6205. Such program modules 6205 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 630 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processing unit or local bus using any of a variety of bus structures.
The electronic device 600 may also communicate with one or more external devices 700 (such as a keyboard, pointing device, or Bluetooth device), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device (such as a router or modem) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 650. The electronic device 600 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 660. As shown, the network adapter 660 communicates with the other modules of the electronic device 600 through the bus 630.
It should be understood that, although not shown in the drawings, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium is also provided, on which is stored a program product capable of implementing the above method of this specification. In some possible embodiments, various aspects of the present invention may also be implemented in the form of a program product including program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps of the various exemplary embodiments of the present invention described in the "Exemplary Methods" section of this specification.
As shown in FIG. 7, a program product 800 for implementing the above method according to an embodiment of the present invention is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code, and may run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto. In this document, a readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by, or in connection with, an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.
Program code contained on a readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wireline, optical cable, or RF, or any suitable combination of the above.
Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In addition, the above drawings are merely schematic illustrations of the processes included in the method according to exemplary embodiments of the present invention and are not intended to be limiting. It is easy to understand that the processes shown in the drawings do not indicate or limit the chronological order of these processes. It is also easy to understand that these processes may be performed, for example, synchronously or asynchronously in multiple modules.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the disclosure being indicated by the claims.
Claims (14)
- 1. A data feature construction method, comprising: obtaining multiple initial data features and cleaning each initial data feature to obtain multiple first target data features; and fusing the first target data features to obtain multiple second target data features.
- 2. The data feature construction method according to claim 1, wherein cleaning each initial data feature to obtain the multiple first target data features comprises: rejecting abnormal features among the initial data features to obtain the multiple first target data features.
- 3. The data feature construction method according to claim 2, wherein rejecting based on the feature values of the initial data features comprises: determining whether the feature value of each initial data feature is less than a first preset value; when the feature value of an initial data feature is less than the first preset value, rejecting the initial data feature corresponding to that feature value; when the feature value of an initial data feature is not less than the first preset value, determining whether that feature value is greater than a second preset value; and when the feature value is greater than the second preset value, rejecting the initial data feature corresponding to that feature value.
- 4. The data feature construction method according to claim 1, wherein fusing the first target data features to obtain the multiple second target data features comprises: constructing multiple feature indicators; calculating each feature indicator for each first target data feature; and fusing the feature indicators of the first target data features to obtain the multiple second target data features.
- 5. The data feature construction method according to claim 4, wherein the feature indicators comprise a plurality of the following: mean, median, upper quartile, standard deviation, and maximum.
- 6. The data feature construction method according to any one of claims 1-5, wherein the initial data features comprise a plurality of the following: purchase features, browsing features, add-to-cart features, search features, and coupon features.
- 7. A data feature construction apparatus, comprising: a data feature cleaning module, configured to obtain multiple initial data features and clean each initial data feature to obtain multiple first target data features; and a data feature fusion module, configured to fuse the first target data features to obtain multiple second target data features.
- 8. The data feature construction apparatus according to claim 7, wherein cleaning each initial data feature to obtain the multiple first target data features comprises: rejecting abnormal features among the initial data features to obtain the multiple first target data features.
- 9. The data feature construction apparatus according to claim 8, wherein rejecting based on the feature values of the initial data features comprises: determining whether the feature value of each initial data feature is less than a first preset value; when the feature value of an initial data feature is less than the first preset value, rejecting the initial data feature corresponding to that feature value; when the feature value of an initial data feature is not less than the first preset value, determining whether that feature value is greater than a second preset value; and when the feature value is greater than the second preset value, rejecting the initial data feature corresponding to that feature value.
- 10. The data feature construction apparatus according to claim 7, wherein fusing the first target data features to obtain the multiple second target data features comprises: constructing multiple feature indicators; calculating each feature indicator for each first target data feature; and fusing the feature indicators of the first target data features to obtain the multiple second target data features.
- 11. The data feature construction apparatus according to claim 10, wherein the feature indicators comprise a plurality of the following: mean, median, upper quartile, standard deviation, and maximum.
- 12. The data feature construction apparatus according to any one of claims 7-11, wherein the initial data features comprise a plurality of the following: purchase features, browsing features, add-to-cart features, search features, and coupon features.
- 13. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the data feature construction method according to any one of claims 1-6.
- 14. An electronic device, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to perform the data feature construction method according to any one of claims 1-6 by executing the executable instructions.
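Claims 4 and 5 describe fusion as computing several statistical indicators for each first target data feature and combining them into second target data features, without fixing how the combination is laid out. One plausible reading, sketched below in Python, groups cleaned values by feature type (for example the purchase or coupon features of claim 6) and emits one second target feature per type and indicator. The `(type, value)` input shape, the `<type>_<indicator>` naming, the `fuse_features` and `upper_quartile` helpers, the use of the population standard deviation, and the linear-interpolation upper quartile are all assumptions for illustration.

```python
import statistics
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


def upper_quartile(values: Sequence[float]) -> float:
    """75th percentile with linear interpolation between sorted values."""
    xs = sorted(values)
    pos = 0.75 * (len(xs) - 1)  # fractional index of the 75th percentile
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


# Indicator set from claim 5. pstdev (population standard deviation) is an
# assumption, chosen so a single observation yields 0.0 instead of an error.
INDICATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "q3": upper_quartile,
    "std": statistics.pstdev,
    "max": max,
}


def fuse_features(cleaned: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sketch of step S120: turn cleaned (type, value) pairs into fused features."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for name, value in cleaned:
        groups[name].append(value)
    fused: Dict[str, float] = {}
    for name in sorted(groups):
        for indicator, fn in INDICATORS.items():
            # One second target data feature per (feature type, indicator) pair.
            fused[f"{name}_{indicator}"] = float(fn(groups[name]))
    return fused


if __name__ == "__main__":
    sample = [("purchase", 1.0), ("purchase", 2.0), ("purchase", 3.0),
              ("purchase", 4.0), ("coupon", 5.0)]
    for key, val in fuse_features(sample).items():
        print(key, val)
```

For the sample input, the purchase group yields a mean and median of 2.5, an upper quartile of 3.25, and a maximum of 4.0, while the single coupon value yields a standard deviation of 0.0. Under this reading, the five feature types of claim 6 and the five indicators of claim 5 would give up to 25 second target data features; the claims do not state whether a per-type expansion like this is what is intended.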
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710954269.7A CN107644102B (en) | 2017-10-13 | 2017-10-13 | Data feature construction method and device, storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107644102A (en) | 2018-01-30 |
CN107644102B CN107644102B (en) | 2020-11-03 |
Family ID: 61123521
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710954269.7A Active CN107644102B (en) | 2017-10-13 | 2017-10-13 | Data feature construction method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107644102B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101702262A (en) * | 2009-11-06 | 2010-05-05 | 北京交通大学 | Data syncretizing method for urban traffic circulation indexes |
US20120215735A1 (en) * | 2011-02-18 | 2012-08-23 | Larus Technologies Corporation | System and Method for Data Fusion with Adaptive Learning |
CN105512687A (en) * | 2015-12-15 | 2016-04-20 | 北京锐安科技有限公司 | Emotion classification model training and textual emotion polarity analysis method and system |
CN105930942A (en) * | 2016-06-03 | 2016-09-07 | 北京理工大学 | Intelligent system for predicting energy technologies under big data background |
CN106777274A (en) * | 2016-06-16 | 2017-05-31 | 北京理工大学 | A kind of Chinese tour field knowledge mapping construction method and system |
CN106846805A (en) * | 2017-03-06 | 2017-06-13 | 南京多伦科技股份有限公司 | A kind of dynamic road grid traffic needing forecasting method and its system |
Worldwide Applications
2017-10-13: CN201710954269.7A (CN107644102B), Active
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020155788A1 (en) * | 2019-01-30 | 2020-08-06 | 阿里巴巴集团控股有限公司 | Data determination method, apparatus and device, and medium |
CN112199374A (en) * | 2020-09-29 | 2021-01-08 | 中国平安人寿保险股份有限公司 | Data feature mining method aiming at data missing and related equipment thereof |
CN112199374B (en) * | 2020-09-29 | 2023-12-05 | 中国平安人寿保险股份有限公司 | Data feature mining method for data missing and related equipment thereof |
CN113315721A (en) * | 2021-05-26 | 2021-08-27 | 恒安嘉新(北京)科技股份公司 | Network data feature processing method, device, equipment and storage medium |
CN113315721B (en) * | 2021-05-26 | 2023-01-17 | 恒安嘉新(北京)科技股份公司 | Network data feature processing method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN107644102B (en) | 2020-11-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104346372B (en) | Method and apparatus for assessment prediction model | |
CN109360012A (en) | The selection method and device, storage medium, electronic equipment of advertisement dispensing channel | |
CN108804704A (en) | A kind of user's depth portrait method and device | |
CN112270546A (en) | Risk prediction method and device based on stacking algorithm and electronic equipment | |
WO2022083093A1 (en) | Probability calculation method and apparatus in graph, computer device and storage medium | |
CN107644102A (en) | Data characteristics building method and device, storage medium, electronic equipment | |
CN111325619A (en) | Credit card fraud detection model updating method and device based on joint learning | |
CN110363636A (en) | Risk of fraud recognition methods and device based on relational network | |
CN107392801A (en) | The method and its device, storage medium, electronic equipment of order are upset in control | |
CN110689135B (en) | Anti-money laundering model training method and device and electronic equipment | |
EP2947610A1 (en) | Business problem networking system and tool | |
CN107609958A (en) | Behavioral guidance strategy determines method and device, storage medium and electronic equipment | |
CN110348208A (en) | A kind of risk control method based on user behavior and neural network, device and electronic equipment | |
CN113609345B (en) | Target object association method and device, computing equipment and storage medium | |
CN113420212A (en) | Deep feature learning-based recommendation method, device, equipment and storage medium | |
CN110363654A (en) | A kind of favor information method for pushing, device and electronic equipment | |
CN112380104A (en) | User attribute identification method and device, electronic equipment and storage medium | |
CN110362825A (en) | A kind of text based finance data abstracting method, device and electronic equipment | |
CN111125445B (en) | Community theme generation method and device, electronic equipment and storage medium | |
CN106779929A (en) | A kind of Products Show method, device and computing device | |
CN110348190B (en) | User equipment attribution judging method and device based on user operation behaviors | |
CN110413632A (en) | Method, apparatus, computer-readable medium and the electronic equipment of controlled state | |
CN104679492A (en) | Computer-implemented technical support providing device and method | |
CN107590012A (en) | Equipment goes offline analysis of causes method and device, storage medium, electronic equipment | |
CN110717101B (en) | User classification method and device based on application behaviors and electronic equipment |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |