CN107644102A - Data characteristics building method and device, storage medium, electronic equipment - Google Patents


Info

Publication number
CN107644102A
Authority
CN
China
Prior art keywords
feature
data
data characteristics
initial data
value
Prior art date
Legal status
Granted
Application number
CN201710954269.7A
Other languages
Chinese (zh)
Other versions
CN107644102B (en)
Inventor
王硕 (Wang Shuo)
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201710954269.7A
Publication of CN107644102A
Application granted
Publication of CN107644102B
Legal status: Active
Anticipated expiration


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure relates to a data feature construction method and apparatus in the technical field of data processing. The method includes: obtaining a plurality of initial data features and cleaning each initial data feature to obtain a plurality of first target data features; and fusing the first target data features to obtain a plurality of second target data features. By fusing the first target data features into second target data features, the method can mine the information hidden in each data feature and, from that hidden information, further uncover the associations among the data features, so that a model trained and used for prediction on these data features achieves higher accuracy.

Description

Data characteristics building method and device, storage medium, electronic equipment
Technical field
This disclosure relates to the technical field of data processing, and in particular to a data feature construction method, a data feature construction apparatus, a computer-readable storage medium and an electronic device.
Background
In data mining projects, relatively sparse user data features often need to be extracted for feature mining. Sparse features may include, for example, time-window features such as a user's browsing, add-to-cart, purchase and search activity over the past 1 to 5 days.
However, after statistical aggregation, most of the feature values of such data features are zero. In general, the feature values of a data feature should satisfy three criteria: Informative (rich in information), Discriminative (able to distinguish between samples) and Independent (not redundant with other features). When a data feature is too sparse, its feature values cannot satisfy these three criteria.
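One simple way to quantify the sparsity described above is to measure the share of zero values in a feature column. The sketch below is illustrative only. The function `zero_fraction` and the sample counts are hypothetical and do not come from the disclosure.

```python
def zero_fraction(values):
    """Return the fraction of feature values that are exactly zero."""
    if not values:
        raise ValueError("values must be non-empty")
    return sum(1 for v in values if v == 0) / len(values)


# Hypothetical 1-day add-to-cart counts for ten users: most are zero.
counts_1d = [0, 0, 0, 2, 0, 0, 0, 0, 1, 0]
print(zero_fraction(counts_1d))  # 0.8
```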
In general, a model performs well only when carefully selected features with strong discriminative power are used. However, arbitrarily removing the sparse user features described above discards a great deal of hidden, deeper information and introduces larger errors into the data mining results.
Accordingly, a new data feature construction method and apparatus are needed.
It should be noted that the information disclosed in the Background section above is provided only to enhance understanding of the background of the disclosure. It may therefore include information that does not constitute prior art already known to a person of ordinary skill in the art.
Summary
An object of the disclosure is to provide a data feature construction method, a data feature construction apparatus, a computer-readable storage medium and an electronic device. These overcome, at least to some extent, one or more problems caused by the limitations and defects of the related art.
According to one aspect of the disclosure, a data feature construction method is provided, including:
obtaining a plurality of initial data features and cleaning each initial data feature to obtain a plurality of first target data features; and
fusing the first target data features to obtain a plurality of second target data features.
In an exemplary embodiment of the disclosure, cleaning each initial data feature to obtain the plurality of first target data features includes:
removing abnormal features from each initial data feature to obtain the plurality of first target data features.
In an exemplary embodiment of the disclosure, removing abnormal feature values from each initial data feature includes:
determining whether the feature value of each initial data feature is less than a first preset value;
when the feature value of an initial data feature is determined to be less than the first preset value, removing the initial data feature corresponding to that feature value;
when the feature value of an initial data feature is determined to be not less than the first preset value, determining whether the feature value is greater than a second preset value; and
when the feature value is determined to be greater than the second preset value, removing the initial data feature corresponding to that feature value.
In an exemplary embodiment of the disclosure, fusing the first target data features to obtain the plurality of second target data features includes:
constructing a plurality of feature indicators;
calculating each feature indicator for each first target data feature; and
fusing the feature indicators of each first target data feature to obtain the plurality of second target data features.
In an exemplary embodiment of the disclosure, the feature indicators include a plurality of the following: mean, median, upper quartile, standard deviation and maximum.
In an exemplary embodiment of the disclosure, the initial data features include a plurality of the following: purchase features, browsing features, add-to-cart features, search features and coupon features.
According to one aspect of the disclosure, a data feature construction apparatus is provided, including:
a data feature cleaning module, configured to obtain a plurality of initial data features and clean each initial data feature to obtain a plurality of first target data features; and
a data feature fusion module, configured to fuse the first target data features to obtain a plurality of second target data features.
In an exemplary embodiment of the disclosure, cleaning each initial data feature to obtain the plurality of first target data features includes:
removing abnormal features from each initial data feature to obtain the plurality of first target data features.
In an exemplary embodiment of the disclosure, removing abnormal feature values from each initial data feature includes:
determining whether the feature value of each initial data feature is less than a first preset value;
when the feature value of an initial data feature is determined to be less than the first preset value, removing the initial data feature corresponding to that feature value;
when the feature value of an initial data feature is determined to be not less than the first preset value, determining whether the feature value is greater than a second preset value; and
when the feature value is determined to be greater than the second preset value, removing the initial data feature corresponding to that feature value.
In an exemplary embodiment of the disclosure, fusing the first target data features to obtain the plurality of second target data features includes:
constructing a plurality of feature indicators;
calculating each feature indicator for each first target data feature; and
fusing the feature indicators of each first target data feature to obtain the plurality of second target data features.
In an exemplary embodiment of the disclosure, the feature indicators include a plurality of the following: mean, median, upper quartile, standard deviation and maximum.
In an exemplary embodiment of the disclosure, the initial data features include a plurality of the following: purchase features, browsing features, add-to-cart features, search features and coupon features.
According to one aspect of the disclosure, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the data feature construction method according to any one of the above.
According to one aspect of the disclosure, an electronic device is provided, including:
a processor; and
a memory configured to store instructions executable by the processor;
wherein the processor is configured to perform the data feature construction method according to any one of the above by executing the executable instructions.
In the data feature construction method and apparatus of the disclosure, a plurality of initial data features are obtained and each is cleaned to obtain a plurality of first target data features. These are then fused to obtain a plurality of second target data features. On the one hand, cleaning the initial data features removes abnormal data from them, so that the constructed data features satisfy the criteria of feature selection (informative, discriminative and independent), which improves their accuracy. On the other hand, fusing the first target data features into second target data features mines the information hidden in each data feature and, from it, the associations among the data features, so that a model trained and used for prediction on these features is more accurate.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and, together with the description, serve to explain the principles of the disclosure. The drawings described below show only some embodiments of the disclosure; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 schematically shows a flowchart of a data feature construction method.
Fig. 2 schematically shows a flowchart of a method for removing abnormal feature values from initial data features.
Fig. 3 schematically shows an example box plot.
Fig. 4 schematically shows a flowchart of a method for fusing first target data features.
Fig. 5 schematically shows a block diagram of a data feature construction apparatus.
Fig. 6 schematically shows an electronic device for implementing the above data feature construction method.
Fig. 7 schematically shows a computer-readable storage medium for implementing the above data feature construction method.
Detailed description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein. Rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures or characteristics may be combined in one or more embodiments in any suitable manner. The following description provides many specific details for a thorough understanding of the embodiments of the disclosure. However, those skilled in the art will appreciate that the technical solutions of the disclosure may be practiced without one or more of these specific details, or with other methods, components, devices, steps and so on. In other cases, well-known technical solutions are not shown or described in detail, so that they do not distract from or obscure aspects of the disclosure.
In addition, the drawings are merely schematic illustrations of the disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so repeated descriptions of them are omitted. Some of the block diagrams in the drawings are functional entities that do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
This example embodiment first provides a data feature construction method. Referring to Fig. 1, the data feature construction method may include the following steps:
Step S110: obtain a plurality of initial data features and clean each initial data feature to obtain a plurality of first target data features.
Step S120: fuse the first target data features to obtain a plurality of second target data features.
In the above data feature construction method, cleaning the initial data features removes abnormal data from them. As a result, the constructed data features satisfy the criteria of feature selection (informative, discriminative and independent), which improves their accuracy. In addition, fusing the first target data features into second target data features mines the information hidden in each data feature and, from it, the associations among the data features, so that a model trained and used for prediction on these features is more accurate.
Each step of the above data feature construction method is explained in detail below.
In step S110, a plurality of initial data features are obtained and each initial data feature is cleaned to obtain a plurality of first target data features.
First, the initial data features are explained. They may include purchase features, browsing features, add-to-cart features, search features, coupon features and so on. They may also include other data features, such as touch-interaction features, which this example does not specifically limit. Specifically:
Purchase features may be the number of SKUs the user purchased in the past year, the average order value, the order count and order amount in the past month, the order count in the past three months, the order count from three to six months ago, and so on. They may also include other features, such as the order count in the past year, which this example does not specifically limit;
Browsing features may be the number of SKUs browsed and the number of visits in the past 1, 2, 3, 4 and 5 days, the number of visits in the past 7 days, 7 to 15 days, 15 to 21 days and 21 to 28 days, and so on. They may also include other features, such as the number of SKUs browsed in the past 7 days, which this example does not specifically limit;
Add-to-cart features may be the number of SKUs the user added to the cart in the past 1 to 3 days, 1 to 5 days, 1 to 7 days, 7 to 15 days and 15 to 30 days, the number of third-level categories added to the cart, the number of brands added to the cart, and so on. They may also include other features, such as the number of SKUs, third-level categories and brands added to the cart within 2 months, which this example does not specifically limit;
Search features may be the number of keywords, brands, first-level categories and third-level categories the user searched in the past 10, 20 and 30 days, and so on. They may also include other features, such as the same counts over the past 2 months, which this example does not specifically limit;
Coupon features may be the number of coupons the user used and the number of coupons the user claimed in the past 3 months and in the 3 to 6, 6 to 9 and 9 to 12 months before, the amount of orders paid with coupons, the discount amount, the third-level categories purchased with coupons, and so on. They may also include other features, such as the number of SKUs purchased with coupons, which this example does not specifically limit.
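The window features listed above are counts aggregated over day ranges. The sketch below shows one possible way to derive the add-to-cart SKU counts from an event log. It assumes a hypothetical event format `(user_id, days_ago, sku_id)`, counts distinct SKUs, and treats each window as an inclusive range of days. The disclosure does not specify any of these details.

```python
from collections import defaultdict


def window_sku_counts(events, windows):
    """Count distinct SKUs per user inside each inclusive (start, end) day window.

    events:  iterable of (user_id, days_ago, sku_id) tuples (hypothetical format)
    windows: list of (start, end) pairs, e.g. (1, 3) = between 1 and 3 days ago
    Returns {user_id: {window: distinct_sku_count}}; empty windows count as 0.
    """
    skus = defaultdict(lambda: defaultdict(set))
    for user, days_ago, sku in events:
        per_window = skus[user]  # creates the user entry even if no window matches
        for start, end in windows:
            if start <= days_ago <= end:
                per_window[(start, end)].add(sku)
    return {user: {w: len(per_window[w]) for w in windows}
            for user, per_window in skus.items()}


events = [("u1", 1, "a"), ("u1", 2, "a"), ("u1", 2, "b"), ("u1", 10, "c"), ("u2", 20, "d")]
windows = [(1, 3), (1, 5), (7, 15), (15, 30)]
counts = window_sku_counts(events, windows)
print(counts["u2"])  # {(1, 3): 0, (1, 5): 0, (7, 15): 0, (15, 30): 1}
```

Under the inclusive-range assumption, an event exactly 15 days old falls into both the 7 to 15 and the 15 to 30 day windows. A half-open convention would avoid that overlap.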
Next, step S110 is explained further on the basis of the above initial data features. First, the purchase, browsing, add-to-cart, search and coupon features of all users are obtained. Then these initial data features are cleaned. Specifically, this may include removing abnormal features from each initial data feature to obtain a plurality of first target data features. It should be added that, before the abnormal features are removed, risk users and order-brushing users (users who place fake orders) can also be removed from the initial data features, to obtain initial data features that reflect normal user behavior. Further, referring to Fig. 2, removing abnormal feature values from the initial data features may include steps S210 to S240, as follows:
In step S210, it is determined whether the feature value of each initial data feature is less than a first preset value.
First, the box plot is explained. A box plot provides a criterion for identifying outliers and is built on quartiles. Referring to Fig. 3, quartiles are obtained by sorting all values in ascending order and dividing them into four equal parts; the values at the three cut points are called the quartiles. The first quartile (Q1), also called the lower quartile, is the value at the 25% position of the sorted sample. The second quartile (Q2), also called the median, is the value at the 50% position. The third quartile (Q3), also called the upper quartile, is the value at the 75% position. The difference between the third and the first quartile is called the interquartile range (IQR). In a batch of data, an outlier can then be defined as a value less than Q1 - 1.5 × IQR or greater than Q3 + 1.5 × IQR.
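The box-plot outlier rule above can be computed directly. This sketch uses Python's standard `statistics.quantiles` with `method="inclusive"` (linear interpolation between sorted values). The disclosure does not say which quartile convention it uses, and other conventions can shift the fences slightly on small samples. The function name `iqr_fences` is hypothetical.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return (q1, q3, lower_fence, upper_fence) for the box-plot outlier rule.

    lower_fence = Q1 - k * IQR and upper_fence = Q3 + k * IQR, with k = 1.5
    by default. Needs at least two values (statistics.quantiles requirement).
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1, q3, q1 - k * iqr, q3 + k * iqr


print(iqr_fences([1, 2, 3, 4, 5]))  # (2.0, 4.0, -1.0, 7.0)
```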
Next, step S210 is explained on the basis of these quartiles: it is determined whether the feature value of each initial data feature is less than the first preset value. For example:
Suppose that, among the initial data features, the purchase feature has a feature value of 20 and the first preset value Q1 - 1.5 × IQR is 35. The feature value of this initial data feature is then less than the first preset value. It should be added that, because the box plot is used here to identify outliers, the first preset value is determined by the box plot. When another method is used to identify outliers, the rule for determining the first preset value changes accordingly, which this example does not specifically limit.
In step S220, when the feature value of an initial data feature is determined to be less than the first preset value, the initial data feature corresponding to that feature value is removed. Specifically:
Because the feature value of the purchase feature is 20 and the first preset value Q1 - 1.5 × IQR is 35, the feature value is less than the first preset value, so the data feature corresponding to it can be deleted. It should be added that each initial data feature contains multiple sub-features. To improve accuracy, the check can therefore be applied to the feature value of each sub-feature separately, and only the sub-features below the first preset value are deleted; this example does not specifically limit this.
In step S230, when the feature value of an initial data feature is determined to be not less than the first preset value, it is determined whether the feature value is greater than a second preset value. Specifically:
When the feature value of an initial data feature is not less than the first preset value (Q1 - 1.5 × IQR), it must also be determined whether it is greater than the second preset value (Q3 + 1.5 × IQR). For example, if the browsing feature has a feature value of 2000 and the second preset value Q3 + 1.5 × IQR is 1800, the feature value is greater than the second preset value. It should be added that, because the box plot is used here to identify outliers, the second preset value is also determined by the box plot. When another method is used to identify outliers, the rule for determining the second preset value changes accordingly, which this example does not specifically limit.
In step S240, when the feature value of an initial data feature is determined to be greater than the second preset value, the initial data feature corresponding to that feature value is removed. Specifically:
Because the feature value of the browsing feature is 2000 and the second preset value Q3 + 1.5 × IQR is 1800, the feature value is greater than the second preset value, so the data feature corresponding to it can be deleted. It should be added that each initial data feature contains multiple sub-features. To improve accuracy, the check can therefore be applied to the feature value of each sub-feature separately, and only the sub-features above the second preset value are deleted; this example does not specifically limit this.
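Steps S210 to S240 amount to a two-stage range check. The sketch below assumes that "removing" a feature means dropping one user's value for the sub-feature being checked, in line with the per-sub-feature note above. It takes the two preset values as inputs, for example the box-plot fences Q1 - 1.5 × IQR and Q3 + 1.5 × IQR. All names are hypothetical.

```python
import math


def reject_reason(value, first_preset, second_preset):
    """Apply steps S210 to S240 to one feature value.

    Returns "below_first_preset" (S210 -> S220), "above_second_preset"
    (S230 -> S240), or None when the value is kept.
    """
    if value < first_preset:   # S210: compare with the first preset value
        return "below_first_preset"
    if value > second_preset:  # S230: only reached when value >= first_preset
        return "above_second_preset"
    return None


def clean_sub_features(table, fences):
    """Drop out-of-range values sub-feature by sub-feature.

    table:  {sub_feature: {user_id: value}}
    fences: {sub_feature: (first_preset, second_preset)}
    Returns a new table that contains only the kept values.
    """
    cleaned = {}
    for name, column in table.items():
        low, high = fences[name]
        cleaned[name] = {user: v for user, v in column.items()
                         if reject_reason(v, low, high) is None}
    return cleaned


# The two examples from the text; the bound the text does not give is left open.
print(reject_reason(20, 35, math.inf))       # below_first_preset
print(reject_reason(2000, -math.inf, 1800))  # above_second_preset
```

The two calls reproduce the examples from the text. The text gives only a lower bound for the purchase example and only an upper bound for the browsing example, so the missing bound is left open with infinity.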
In step S120, the first target data features are fused to obtain a plurality of second target data features. Referring to Fig. 4, fusing the first target data features may include steps S1202 to S1206, as follows:
In step S1202, a plurality of feature indicators are constructed.
In this example embodiment, the feature indicators may include a plurality of the following: mean, median, upper quartile, standard deviation and maximum. They may also include other feature indicators, such as the first quartile, which this example does not specifically limit. The indicators are defined as follows:
Mean: the mean reflects the central tendency of a data set and equals the sum of all values in the set divided by the number of values. Median: for a finite data set, all observations are sorted by size and the middle one is taken as the median (the mean of the two middle values when the count is even). Upper quartile: quartiles are a descriptive statistic that captures the spread and skewness of data. When all data are sorted in ascending order, the value at the 25% position is the lower quartile, also called the first quartile, and the value at the 75% position is the upper quartile, also called the third quartile. Standard deviation: the square root of the arithmetic mean of the squared deviations from the mean, i.e., the arithmetic square root of the variance; it reflects the dispersion of a data set.
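For reference, the definitions above can be written for the values x_1, ..., x_n of one sub-feature across n users, with Q2 and Q3 taken from the values sorted in ascending order. The standard deviation is the population form, matching the "arithmetic mean of the squared deviations" wording. The notation is ours, not the disclosure's.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\mathrm{median} = Q_2,
\qquad
\text{upper quartile} = Q_3,
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},
\qquad
x_{\max} = \max_{1 \le i \le n} x_i
```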
In step S1204, each feature indicator is calculated for each first target data feature. For example:
Calculate, over all users on the site, the mean, median, upper quartile, standard deviation and maximum of the number of SKUs each user added to the cart in the past 5 days. It should be added that this example embodiment uses the add-to-cart SKU sub-feature of the add-to-cart feature as an illustration. Other sub-features can be handled in the same way and are not described again here.
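Step S1204 for the add-to-cart example can be sketched with the Python standard library. This is a sketch under stated assumptions. The sample counts are hypothetical, the upper quartile uses linear interpolation (`method="inclusive"`), and the standard deviation is the population form (`statistics.pstdev`).

```python
import statistics


def feature_indicators(values):
    """Compute the five feature indicators for one sub-feature across all users."""
    q3 = statistics.quantiles(values, n=4, method="inclusive")[2]
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "upper_quartile": q3,
        "std": statistics.pstdev(values),  # population standard deviation
        "max": max(values),
    }


# Hypothetical past-5-day add-to-cart SKU counts for eight users.
cart_5d = [2, 4, 4, 4, 5, 5, 7, 9]
indicators = feature_indicators(cart_5d)
print(indicators["median"], indicators["upper_quartile"])  # 4.5 5.5
```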
In step S1206, the feature indicators of each first target data feature are fused to obtain the plurality of second target data features. Specifically:
After the above calculation is completed, the feature indicators are fused by the following calculations:
A. a user's add-to-cart SKU count over the past 5 days minus the mean of that count over all users;
B. the user's add-to-cart SKU count over the past 5 days minus the median of that count over all users;
C. the user's add-to-cart SKU count over the past 5 days minus the upper quartile of that count over all users;
D. (the user's add-to-cart SKU count over the past 5 days minus the standard deviation of that count over all users) divided by the mean of that count over all users;
E. the user's add-to-cart SKU count over the past 5 days divided by the maximum of that count over all users.
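Given the indicators from step S1204, the five fused features can be computed for each user. This sketch follows the reading of A to E above, in which each user's own value is combined with indicators taken over all users. The disclosure does not state how to handle a zero mean or zero maximum; returning 0.0 in that case is an assumption made here. The indicator values used below belong to the hypothetical sample 2, 4, 4, 4, 5, 5, 7, 9 (mean 5, median 4.5, upper quartile 5.5 with linear interpolation, population standard deviation 2, maximum 9).

```python
def fuse_user_value(x, ind):
    """Second target features A to E for one user's value x of a sub-feature.

    ind holds the all-user indicators: mean, median, upper_quartile, std, max.
    D and E fall back to 0.0 when their denominator is zero (an assumption).
    """
    mean, std, peak = ind["mean"], ind["std"], ind["max"]
    return {
        "A": x - mean,
        "B": x - ind["median"],
        "C": x - ind["upper_quartile"],
        "D": (x - std) / mean if mean else 0.0,
        "E": x / peak if peak else 0.0,
    }


ind = {"mean": 5.0, "median": 4.5, "upper_quartile": 5.5, "std": 2.0, "max": 9}
fused = fuse_user_value(7, ind)
print(fused["A"], fused["B"], fused["C"], fused["D"])  # 2.0 2.5 1.5 1.0
```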
After the feature indicators are fused, the results of A to E serve as second target data features, which are input into a GBDT (Gradient Boosting Decision Tree) algorithm for training. According to this example, comparing the training results with those obtained from the initial data features shows that the second target data features yield more accurate results and largely avoid the sparsity of the features.
This example embodiment further provides a data feature construction apparatus. Referring to Fig. 5, the data feature construction apparatus may include a data feature cleaning module 510 and a data feature fusion module 520, where:
The data feature cleaning module 510 may be configured to obtain a plurality of initial data features and clean each initial data feature to obtain a plurality of first target data features.
The data feature fusion module 520 may be configured to fuse the first target data features to obtain a plurality of second target data features.
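The module split of Fig. 5 can be mirrored as two composable components. In this sketch the cleaning and fusion rules are injected as plain functions, so either module can be swapped without touching the other. The class names, the table layout and the toy rules in the usage example are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical layout: {sub_feature: {user_id: value}}
Table = Dict[str, Dict[str, float]]


@dataclass
class DataFeatureCleaningModule:
    """Counterpart of module 510: initial -> first target data features."""
    clean: Callable[[Table], Table]

    def run(self, initial: Table) -> Table:
        return self.clean(initial)


@dataclass
class DataFeatureFusionModule:
    """Counterpart of module 520: first target -> second target data features."""
    fuse: Callable[[Table], Table]

    def run(self, first_target: Table) -> Table:
        return self.fuse(first_target)


@dataclass
class DataFeatureConstructionApparatus:
    cleaning: DataFeatureCleaningModule
    fusion: DataFeatureFusionModule

    def construct(self, initial: Table) -> Table:
        # Step S110 (cleaning) followed by step S120 (fusion).
        return self.fusion.run(self.cleaning.run(initial))


# Toy rules for illustration only; they are not the rules of the disclosure.
def drop_negative(table: Table) -> Table:
    return {name: {u: v for u, v in col.items() if v >= 0} for name, col in table.items()}


def minus_mean(table: Table) -> Table:
    out: Table = {}
    for name, col in table.items():
        if not col:
            continue  # nothing left after cleaning
        mean = sum(col.values()) / len(col)
        out[name + "_minus_mean"] = {u: v - mean for u, v in col.items()}
    return out


apparatus = DataFeatureConstructionApparatus(
    DataFeatureCleaningModule(drop_negative),
    DataFeatureFusionModule(minus_mean),
)
print(apparatus.construct({"cart_5d": {"u1": 4, "u2": 2, "u3": -1}}))
# {'cart_5d_minus_mean': {'u1': 1.0, 'u2': -1.0}}
```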
In this example embodiment, cleaning each initial data feature to obtain the plurality of first target data features includes removing abnormal features from each initial data feature to obtain the plurality of first target data features.
In this example embodiment, removing abnormal feature values from each initial data feature includes: determining whether the feature value of each initial data feature is less than a first preset value; when the feature value of an initial data feature is determined to be less than the first preset value, removing the initial data feature corresponding to that feature value; when the feature value is determined to be not less than the first preset value, determining whether it is greater than a second preset value; and when the feature value is determined to be greater than the second preset value, removing the initial data feature corresponding to that feature value.
In this example embodiment, fusing the first target data features to obtain the plurality of second target data features includes: constructing a plurality of feature indicators; calculating each feature indicator for each first target data feature; and fusing the feature indicators of each first target data feature to obtain the plurality of second target data features.
In this example embodiment, the feature indicators include a plurality of the following: mean, median, upper quartile, standard deviation and maximum.
In this example embodiment, the initial data features include a plurality of the following: purchase features, browsing features, add-to-cart features, search features and coupon features.
The specific details of each module in the above data feature construction apparatus have already been described in detail in the corresponding data feature construction method and are therefore not repeated here.
It should be noted that although some modules or list of the equipment for action executing are referred in above-detailed Member, but this division is not enforceable.In fact, according to embodiment of the present disclosure, it is above-described two or more Either the feature of unit and function can embody module in a module or unit.A conversely, above-described mould Either the feature of unit and function can be further divided into being embodied by multiple modules or unit block.
In addition, although describing each step of method in the disclosure with particular order in the accompanying drawings, still, this does not really want These steps must be performed according to the particular order by asking or implying, or the step having to carry out shown in whole could be realized Desired result.It is additional or alternative, it is convenient to omit some steps, multiple steps are merged into a step and performed, and/ Or a step is decomposed into execution of multiple steps etc..
From the above description of the embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored on a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, server, mobile terminal, or network device) to perform the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
Those skilled in the art will understand that the various aspects of the present invention may be implemented as a system, a method, or a program product. Therefore, the various aspects of the present invention may take the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may be collectively referred to here as a "circuit", "module", or "system".
An electronic device 600 according to this embodiment of the present invention is described below with reference to Fig. 6. The electronic device 600 shown in Fig. 6 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 6, the electronic device 600 takes the form of a general-purpose computing device. Its components may include, but are not limited to: the at least one processing unit 610, the at least one storage unit 620, and a bus 630 connecting the different system components (including the storage unit 620 and the processing unit 610).
The storage unit stores program code that can be executed by the processing unit 610, so that the processing unit 610 performs the steps of the various exemplary embodiments of the present invention described in the "Exemplary Methods" section of this specification. For example, the processing unit 610 may perform the steps shown in Fig. 1: S110, obtaining a plurality of initial data features and cleaning each initial data feature to obtain a plurality of first target data features; and S120, fusing the first target data features to obtain a plurality of second target data features.
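Steps S110 and S120 can be chained into one end-to-end sketch. The feature names follow the purchase, browsing, add-to-cart, search, and coupon features listed earlier (only two are used here), while the per-feature preset values, the dictionary layout, and the function names `step_s110` and `step_s120` are hypothetical. The sketch assumes at least two values survive cleaning for every feature; an empty list after cleaning would make the statistics calls raise.

```python
import statistics
from typing import Dict, List, Tuple

# Hypothetical per-feature (first preset value, second preset value) pairs.
Presets = Dict[str, Tuple[float, float]]


def step_s110(raw: Dict[str, List[float]], presets: Presets) -> Dict[str, List[float]]:
    """S110: clean each initial data feature into a first target data feature."""
    cleaned: Dict[str, List[float]] = {}
    for name, values in raw.items():
        low, high = presets[name]
        # Keep values that are neither below the first nor above the second preset.
        cleaned[name] = [v for v in values if low <= v <= high]
    return cleaned


def step_s120(cleaned: Dict[str, List[float]]) -> Dict[str, float]:
    """S120: fuse first target data features into second target data features."""
    fused: Dict[str, float] = {}
    for name, values in cleaned.items():
        fused[f"{name}_mean"] = float(statistics.mean(values))
        fused[f"{name}_median"] = float(statistics.median(values))
        fused[f"{name}_upper_quartile"] = statistics.quantiles(
            values, n=4, method="inclusive")[2]
        fused[f"{name}_std"] = statistics.pstdev(values)
        fused[f"{name}_max"] = float(max(values))
    return fused


raw = {
    "purchase": [1.0, 2.0, 3.0, 4.0, 5.0, 999.0],  # 999.0 is an injected outlier
    "coupon": [-3.0, 0.0, 2.0, 4.0],
}
presets = {"purchase": (0.0, 100.0), "coupon": (0.0, 10.0)}
features = step_s120(step_s110(raw, presets))
print(features["purchase_max"])  # 5.0
```

Here the purchase outlier 999.0 is rejected in S110 before it can inflate `purchase_mean` or `purchase_max` in S120.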
The storage unit 620 may include readable media in the form of volatile storage units, such as a random access memory (RAM) unit 6201 and/or a cache memory unit 6202, and may further include a read-only memory (ROM) unit 6203.
The storage unit 620 may also include a program/utility 6204 having a set of (at least one) program modules 6205, including but not limited to an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 630 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processing unit or local bus using any of a variety of bus structures.
The electronic device 600 may also communicate with one or more external devices 700 (such as a keyboard, pointing device, or Bluetooth device), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device (such as a router or modem) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 650. In addition, the electronic device 600 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 660. As shown in the figure, the network adapter 660 communicates with the other modules of the electronic device 600 through the bus 630. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the above description of the embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored on a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, server, terminal device, or network device) to perform the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium is also provided, on which is stored a program product capable of implementing the above method of this specification. In some possible embodiments, the various aspects of the present invention may also be implemented in the form of a program product that includes program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps of the various exemplary embodiments of the present invention described in the "Exemplary Methods" section of this specification.
Referring to Fig. 7, a program product 800 for implementing the above method according to an embodiment of the present invention is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code, and may run on a terminal device such as a personal computer. However, the program product of the present invention is not limited to this; in this document, a readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in connection with, an instruction execution system, apparatus, or device.
The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.
Program code contained on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical cable, RF, or any suitable combination of the above.
Program code for performing the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In addition, the above drawings are merely schematic illustrations of the processing included in the method according to exemplary embodiments of the present invention, and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or limit the chronological order of these processes. It is also easy to understand that these processes may be performed, for example, synchronously or asynchronously in multiple modules.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed in the present disclosure. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the claims.

Claims (14)

  1. A data feature construction method, comprising:
    obtaining a plurality of initial data features and cleaning each initial data feature to obtain a plurality of first target data features; and
    fusing the first target data features to obtain a plurality of second target data features.
  2. The data feature construction method according to claim 1, wherein cleaning each initial data feature to obtain a plurality of first target data features comprises:
    rejecting abnormal features in each initial data feature to obtain the plurality of first target data features.
  3. The data feature construction method according to claim 2, wherein rejecting feature values in each initial data feature comprises:
    determining whether the feature value of each initial data feature is less than a first preset value;
    when the feature value is determined to be less than the first preset value, rejecting the initial data feature corresponding to that feature value; and
    when the feature value is determined to be not less than the first preset value, determining whether the feature value is greater than a second preset value;
    when the feature value is determined to be greater than the second preset value, rejecting the initial data feature corresponding to that feature value.
  4. The data feature construction method according to claim 1, wherein fusing the first target data features to obtain a plurality of second target data features comprises:
    constructing a plurality of characteristic indexes;
    calculating each characteristic index of each first target data feature; and
    fusing the characteristic indexes of the first target data features to obtain the plurality of second target data features.
  5. The data feature construction method according to claim 4, wherein the characteristic indexes include several of: mean, median, upper quartile, standard deviation, and maximum.
  6. The data feature construction method according to any one of claims 1-5, wherein the initial data features include several of: purchase features, browsing features, add-to-cart features, search features, and coupon features.
  7. A data feature construction apparatus, comprising:
    a data feature cleaning module, configured to obtain a plurality of initial data features and clean each initial data feature to obtain a plurality of first target data features; and
    a data feature fusion module, configured to fuse the first target data features to obtain a plurality of second target data features.
  8. The data feature construction apparatus according to claim 7, wherein cleaning each initial data feature to obtain a plurality of first target data features comprises:
    rejecting abnormal features in each initial data feature to obtain the plurality of first target data features.
  9. The data feature construction apparatus according to claim 8, wherein rejecting feature values in each initial data feature comprises:
    determining whether the feature value of each initial data feature is less than a first preset value;
    when the feature value is determined to be less than the first preset value, rejecting the initial data feature corresponding to that feature value; and
    when the feature value is determined to be not less than the first preset value, determining whether the feature value is greater than a second preset value;
    when the feature value is determined to be greater than the second preset value, rejecting the initial data feature corresponding to that feature value.
  10. The data feature construction apparatus according to claim 7, wherein fusing the first target data features to obtain a plurality of second target data features comprises:
    constructing a plurality of characteristic indexes;
    calculating each characteristic index of each first target data feature; and
    fusing the characteristic indexes of the first target data features to obtain the plurality of second target data features.
  11. The data feature construction apparatus according to claim 10, wherein the characteristic indexes include several of: mean, median, upper quartile, standard deviation, and maximum.
  12. The data feature construction apparatus according to any one of claims 7-11, wherein the initial data features include several of: purchase features, browsing features, add-to-cart features, search features, and coupon features.
  13. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the data feature construction method according to any one of claims 1-6.
  14. An electronic device, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to perform the data feature construction method according to any one of claims 1-6 by executing the executable instructions.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710954269.7A CN107644102B (en) 2017-10-13 2017-10-13 Data feature construction method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN107644102A true CN107644102A (en) 2018-01-30
CN107644102B CN107644102B (en) 2020-11-03

Family

ID=61123521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710954269.7A Active CN107644102B (en) 2017-10-13 2017-10-13 Data feature construction method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN107644102B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101702262A * 2009-11-06 2010-05-05 北京交通大学 Data fusion method for urban traffic flow indexes
US20120215735A1 (en) * 2011-02-18 2012-08-23 Larus Technologies Corporation System and Method for Data Fusion with Adaptive Learning
CN105512687A (en) * 2015-12-15 2016-04-20 北京锐安科技有限公司 Emotion classification model training and textual emotion polarity analysis method and system
CN105930942A (en) * 2016-06-03 2016-09-07 北京理工大学 Intelligent system for predicting energy technologies under big data background
CN106777274A (en) * 2016-06-16 2017-05-31 北京理工大学 A kind of Chinese tour field knowledge mapping construction method and system
CN106846805A (en) * 2017-03-06 2017-06-13 南京多伦科技股份有限公司 A kind of dynamic road grid traffic needing forecasting method and its system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020155788A1 (en) * 2019-01-30 2020-08-06 阿里巴巴集团控股有限公司 Data determination method, apparatus and device, and medium
CN112199374A (en) * 2020-09-29 2021-01-08 中国平安人寿保险股份有限公司 Data feature mining method aiming at data missing and related equipment thereof
CN112199374B (en) * 2020-09-29 2023-12-05 中国平安人寿保险股份有限公司 Data feature mining method for data missing and related equipment thereof
CN113315721A (en) * 2021-05-26 2021-08-27 恒安嘉新(北京)科技股份公司 Network data feature processing method, device, equipment and storage medium
CN113315721B (en) * 2021-05-26 2023-01-17 恒安嘉新(北京)科技股份公司 Network data feature processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant