CN108921693A - Data derivation method, apparatus and device - Google Patents

Data derivation method, apparatus and device

Info

Publication number
CN108921693A
Authority
CN
China
Prior art keywords
data
index
base values
target data
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810630668.2A
Other languages
Chinese (zh)
Other versions
CN108921693B (en)
Inventor
宋博文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201810630668.2A
Publication of CN108921693A
Application granted
Publication of CN108921693B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04: Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0635: Risk analysis of enterprise or organisation activities

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Educational Administration (AREA)
  • Technology Law (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Storage Device Security (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of this specification disclose a data derivation method, apparatus and device. The method includes: performing gridding processing on target data to generate one or more grids covering the target data; generating dimension information from the information contained in each grid, where the dimension information includes single-dimension information and multi-dimension information; determining base indicators corresponding to the target data according to the dimension information and the time period and event type contained in each grid; and finally performing derivation processing on the base indicators by means of an indicator derivation algorithm to obtain derived indicators corresponding to the target data.

Description

Data derivation method, apparatus and device
Technical Field
This specification relates to the field of computer technology, and in particular to a data derivation method, apparatus and device.
Background
As network and terminal technologies become increasingly widespread, the risks present in online transactions keep growing. Although business systems such as online transaction systems already contain risk prevention and control rules, online transaction risk has not decreased as a result, and risk prevention and control in business systems still faces a huge challenge. How to obtain more accurate risk prevention and control rules has become an important problem to be solved.
Determining risk prevention and control rules depends on the training data, or features, used to generate those rules; these features may be called indicators. The indicators chosen for data differ from scenario to scenario. For example, for domestic transactions within a country, the case rate needs to be selected as an indicator, whereas for international transactions the rejection rate or failure rate is usually selected. Moreover, such indicators currently take mainly the form of ratio variables, and indicators in that form are obtained chiefly from the experience of technical personnel. The selection of indicators for data is therefore quite limited, which makes the generated risk prevention and control rules less accurate. A scheme is thus needed that obtains the indicators corresponding to data more quickly and obtains more indicators.
Summary of the Invention
The purpose of the embodiments of this specification is to provide a data derivation method, apparatus and device, so as to provide a scheme that obtains the indicators corresponding to data more quickly and obtains more indicators.
To solve the above technical problem, the embodiments of this specification are implemented as follows:
A data derivation method provided by the embodiments of this specification includes:
performing gridding processing on target data to generate one or more grids covering the target data;
generating dimension information according to the information contained in each grid, the dimension information including single-dimension information and multi-dimension information;
determining base indicators corresponding to the target data according to the dimension information and the time period and event type contained in each grid;
performing derivation processing on the base indicators by means of an indicator derivation algorithm to obtain derived indicators corresponding to the target data.
Optionally, the indicator derivation algorithm includes a genetic algorithm, a random walk algorithm and a brute-force derivation algorithm.
Optionally, performing derivation processing on the base indicators by means of the indicator derivation algorithm to obtain the derived indicators corresponding to the target data includes:
judging whether the base indicators satisfy a first selection condition corresponding to the indicator derivation algorithm;
if not, performing derivation processing on the base indicators at least once, and after each derivation judging whether the indicators obtained by that derivation satisfy the first selection condition, until the indicators obtained are judged to satisfy the first selection condition or the number of derivations performed reaches a predetermined threshold; and determining the indicators obtained by the last derivation as the derived indicators corresponding to the target data.
Optionally, judging whether the base indicators satisfy the first selection condition corresponding to the indicator derivation algorithm includes:
encoding the base indicators to obtain basic data;
performing a fitness calculation on the basic data to obtain first data;
judging whether the first data satisfies the first selection condition;
if the first data satisfies the first selection condition, the base indicators satisfy the first selection condition;
if the first data does not satisfy the first selection condition, the base indicators do not satisfy the first selection condition;
and performing derivation processing on the base indicators includes:
performing crossover processing and/or mutation processing on the basic data corresponding to the base indicators.
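The encoding, crossover and mutation steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the specification does not fix an encoding, so the four-gene layout (numerator, denominator, grid dimension, window length), the `GENE_CHOICES` table and the function names are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical encoding: one gene per slot of a ratio-style indicator.
GENE_CHOICES: List[Sequence] = [
    ("amount", "case_count", "event_count"),        # numerator
    ("amount", "case_count", "event_count"),        # denominator
    ("bin_code", "category", "bin_code-category"),  # grid dimension
    (1, 7, 30),                                     # time window in days
]

Chromosome = list


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Single-point crossover: swap the tails of two encoded indicators."""
    point = rng.randrange(1, len(a))  # cut point strictly inside the chromosome
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(chrom: Chromosome, rng: random.Random, rate: float = 0.1) -> Chromosome:
    """Replace each gene, with probability `rate`, by a random allowed value."""
    return [
        rng.choice(GENE_CHOICES[i]) if rng.random() < rate else gene
        for i, gene in enumerate(chrom)
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
parent_a = ["case_count", "event_count", "bin_code", 30]
parent_b = ["amount", "event_count", "category", 7]
child_a, child_b = crossover(parent_a, parent_b, rng)
mutated = mutate(child_a, rng, rate=0.5)
```

Single-point crossover and per-gene mutation are chosen here only because they are the simplest common operators; other operators would fit the description equally well.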
Optionally, the fitness is obtained by aggregating multiple desired indicators.
Optionally, the multiple desired indicators include at least two of: the area under the receiver operating characteristic (ROC) curve (AUC), the information value (IV), and the embedding layer (Embedding).
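As a rough illustration of aggregating desired indicators into one fitness value, the sketch below combines AUC and IV. The weighted sum, the weights, the equal-width binning and the smoothing constant `eps` are assumptions made for this example, not details given in the specification, and the embedding-based indicator is left out.

```python
import math
from typing import Dict, Optional, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def information_value(scores: Sequence[float], labels: Sequence[int],
                      n_bins: int = 5, eps: float = 0.5) -> float:
    """Information value over equal-width bins, smoothed by eps per bin."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant scores
    pos_counts, neg_counts = [0] * n_bins, [0] * n_bins
    for s, y in zip(scores, labels):
        idx = min(int((s - lo) / width), n_bins - 1)
        (pos_counts if y == 1 else neg_counts)[idx] += 1
    pos_total = sum(pos_counts) + eps * n_bins
    neg_total = sum(neg_counts) + eps * n_bins
    iv = 0.0
    for p, n in zip(pos_counts, neg_counts):
        p_share, n_share = (p + eps) / pos_total, (n + eps) / neg_total
        iv += (p_share - n_share) * math.log(p_share / n_share)
    return iv


def fitness(scores: Sequence[float], labels: Sequence[int],
            weights: Optional[Dict[str, float]] = None) -> float:
    """Hypothetical aggregation: weighted sum of AUC and IV."""
    weights = weights or {"auc": 0.5, "iv": 0.5}
    return (weights["auc"] * auc(scores, labels)
            + weights["iv"] * information_value(scores, labels))
```

Here `scores` stands for the values of one candidate indicator and `labels` for known outcomes (1 for a case, 0 otherwise); an indicator that separates the two groups well scores higher on both AUC and IV.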
Optionally, the event types include a payment class and an operation class; payment-class events include transaction events, transfer events and code-scanning events, and operation-class events include login events, password-change events, registration events and browsing events.
Optionally, performing gridding processing on the target data to generate one or more grids covering the target data includes:
performing gridding processing on the target data by random division to generate one or more grids covering the target data.
A data derivation apparatus provided by the embodiments of this specification includes:
a gridding module, configured to perform gridding processing on target data to generate one or more grids covering the target data;
a dimension generation module, configured to generate dimension information according to the information contained in each grid, the dimension information including single-dimension information and multi-dimension information;
a base indicator determination module, configured to determine base indicators corresponding to the target data according to the dimension information and the time period and event type contained in each grid;
an indicator derivation module, configured to perform derivation processing on the base indicators by means of an indicator derivation algorithm to obtain derived indicators corresponding to the target data.
Optionally, the indicator derivation algorithm includes a genetic algorithm, a random walk algorithm and a brute-force derivation algorithm.
Optionally, the indicator derivation module includes:
a judging unit, configured to judge whether the base indicators satisfy a first selection condition corresponding to the indicator derivation algorithm;
an indicator derivation unit, configured to, if not, perform derivation processing on the base indicators at least once, and after each derivation judge whether the indicators obtained by that derivation satisfy the first selection condition, until the indicators obtained are judged to satisfy the first selection condition or the number of derivations performed reaches a predetermined threshold; and to determine the indicators obtained by the last derivation as the derived indicators corresponding to the target data.
Optionally, the judging unit is configured to:
encode the base indicators to obtain basic data;
perform a fitness calculation on the basic data to obtain first data;
judge whether the first data satisfies the first selection condition;
if the first data satisfies the first selection condition, determine that the base indicators satisfy the first selection condition;
if the first data does not satisfy the first selection condition, determine that the base indicators do not satisfy the first selection condition;
and the indicator derivation unit is configured to perform crossover processing and/or mutation processing on the basic data corresponding to the base indicators.
Optionally, the fitness is obtained by aggregating multiple desired indicators.
Optionally, the multiple desired indicators include at least two of: the area under the receiver operating characteristic (ROC) curve (AUC), the information value (IV), and the embedding layer (Embedding).
Optionally, the event types include a payment class and an operation class; payment-class events include transaction events, transfer events and code-scanning events, and operation-class events include login events, password-change events, registration events and browsing events.
Optionally, the gridding module is configured to perform gridding processing on the target data by random division to generate one or more grids covering the target data.
A data derivation device provided by the embodiments of this specification includes:
a processor; and
a memory arranged to store computer-executable instructions which, when executed, cause the processor to:
perform gridding processing on target data to generate one or more grids covering the target data;
generate dimension information according to the information contained in each grid, the dimension information including single-dimension information and multi-dimension information;
determine base indicators corresponding to the target data according to the dimension information and the time period and event type contained in each grid;
perform derivation processing on the base indicators by means of an indicator derivation algorithm to obtain derived indicators corresponding to the target data.
As can be seen from the technical solutions provided by the above embodiments of this specification, gridding processing is performed on target data to generate one or more grids covering the target data. Single-dimension information and multi-dimension information are then generated from the information contained in each grid, and the base indicators corresponding to the target data are determined from the single-dimension information and the multi-dimension information respectively, together with the time period and event type contained in each grid. Finally, derivation processing is performed on the base indicators by means of an indicator derivation algorithm to obtain the derived indicators corresponding to the target data. In this way, discrete target data is divided into grids automatically and the resulting grids are enumerated, while single-dimension and multi-dimension information is generated from the grids and used to construct the base indicators. Because the base indicators are determined not only from single-dimension information but also from multi-dimension information, they cover a wider range of content and provide a data basis for subsequent processing. In addition, the indicator derivation algorithm aggregates the elements of the base indicators to generate the derived indicators corresponding to the target data, which expands the set of indicators corresponding to the target data. The features corresponding to those indicators are in turn finer and more comprehensive, which improves the accuracy of subsequent processing; for example, the resulting risk prevention and control rules can be more accurate.
Description of the Drawings
To explain the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in this specification, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 shows an embodiment of a data derivation method of this specification;
Fig. 2 shows another embodiment of a data derivation method of this specification;
Fig. 3 shows yet another embodiment of a data derivation method of this specification;
Fig. 4 shows an embodiment of a data derivation apparatus of this specification;
Fig. 5 shows an embodiment of a data derivation device of this specification.
Detailed Description
The embodiments of this specification provide a data derivation method, apparatus and device.
To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of this specification, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments in this specification without creative effort shall fall within the scope of protection of this specification.
Embodiment one
As shown in Fig. 1, an embodiment of this specification provides a data derivation method. The method may be executed by a terminal device or a server. The terminal device may be a personal computer, or a mobile terminal device such as a mobile phone or tablet computer, and may be a terminal device used by a user. The server may be a standalone server or a server cluster composed of multiple servers, and may be the back-end server of a particular business or of a website (such as a shopping website or payment application). The method can be used in processing that derives more related data from basic data. In practice it can be applied in a variety of scenarios: for example, in a risk control system, more training data can be obtained through this embodiment while creating prevention and control rules for account-theft fraud, and the method can also be applied to risk insight and the like.
In order to improve data processing efficiency, this embodiment is described with a server as the executing entity; the case of a terminal device can be handled according to the related content below and is not repeated here. The method may specifically include the following steps:
In step S102, gridding processing is performed on target data to generate one or more grids covering the target data.
The target data may be any data, for example: data relating to online transactions carried out during online shopping; data relating to transfers between different users through an online payment application; or data relating to users logging in to or registering with a website. The target data may contain one of these types of data, such as data relating to online transactions or data relating to users logging in to or registering with a website, or several of these types, such as both transaction data and transfer data. In practice, the target data may include multiple records, which may be records collected within a predetermined duration, for example the data generated by a shopping website in the 30 days before the current time, which may include data relating to online transactions generated in those 30 days and data relating to users logging in to or registering with the shopping website. Gridding processing may be processing that splits the target data into multiple parts, forming a grid-like structure. It can be carried out with various splitting methods, such as splitting evenly or splitting by a predetermined rule, for instance by the geographic area in which the data was generated (such as a city or a set geographic range) or by user or user-related information (such as age or gender); the embodiments of this specification do not limit this. A grid is one of the parts obtained by splitting the target data, and may consist of one or more items of target data.
In implementation, the target data can be obtained in various ways. Taking the data of a shopping website within a certain duration as the target data, for example, all operations and behaviors of a user on the shopping website can be recorded with the user's consent, including registering with the website, logging in, selecting commodities, placing orders and paying. The data relating to the recorded operations or behaviors can be determined as the target data. Alternatively, the required data can be purchased from users, or users can be guided by rewards to enter the required data. The server can use the data obtained in these ways as the target data.
Gridding is an efficient way of managing and processing data. Grid variables have been widely applied in many fields with very good results. In a risk prevention and control system, for example, grid variables can be used for real-time prevention and control of account-theft fraud (for example, an FTG (Fraud To Gross) variable that characterizes risk within a grid over a past period can be called to apply strong or weak control to fraud risk), and can also be used for risk insight (for example, monitoring abnormal fluctuations of an indicator in a grid to perceive risk trends, so that risks can be responded to quickly and in time).
However, the grid dimensions and granularity used in gridding often differ between scenarios, which is also a difficulty in constructing grid variables. For example, in a wallet system based on transactions between accounts, the seller account can serve as a major grid dimension, whereas in an e-commerce system based on card transactions, grid variables characterized by the seller (or seller account) dimension discriminate poorly. In current practice, moreover, grid generation is often cumbersome and the selection is biased: generating a grid generally requires technical personnel to enter SQL (Structured Query Language) code by hand, and crossing dimensions (2 dimensions) or extending dimensions (3 or more dimensions) usually involves a great deal of repetitive work. Current grid generation therefore tends to use the risk types for which experience has already accumulated in the corresponding business, while unknown or latent risks and currently unknown dimensions are often ignored and never analyzed. To this end, the embodiments of this specification provide a feasible scheme, which may specifically include the following:
After the target data has been obtained as described above, gridding processing can be performed on it. For example, the target data can be divided evenly into multiple parts according to its total data volume, and the data in each part constitutes one grid. Specifically, if the total data volume is 2 GB, the target data can be divided evenly into 100 parts of 20.48 MB each, so that each grid contains 20.48 MB of data. Alternatively, the target data can be divided randomly to obtain multiple grids. Note that gridding of the target data can be controlled by a pre-written script or implemented in any other way that can be executed automatically; the embodiments of this specification do not limit this. In addition, the data in all of the grids obtained by gridding together make up the target data.
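The two splitting strategies mentioned above, even division and random division, might look like the following minimal sketch. It splits by record count rather than by bytes, and the function names and the per-record random assignment are assumptions made for illustration; the specification does not prescribe a particular procedure.

```python
import random
from typing import Any, List, Sequence


def grid_evenly(records: Sequence[Any], n_grids: int) -> List[List[Any]]:
    """Split records into n_grids contiguous parts of near-equal size."""
    if n_grids <= 0:
        raise ValueError("n_grids must be positive")
    base, extra = divmod(len(records), n_grids)
    grids, start = [], 0
    for i in range(n_grids):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first grids
        grids.append(list(records[start:start + size]))
        start += size
    return grids


def grid_randomly(records: Sequence[Any], n_grids: int, seed: int = 0) -> List[List[Any]]:
    """Assign each record to a randomly chosen grid (some grids may stay empty)."""
    if n_grids <= 0:
        raise ValueError("n_grids must be positive")
    rng = random.Random(seed)
    grids: List[List[Any]] = [[] for _ in range(n_grids)]
    for record in records:
        grids[rng.randrange(n_grids)].append(record)
    return grids


records = list(range(10))
even = grid_evenly(records, 3)
rand = grid_randomly(records, 3, seed=1)
```

In both cases every record lands in exactly one grid, so the grids taken together reproduce the target data, which matches the covering requirement stated above.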
In step S104, dimension information is generated according to the information contained in each grid, the dimension information including single-dimension information and multi-dimension information.
Single-dimension information is information about one aspect contained in the data, such as a user's age, gender or region, or the time or location of a transaction. For example, when a user shops online and transacts with a merchant, information such as the transaction time, payment method and category of the traded commodity is generated, and any one of these items can constitute single-dimension information: a time dimension, a payment method dimension, a category dimension and so on. Multi-dimension information is information that combines two or more aspects contained in the data. For example, if the single-dimension information includes the time dimension, payment method dimension and category dimension, the corresponding multi-dimension information may include the time-payment method dimension, the payment method-category dimension, the time-category dimension and the time-payment method-category dimension.
In implementation, each grid obtained by gridding the target data contains at least one record, and each record contains information about the user's operations and behaviors on the corresponding website or in a given business. Each record in the grid can be analyzed to extract or split out information about several different aspects, such as time or category, and the dimension information corresponding to each aspect can be determined from the characteristics of that information, thereby generating the dimension information. For example, suppose a grid contains transaction events, and the transaction information in a transaction event includes the BIN (Bank Identification Number) code (such as the first 6 digits of a bank card number), the category and the transaction time. If the dimension information to be generated includes single-dimension and two-dimension information, six kinds of dimension information can be generated: BIN code, category, time period, BIN code-time period, category-time period and BIN code-category. Of these, BIN code, category and time period are three items of single-dimension information, and BIN code-time period, category-time period and BIN code-category are three items of two-dimension information.
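The enumeration of single-dimension and multi-dimension information in this example can be reproduced with a few lines of Python. The field names and the `max_order` parameter below are hypothetical and serve only to illustrate the combinatorics.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def generate_dimensions(fields: Sequence[str], max_order: int = 2) -> List[Tuple[str, ...]]:
    """List every combination of 1..max_order fields as one dimension."""
    dims: List[Tuple[str, ...]] = []
    for order in range(1, max_order + 1):
        dims.extend(combinations(fields, order))
    return dims


fields = ["bin_code", "category", "time_period"]
dims_up_to_2 = generate_dimensions(fields)               # 3 single + 3 two-dimension
dims_up_to_3 = generate_dimensions(fields, max_order=3)  # adds the three-dimension combination
```

With three fields and `max_order=2` this yields the six kinds of dimension information listed above; raising `max_order` extends the same procedure to three or more dimensions without any hand-written SQL.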
Note that when generating dimension information, the embodiments of this specification are not limited to the single-dimension information that can be extracted directly from the grid information; one or more items of two-dimension, three-dimension or even higher-dimension information also need to be derived from the single-dimension information. This makes the dimension information broader and gives users a basis for solving the corresponding problems from more dimensions.
In step S106, base indicators corresponding to the target data are determined according to the dimension information and the time period and event type contained in each grid.
The time period contained in a grid may also be called the time window of the grid. It may be the period from the earliest time to the latest time contained in the grid; for example, if the data in the grid is the data of the 30 days before the current time, the time period is the 30 days before the current time. There may be various event types, which can be set according to the actual situation, such as a login class, a registration class, a transaction class and a transfer class; the embodiments of this specification do not limit this. A base indicator may be information obtained by direct or simple computation such as statistical analysis and summarization, for example the failure rate or case rate.
Note that the indicators chosen for data differ between scenarios. For example, for domestic transactions within a country, the case rate needs to be selected as an indicator, whereas for international transactions, owing to factors such as the case reporting cycle, the case rate can no longer meet actual needs as an indicator, and the rejection rate or failure rate is usually selected instead. Moreover, the variables associated with current grids (that is, grid variables) mainly take the form of ratio variables, such as the case rate within a predetermined period before the current time mentioned above, or the usage rate of new users.
In implementation, single-dimension information and multi-dimension information are generated by step S104 above. Statistical analysis can then be carried out on the single-dimension information and the multi-dimension information respectively, in combination with the time window of the grid containing that dimension information and the event types of the corresponding data in the grid, to determine the corresponding indicators, and each indicator that can be accumulated can be taken as a base indicator. For example, statistical analysis may yield the amount, case count, audited event count and event count as base indicators, each of which can be accumulated. In practice, in another embodiment of this specification, the accumulable base indicators can also be formed into a base indicator pool.
For example, continuing the example from step S104, the single-dimension information includes BIN code, category and time period, and the two-dimension information includes BIN code-time period, category-time period and BIN code-category. Taking the single-dimension information BIN code first, a comprehensive analysis can be carried out in combination with the time window of the grid containing the BIN code and the event types contained in that grid, to determine base indicators such as the amount, case count, case rate and event count, which can make up a base indicator pool.
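A minimal sketch of building such a base indicator pool for one dimension is shown below. The event layout (keys `time`, `event_type`, `amount`, `is_case`), the sample BIN value and the function names are hypothetical; the sketch only assumes that events are filtered by time window and event type and then accumulated per dimension value.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple


def base_indicators(events: Iterable[dict], dimension: Tuple[str, ...],
                    window_start: datetime, window_end: datetime,
                    event_types: Set[str]) -> Dict[tuple, Dict[str, float]]:
    """Accumulate amount, case count and event count per dimension value."""
    pool = defaultdict(lambda: {"amount": 0.0, "case_count": 0, "event_count": 0})
    for e in events:
        if not (window_start <= e["time"] <= window_end):
            continue  # outside the grid's time window
        if e["event_type"] not in event_types:
            continue  # not one of the requested event types
        stats = pool[tuple(e[field] for field in dimension)]
        stats["amount"] += e.get("amount", 0.0)
        stats["case_count"] += 1 if e.get("is_case") else 0
        stats["event_count"] += 1
    return dict(pool)


def case_rate(stats: Dict[str, float]) -> float:
    """Ratio variable built from two accumulated base indicators."""
    return stats["case_count"] / stats["event_count"] if stats["event_count"] else 0.0


# Made-up sample events; "622202" is an arbitrary illustrative BIN value.
events = [
    {"time": datetime(2018, 6, 1), "event_type": "transaction", "bin_code": "622202", "amount": 100.0, "is_case": False},
    {"time": datetime(2018, 6, 2), "event_type": "transaction", "bin_code": "622202", "amount": 50.0, "is_case": True},
    {"time": datetime(2018, 6, 3), "event_type": "login", "bin_code": "622202", "amount": 0.0, "is_case": False},
    {"time": datetime(2018, 4, 1), "event_type": "transaction", "bin_code": "622202", "amount": 10.0, "is_case": False},
]
pool = base_indicators(events, ("bin_code",), datetime(2018, 5, 20),
                       datetime(2018, 6, 19), {"transaction"})
```

Passing a two-field tuple such as `("bin_code", "category")` as `dimension` would accumulate the same indicators for a two-dimension grid, provided each event carries both fields.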
In step S108, derivation processing is performed on the base indicators by means of an indicator derivation algorithm to obtain derived indicators corresponding to the target data.
An indicator derivation algorithm is a rule or algorithm that starts from the base indicators and processes them in some specific way so as to evolve new indicators that differ from the base indicators. There can be many such algorithms: any rule or algorithm that, in practice, obtains new indicators by evolving the base indicators can serve as an indicator derivation algorithm, a genetic algorithm being one specific example. A derived indicator is a new indicator obtained by evolving the base indicators.
In implementation, as described above, the variables associated with current grids (grid variables) mainly take the form of ratio variables, such as the case rate within a predetermined period before the current time, or the usage rate of new users. These ratio variables rely mainly on the experience of technical personnel and are therefore quite limited. To expand the set of grid variables so that subsequent processing has more choices, new indicators can be derived from the base indicators. Specifically, the indicator derivation algorithm to be used can be set in the server in advance. In practice, several indicator derivation algorithms can be offered, and different ones can be selected according to actual needs: for example, an algorithm with relatively simple processing logic can be selected to speed up derivation, while an algorithm with a more refined processing procedure can be selected when a better derivation result is required.
When the processing of S106 through the above steps obtains the corresponding base values of target data, server can transfer original The index derivative algorithm for being first arranged or choosing derives base values by the processing logical process of index derivative algorithm Processing, obtains corresponding New Set, can assess the New Set derived, determine whether New Set meets demand, with And whether continue derivative New Set etc., the New Set for not meeting demand can carry out rejecting processing, continue to spread out if necessary Raw New Set, then can continue to derive New Set, until the New Set derived passes through upper commentary by above-mentioned processing logic Estimate, it finally, can be using the New Set by assessment as the corresponding derivative index of target data.
After obtaining the corresponding derivative index of target data, derivative index and base values can be combined, so as to To obtain the corresponding index of target data, so that the corresponding index of target data obtains certain expansion.Obtained target data After corresponding index, further subsequent relevant treatment can be carried out based on the index, for example, can determine phase based on the index The feature answered, can be based on determining feature-modeling or update risk prevention system rule etc., since the corresponding index of target data obtains To certain expansion, the corresponding feature of the index also can be more fine and comprehensive, so that obtained risk prevention system rule is more It is accurate to add.
This specification embodiment provides a method for deriving data. Gridding is performed on target data to generate one or more grids covering the target data. Single-dimension information and multi-dimension information are then generated from the information contained in each grid, and the base values corresponding to the target data are determined from the single-dimension information, the multi-dimension information, and the time period and event type contained in each grid. Finally, derivation processing is performed on the base values by an index derivative algorithm to obtain the derivative index corresponding to the target data. In this way, the discrete target data is automatically divided into grids and the resulting grids are enumerated, while single-dimension and multi-dimension information is generated from the grids to construct the base values. Because the base values are determined not only from single-dimension information but also from multi-dimension information, they cover a wider range of content and provide a data basis for subsequent processing. In addition, the index derivative algorithm aggregates the elements of the base values to generate the derivative index, which expands the indices corresponding to the target data. The features corresponding to those indices can therefore be finer and more comprehensive, which improves the accuracy of subsequent processing; for example, the resulting risk prevention and control rules can be more accurate.
Embodiment two
As shown in Fig. 2, this specification embodiment provides a method for deriving data. The method may be executed by a terminal device or a server. The terminal device may be a device such as a personal computer, or a mobile terminal device such as a mobile phone or tablet computer, and may be a terminal device used by a user. The server may be an independent server or a server cluster composed of multiple servers; moreover, it may be the background server of a certain business, or the background server of a certain website or application (for example, a payment application). The method can be used in processing such as deriving more related data from basic data. In practical applications, the method can be applied in a variety of scenarios. For example, in a risk control system, when creating rules for preventing and controlling misappropriation fraud, more training data or features can be obtained through this specification embodiment; the method can also be applied to risk discovery and the like.
To improve data processing efficiency, this embodiment is described with the server as the executing subject; the case of a terminal device can be handled according to the related content below and is not repeated here. The method may specifically include the following steps:
In step S202, gridding is performed on the target data by random division to generate one or more grids covering the target data.
In implementation, the target data can be obtained in a variety of ways and may be, for example, the historical data of a certain business; for details, see the related content of step S102 in Embodiment one above, which is not repeated here. After the target data is obtained, a code script preset in the server can be executed so that the server automatically performs gridding on the target data by random division. The target data is split, and each portion of data that is split off constitutes one grid. In this way, one or more grids covering the target data are obtained, and each grid contains part of the target data.
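The specification does not state how the random division is carried out. As one possible reading, the sketch below shuffles the records and deals them into a fixed number of disjoint grids; the function name `random_grids` and the parameter `n_grids` are assumptions made for illustration.

```python
import random

def random_grids(records, n_grids, seed=None):
    """Randomly split records into disjoint, non-empty grids that together cover all records."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    # Deal the shuffled records round-robin so that every record lands in exactly one grid.
    grids = [shuffled[i::n_grids] for i in range(n_grids)]
    return [g for g in grids if g]

grids = random_grids(range(10), n_grids=3, seed=0)
```

Because the grids are disjoint and their union is the input, the "covering" property in step S202 holds by construction in this sketch.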
In step S204, dimension information is generated from the information contained in each grid; the dimension information includes single-dimension information and multi-dimension information.
In implementation, for the specific processing of step S204, see the related content of step S104 in Embodiment one above. In practical applications, for certain dimension information, predetermined processing can be applied to the information in predetermined fields of the grid so as to determine general or exact content for that dimension. Take the time period dimension as an example: because the information of a transaction event usually includes the transaction time, predetermined processing can be applied to the time field common to events such as transaction events, so that the transaction periods in the grid are divided reasonably into periods that meet the demand or are generally applicable. For example, 0:00 to 6:00 US time is a low period for transactions but a high-risk period for account theft, and the corresponding local time in this country is 12:00 to 18:00. In this case, the corresponding data sets can be associated according to the time field, and 12:00 to 18:00 can be placed in the same period. Meanwhile, in the course of determining the grids, dimension information can be kept or discarded according to the magnitude of events covered by the grid, so that dimension information that is unsuitable or does not meet the demand is filtered out and only dimension information that is suitable or meets the demand is retained.
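The time-period alignment in the example above can be sketched as a fixed-offset shift followed by bucketing. The 12-hour offset is taken from the example in the text; the function name `period_label` and the 6-hour bucket width are assumptions, and a real system would presumably use proper time-zone handling rather than a fixed offset.

```python
def period_label(local_hour, offset_hours, width=6):
    """Shift a local hour by a fixed offset and return the label of its time period.

    Assumes `width` divides 24 evenly.
    """
    hour = (local_hour + offset_hours) % 24
    start = (hour // width) * width
    return f"{start:02d}-{start + width:02d}"

# A transaction at 13:00 local time falls in the 00-06 period of the reference clock.
label = period_label(13, offset_hours=-12)
```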
In step S206, the base values corresponding to the target data are determined according to the dimension information and the time period and event type contained in each grid.
The event types may include a payment class and an operation class. Payment-class events may include transaction events, transfer events, code-scanning events, and the like; operation-class events may include login events, password modification events, registration events, browsing events, and the like.
The content of step S206 is the same as that of step S106 in Embodiment one above; for the specific processing of step S206, see the related content of step S106 in Embodiment one, which is not repeated here.
In step S208, it is judged whether the base values meet the first selection condition corresponding to the index derivative algorithm.
The index derivative algorithm may include a genetic algorithm, a random walk algorithm, and a brute-force derivative algorithm. The first selection condition may take many forms, such as an iteration-count condition and/or a condition that the data volume meets a predetermined data volume threshold; it can be determined according to the index derivative algorithm selected, and this specification embodiment does not limit it.
In implementation, to expand the grid variables so that more choices are available in subsequent processing, new indices can be derived from the base values; specifically, the index derivative algorithm to be used can be set in the server in advance. Because an index derivative algorithm can derive a very large number of new indices, whereas in practice only a certain number may be needed, the index derivative algorithm may include a selection condition for choosing a certain number of new indices (i.e. the first selection condition). The base values can first be matched against the first selection condition. If the base values meet the first selection condition, no derivation processing is needed for them, and subsequent processing can be performed directly with the base values. If the base values do not meet the first selection condition, the current base values cannot satisfy the requirements of subsequent processing and derivation processing is needed, in which case the processing of step S210 below can be executed.
In step S210, if the condition is not met, derivation processing is performed on the base values at least once, and after each round of derivation processing it is judged whether the resulting indices meet the first selection condition, until the indices obtained by derivation processing are judged to meet the first selection condition or the number of rounds of derivation processing executed reaches a predetermined number threshold.
The predetermined number threshold can be set according to actual conditions, for example 100 or 200 times; this specification embodiment does not limit it.
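The control flow of steps S208 to S212 can be sketched independently of any particular index derivative algorithm. In the sketch below, `derive` and `meets_condition` are hypothetical callables supplied by the caller, and the toy usage (growing a list until it holds five entries) is chosen only so that the loop can be run.

```python
def derive_until(base, derive, meets_condition, max_rounds=100):
    """Repeat derivation until the first selection condition holds or max_rounds is reached."""
    current, rounds = base, 0
    while not meets_condition(current) and rounds < max_rounds:
        current = derive(current)  # one round of derivation processing
        rounds += 1
    return current, rounds         # `current` is the result of the last round

# Toy usage: each round appends the sum of the last two values as a "new index".
result, rounds = derive_until(
    [1.0, 2.0],
    derive=lambda xs: xs + [xs[-1] + xs[-2]],
    meets_condition=lambda xs: len(xs) >= 5,
)
```

If the base values already satisfy the condition, the loop body never runs and the base values are returned unchanged, which matches the case where no derivation processing is needed.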
In implementation, taking a random walk algorithm as the index derivative algorithm for example, the base values can be used as the starting iteration point. The step size of each iteration and the value of the control precision (a very small positive number used to control termination of the algorithm) can be set, and the number of iterations can be set (this can serve as the first selection condition). At the first iteration the iteration count obviously has not reached the set number, so the iteration point of the previous iteration can be advanced by one step to obtain the iteration point of the current iteration. The quality of the two points is then computed, and if the current iteration point is better, it is selected. New iteration points are then obtained successively by the same process until the set number of iterations is reached.
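A minimal random walk of this kind might look as follows. The sketch treats the iteration point as a numeric vector and takes the quality measure as a caller-supplied `score` function; both are assumptions, as is the rule of shrinking the step after a failed move so that the control precision `eps` can end the search early.

```python
import math
import random

def random_walk(start, score, step=0.5, eps=1e-3, n_iter=200, seed=0):
    """Move in random directions, keeping a candidate point only if it scores better."""
    rng = random.Random(seed)
    point, best = list(start), score(start)
    for _ in range(n_iter):              # the set number of iterations
        if step < eps:                   # control precision ends the search early
            break
        direction = [rng.gauss(0.0, 1.0) for _ in point]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        candidate = [p + step * d / norm for p, d in zip(point, direction)]
        candidate_score = score(candidate)
        if candidate_score > best:       # the new iteration point is better: select it
            point, best = candidate, candidate_score
        else:                            # assumption: shrink the step after a failed move
            step *= 0.95
    return point, best

# Toy quality measure: closeness to a hypothetical target vector.
target = [0.3, 0.7]
quality = lambda w: -sum((a - b) ** 2 for a, b in zip(w, target))
point, best = random_walk([0.0, 0.0], quality)
```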
Taking a brute-force derivative algorithm as the index derivative algorithm for example: there may be many brute-force derivative algorithms, and this specification embodiment provides only one optional implementation. Specifically, the number of rounds of derivation processing or the like can be preset as the first selection condition. Any two indices in the base values can then be superimposed or otherwise combined by calculation to obtain a new index, or any three or more indices in the base values can likewise be superimposed or combined to obtain a new index, until the set number of rounds of derivation processing is reached.
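One way to picture the pairwise case is to enumerate every pair of base values and emit a sum and a ratio for each, stopping once a preset number of new indices has been produced. The choice of sum and ratio as the combining operations, the cap `max_new`, and the sample values are assumptions made for this sketch.

```python
from itertools import combinations, islice

def brute_force_derive(base, max_new=10):
    """Combine every pair of base values into new indices, up to max_new of them."""
    def candidates():
        for (n1, v1), (n2, v2) in combinations(base.items(), 2):
            yield f"{n1}+{n2}", v1 + v2      # superposition of two indices
            if v2 != 0:
                yield f"{n1}/{n2}", v1 / v2  # a ratio as one possible calculation
    return dict(islice(candidates(), max_new))

# Hypothetical base values for one dimension key.
derived = brute_force_derive({"cases": 1, "events": 4, "amount": 200.0}, max_new=4)
```

Note that the ratio `cases/events` produced here coincides with the case rate, which shows how a familiar ratio variable can fall out of a purely mechanical enumeration.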
In step S212, the indices obtained by the last round of derivation processing are determined as the derivative index corresponding to the target data.
It should be noted that if the base values meet the first selection condition, no derivation processing is needed for them, and subsequent processing can be performed directly with the base values.
This specification embodiment provides a method for deriving data. Gridding is performed on target data to generate one or more grids covering the target data. Single-dimension information and multi-dimension information are then generated from the information contained in each grid, and the base values corresponding to the target data are determined from the single-dimension information, the multi-dimension information, and the time period and event type contained in each grid. Finally, derivation processing is performed on the base values by an index derivative algorithm to obtain the derivative index corresponding to the target data. In this way, the discrete target data is automatically divided into grids and the resulting grids are enumerated, while single-dimension and multi-dimension information is generated from the grids to construct the base values. Because the base values are determined not only from single-dimension information but also from multi-dimension information, they cover a wider range of content and provide a data basis for subsequent processing. In addition, the index derivative algorithm aggregates the elements of the base values to generate the derivative index, which expands the indices corresponding to the target data. The features corresponding to those indices can therefore be finer and more comprehensive, which improves the accuracy of subsequent processing; for example, the resulting risk prevention and control rules can be more accurate.
Embodiment three
As shown in Fig. 3, this specification embodiment provides a method for deriving data. The method may be executed by a terminal device or a server. The terminal device may be a device such as a personal computer, or a mobile terminal device such as a mobile phone or tablet computer, and may be a terminal device used by a user. The server may be an independent server or a server cluster composed of multiple servers; moreover, it may be the background server of a certain business, or the background server of a certain website or application (for example, a payment application). The method can be used in processing such as deriving more related data from basic data. In practical applications, the method can be applied in a variety of scenarios. For example, in a risk control system, when creating rules for preventing and controlling misappropriation fraud, more training data can be obtained through this specification embodiment; the method can also be applied to risk discovery and the like.
To improve data processing efficiency, this embodiment is described with the server as the executing subject; the case of a terminal device can be handled according to the related content below and is not repeated here. In this specification embodiment, a genetic algorithm can be used to select optimized indices (including derivative indices) on the basis of the base values. A genetic algorithm is a computational method that simulates the natural selection and genetic mechanisms of biological evolution; it expands a population and searches for an optimal solution by simulating the natural evolutionary process. A genetic algorithm starts from a population that represents a potential solution set of the problem, and a population consists of a certain number of gene-encoded individuals (i.e. base values). Each individual is in fact an entity whose chromosome carries features. The chromosome, as the main carrier of genetic material, is a set of multiple genes and determines the external appearance of the individual. After the initial population (i.e. the base values) is generated, it evolves generation by generation according to the principle of survival of the fittest. In each generation, individuals are selected according to their fitness in the problem domain and are combined by crossover and/or mutation using genetic operators borrowed from natural genetics, producing a population that represents a new solution set (i.e. derivative indices). As in natural evolution, this process makes later generations of the population (i.e. derivative indices) better adapted to the environment than earlier ones, and the population of the last generation (i.e. derivative indices), after decoding, can serve as the final derivative index.
The method may specifically include the following steps:
In step S302, gridding is performed on the target data by random division to generate one or more grids covering the target data.
In step S304, dimension information is generated from the information contained in each grid; the dimension information includes single-dimension information and multi-dimension information.
In step S306, the base values corresponding to the target data are determined according to the dimension information and the time period and event type contained in each grid.
The event types may include a payment class and an operation class. Payment-class events may include transaction events, transfer events, code-scanning events, and the like; operation-class events may include login events, password modification events, registration events, browsing events, and the like.
In step S308, encoding is performed on the base values to obtain basic data.
In implementation, because a genetic algorithm cannot directly handle the parameters of the problem space, the base values need to be converted into chromosomes or individuals in the genetic space that are composed of genes in a certain structure. This conversion is called encoding, and may also be called (problem) representation. Encoding can therefore be performed on the base values to convert them into basic data in the genetic space.
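The specification does not fix an encoding scheme. As one assumed scheme for illustration, a chromosome can be a list of 0/1 genes with one gene per base value, where 1 means the base value takes part in the candidate index; `encode` and `decode` are hypothetical helper names.

```python
def encode(all_names, selected):
    """Encode a selection of base values as a 0/1 chromosome, one gene per base value."""
    chosen = set(selected)
    return [1 if name in chosen else 0 for name in all_names]

def decode(all_names, chromosome):
    """Recover the names of the selected base values from a chromosome."""
    return [name for name, gene in zip(all_names, chromosome) if gene]

names = ["amount", "cases", "events", "case_rate"]
chromosome = encode(names, ["cases", "events"])
```

A fixed-length bit list is convenient here because crossover and mutation can then operate position by position without knowing what each gene means.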
In step S310, fitness calculation is performed on the basic data to obtain first data.
In implementation, fitness in a genetic algorithm indicates how well an individual is adapted to the environment and can be characterized by a fitness function, also called an evaluation function. The fitness function is a measure for judging how good or bad the individuals in a population are, and it is evaluated according to the objective function of the problem being solved. During its evolutionary search, a genetic algorithm need not use any other external information; it uses only fitness to assess the quality of individuals and as the basis for subsequent operations. Because fitness values are compared and ranked in a genetic algorithm, and selection probabilities are computed on that basis, the fitness function should take positive values.
Based on the above, a fitness function can be chosen in advance. It can be chosen in a variety of ways: for example, it can be determined from the target data, set from experience, or its form can be specified by business requirements; this specification embodiment does not limit this. After the fitness function is selected, each index in the base values can be input into it for calculation, and the calculation result is the first data. In this way, the first data of each index in the base values is obtained.
It should be noted that, so that the fitness (or fitness function) has a more broadly meaningful importance-indicating (Importance Indicator) capability, the fitness in this specification embodiment can be obtained by aggregating multiple desired indicators. The desired indicators can be set according to actual conditions. This specification embodiment provides one optional combination: the desired indicators may include at least two of the area under the curve (AUC, Area Under Curve) of the receiver operating characteristic (ROC) curve, the information value (IV, Information Value), and the embedding layer (Embedding).
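The specification does not say how the desired indicators are aggregated. The sketch below assumes a simple weighted sum of AUC and IV, computes AUC by pairwise comparison and IV over equal-width bins with additive smoothing, and leaves out the embedding-based indicator; the weights, bin count, and smoothing constant are all assumptions.

```python
import math

def auc(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def information_value(scores, labels, n_bins=2, eps=0.5):
    """IV over equal-width bins, smoothed by eps so that empty bins do not divide by zero."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins or 1.0
    bad, good = [0.0] * n_bins, [0.0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int((s - lo) / width), n_bins - 1)
        (bad if y == 1 else good)[b] += 1
    total_bad, total_good = sum(bad) + eps * n_bins, sum(good) + eps * n_bins
    iv = 0.0
    for b, g in zip(bad, good):
        p_bad, p_good = (b + eps) / total_bad, (g + eps) / total_good
        iv += (p_bad - p_good) * math.log(p_bad / p_good)
    return iv

def fitness(scores, labels, w_auc=0.5, w_iv=0.5):
    """Assumed aggregation: weighted sum of AUC and IV."""
    return w_auc * auc(scores, labels) + w_iv * information_value(scores, labels)

# Toy index values and case labels (1 = case), invented for the example.
scores, labels = [0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]
value = fitness(scores, labels)
```

Each IV term has the form (a - b) * ln(a / b), which is never negative, so this aggregated fitness is non-negative as long as both weights are.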
In step S312, it is judged whether the first data meets the first selection condition corresponding to the genetic algorithm.
In implementation, after the server obtains, through the processing of steps S308 and S310 above, the first data corresponding to the basic data (i.e. the initial population), the first data of each index can be compared with a preset evaluation threshold. If the first data of an index is greater than or equal to the evaluation threshold, the index can be set as a passed index (or a pass label can be added to that index in the base values); if the first data of an index is less than the evaluation threshold, the index can be set as a discarded index (or a discard label can be added to that index in the base values). After this processing is completed, the server can count the number of indices on which fitness calculation (or evaluation) has been performed, obtain the first data of each index, and calculate the average of the first data of the indices whose calculation is complete. The number of indices on which fitness calculation was performed can then be compared with the relevant information in the first selection condition, and the average can likewise be compared with the relevant information in the first selection condition. If both comparisons pass, it can be determined that the first data meets the first selection condition. If at least one of the two comparisons fails (i.e. the number of indices on which fitness calculation was performed does not reach the first selection condition, and/or the average does not reach the first selection condition), it can be determined that the first data does not meet the first selection condition.
If it is determined that the first data does not meet the first selection condition, this indicates that the base values do not meet the first selection condition, in which case the processing of steps S314 to S316 below can be executed. If it is determined that the first data meets the first selection condition, this indicates that the base values meet it, in which case no derivation processing is needed and subsequent processing can be performed directly with the base values.
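The two-part check described above (per-index labelling, then a count comparison and an average comparison) can be sketched as below. The function name, the parameters `min_count` and `min_mean`, and the sample fitness values are assumptions standing in for the "relevant information" of the first selection condition.

```python
def check_first_condition(fitness_by_index, eval_threshold, min_count, min_mean):
    """Label each index as pass/discard, then test the count and the average fitness."""
    labels = {
        name: "pass" if value >= eval_threshold else "discard"
        for name, value in fitness_by_index.items()
    }
    count = len(fitness_by_index)
    mean = sum(fitness_by_index.values()) / count if count else 0.0
    # Both comparisons must pass for the first selection condition to be met.
    return labels, (count >= min_count and mean >= min_mean)

first_data = {"amount": 0.8, "case_rate": 0.6, "events": 0.3}
labels, ok = check_first_condition(first_data, eval_threshold=0.5, min_count=3, min_mean=0.5)
```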
In step S314, if the first data does not meet the first selection condition, derivation processing is performed on the basic data at least once, and after each round of derivation processing it is judged whether the resulting data meets the first selection condition, until the data obtained by derivation processing is judged to meet the first selection condition or the number of rounds of derivation processing executed reaches the predetermined number threshold.
Derivation processing may include crossover processing and/or mutation processing. Crossover processing randomly exchanges part of the content between any two items of data in the existing basic data to form new data (and thereby generate new indices). Mutation processing randomly changes part of the content of any one item of data in the existing basic data to form new data (and thereby generate new indices).
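For chromosomes represented as equal-length lists of 0/1 genes (an assumed encoding), the two operations can be sketched as single-point crossover and bit-flip mutation. The function names and the mutation `rate` parameter are illustrative.

```python
import random

def crossover(a, b, rng):
    """Single-point crossover: swap the tails of two chromosomes at a random cut point."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chromosome, rng, rate=0.1):
    """Flip each 0/1 gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]

rng = random.Random(0)
child1, child2 = crossover([0, 0, 0, 0], [1, 1, 1, 1], rng)
flipped = mutate([0, 1, 0], rng, rate=1.0)
```

Crossover only rearranges genes that already exist in the two parents, whereas mutation can introduce a gene value that neither parent carried at that position.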
In implementation, if the above processing determines that the first data does not meet the first selection condition, the first data can be discarded. Meanwhile, according to the genetic algorithm, the information contained in an item of data in the basic data corresponding to the first data may not be a single piece of information but may include multiple pieces of information (equivalent to a combination of multiple genes). That item of data can therefore undergo genetic recombination (including gene crossover and gene mutation) with other data in the basic data to form the data obtained by derivation processing. To this end, a crossover function and a mutation function can be set, and crossover processing and/or mutation processing can be performed on the generated basic data by the crossover function and/or mutation function to obtain the processed data. Fitness calculation can then be performed again on the data obtained by crossover processing and/or mutation processing to obtain second data, and the second data can be matched against the first selection condition to determine whether it meets that condition. If the second data meets the first selection condition, the processing of step S316 below can be executed. If the second data does not meet the first selection condition, the data obtained by crossover processing and/or mutation processing can be discarded and the derivation operation on the basic data ended. Alternatively, crossover processing and/or mutation processing can be performed again on the data that did not meet the first selection condition to generate new data, and it is then judged again whether the new data meets the first selection condition. This process can be repeated until the new data meets the first selection condition or the number of rounds of crossover processing and/or mutation processing executed reaches the predetermined number threshold.
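Putting the pieces together, the repeated crossover, mutation, and re-evaluation can be sketched as a small loop. Everything concrete here is an assumption made so the sketch can run: chromosomes are 0/1 lists, the first selection condition is reduced to a minimum average fitness, the better half of each generation is kept as parents, and the toy fitness simply rewards genes set to 1.

```python
import random

def evolve(population, fitness, min_mean, max_rounds=100, mut_rate=0.1, seed=0):
    """Breed new generations until average fitness reaches min_mean or max_rounds is hit."""
    rng = random.Random(seed)

    def mean_fitness(pop):
        return sum(fitness(c) for c in pop) / len(pop)

    rounds = 0
    while mean_fitness(population) < min_mean and rounds < max_rounds:
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, len(ranked) // 2)]   # keep the fitter half as parents
        children = []
        while len(children) < len(population):
            a, b = rng.sample(parents, 2)
            point = rng.randrange(1, len(a))           # crossover processing
            child = a[:point] + b[point:]
            child = [1 - g if rng.random() < mut_rate else g for g in child]  # mutation
            children.append(child)
        population = children
        rounds += 1
    return population, rounds

# Toy fitness, kept positive: share of genes set to 1, with add-one smoothing.
fit = lambda c: (1 + sum(c)) / (1 + len(c))
start = [[0, 1, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
final, rounds = evolve(start, fit, min_mean=0.7, max_rounds=50)
```

The population returned is the result of the last round of crossover and mutation, which corresponds to the data that step S316 maps back to derivative indices.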
In step S316, the index corresponding to the data obtained by the last round of crossover processing and/or mutation processing is determined as the derivative index corresponding to the target data.
This specification embodiment provides a method for deriving data. Gridding is performed on target data to generate one or more grids covering the target data. Single-dimension information and multi-dimension information are then generated from the information contained in each grid, and the base values corresponding to the target data are determined from the single-dimension information, the multi-dimension information, and the time period and event type contained in each grid. Finally, derivation processing is performed on the base values by an index derivative algorithm to obtain the derivative index corresponding to the target data. In this way, the discrete target data is automatically divided into grids and the resulting grids are enumerated, while single-dimension and multi-dimension information is generated from the grids to construct the base values. Because the base values are determined not only from single-dimension information but also from multi-dimension information, they cover a wider range of content and provide a data basis for subsequent processing. In addition, the index derivative algorithm aggregates the elements of the base values to generate the derivative index, which expands the indices corresponding to the target data. The features corresponding to those indices can therefore be finer and more comprehensive, which improves the accuracy of subsequent processing; for example, the resulting risk prevention and control rules can be more accurate.
Embodiment four
The above is the method for deriving data provided by the embodiments of this specification. Based on the same idea, this specification embodiment also provides a device for deriving data, as shown in Fig. 4.
The device for deriving data includes a gridding module 401, a dimension generation module 402, a base values determining module 403, and an index derivation module 404, wherein:
The gridding module 401 is configured to perform gridding on target data to generate one or more grids covering the target data;
The dimension generation module 402 is configured to generate dimension information from the information contained in each grid, the dimension information including single-dimension information and multi-dimension information;
The base values determining module 403 is configured to determine the base values corresponding to the target data according to the dimension information and the time period and event type contained in each grid;
The index derivation module 404 is configured to perform derivation processing on the base values by an index derivative algorithm to obtain the derivative index corresponding to the target data.
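The four-module structure can be pictured as a pipeline of four pluggable callables. The class name `DerivingDevice`, the field names, and the toy lambdas below are assumptions used only to show how the modules hand data to one another; they do not reflect any actual implementation of the device.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence

@dataclass
class DerivingDevice:
    gridding: Callable[[Sequence[Any]], list]               # module 401
    dimension_generation: Callable[[list], list]            # module 402
    base_values_determining: Callable[[list, list], dict]   # module 403
    index_derivation: Callable[[dict], dict]                # module 404

    def run(self, target_data):
        grids = self.gridding(target_data)
        dims = self.dimension_generation(grids)
        base = self.base_values_determining(grids, dims)
        return self.index_derivation(base)

# Toy modules: one grid, one dimension, two base values, one derived ratio.
device = DerivingDevice(
    gridding=lambda data: [list(data)],
    dimension_generation=lambda grids: [("bin",)],
    base_values_determining=lambda grids, dims: {"events": sum(len(g) for g in grids), "cases": 1},
    index_derivation=lambda base: {"cases/events": base["cases"] / base["events"]},
)
result = device.run([{"bin": "411111"}, {"bin": "522222"}])
```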
In this specification embodiment, the index derivative algorithm includes a genetic algorithm, a random walk algorithm, and a brute-force derivative algorithm.
In this specification embodiment, the index derivation module 404 includes:
A judging unit, configured to judge whether the base values meet the first selection condition corresponding to the index derivative algorithm;
An index derivation unit, configured to, if the condition is not met, perform derivation processing on the base values at least once and, after each round of derivation processing, judge whether the indices obtained by the derivation processing meet the first selection condition, until the indices obtained by the derivation processing are judged to meet the first selection condition or the number of rounds of derivation processing executed reaches a predetermined number threshold; and to determine the indices obtained by the last round of derivation processing as the derivative index corresponding to the target data.
In this specification embodiment, the judging unit is used for:
Coded treatment is carried out to the base values, obtains basic data;
Fitness calculating is carried out to the basic data, obtains the first data;
Judge whether first data meet the first selection condition;
If first data meet the first selection condition, the base values meets described first and chooses item Part;
If first data are unsatisfactory for the first selection condition, the base values is unsatisfactory for first choosing Take condition;
The index derived units, for the basic data corresponding to the base values carry out cross processing and/ Or variation processing.
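The judge-then-derive loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the bit-tuple encoding (Genome), the single-point crossover, the bit-flip mutation, the threshold form of the first selection condition, and the names derive, crossover, and mutate are all hypothetical. Returning the base indicator unchanged when it already satisfies the condition is also an assumption.

```python
import random
from typing import Callable, Sequence, Tuple

Genome = Tuple[int, ...]  # assumed encoding of a base indicator ("basic data")


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Single-point crossover: head of a joined to tail of b (needs len >= 2)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(g: Genome, rng: random.Random, rate: float = 0.1) -> Genome:
    """Flip each bit independently with probability `rate`."""
    return tuple(bit ^ 1 if rng.random() < rate else bit for bit in g)


def derive(
    base: Genome,
    partners: Sequence[Genome],
    fitness: Callable[[Genome], float],
    threshold: float,
    max_rounds: int,
    rng: random.Random,
) -> Tuple[Genome, int]:
    """Repeat crossover and mutation until the fitness reaches `threshold`
    (the assumed first selection condition) or `max_rounds` derivations
    have been performed; return the last result and the round count."""
    current, rounds = base, 0
    # The fitness value plays the role of the "first data" checked by the judging unit.
    while fitness(current) < threshold and rounds < max_rounds:
        partner = rng.choice(partners)
        current = mutate(crossover(current, partner, rng), rng)
        rounds += 1
    return current, rounds
```

The count threshold bounds the running time even when no derived indicator ever satisfies the selection condition, which is why the result of the last derivation is returned in either case.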
In the embodiments of this specification, the fitness is obtained by aggregating a plurality of desired indicators.
In the embodiments of this specification, the plurality of desired indicators includes at least two of: the area under the receiver operating characteristic (ROC) curve (AUC), the information value (IV), and the embedding layer (Embedding).
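As an illustration of aggregating two of these desired indicators into one fitness value, the sketch below computes AUC by pairwise comparison and IV from binned values, then combines them with a weighted sum. The weighted sum, the weights, the additive smoothing constant eps, and the function names are assumptions made for the example; the text does not specify how the aggregation is performed, and the Embedding indicator is not covered here.

```python
import math
from typing import Hashable, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def information_value(bins: Sequence[Hashable], labels: Sequence[int],
                      eps: float = 0.5) -> float:
    """IV = sum over bins of (p_i - q_i) * ln(p_i / q_i), where p_i and q_i are
    the smoothed shares of positives and negatives falling in bin i."""
    keys = set(bins)
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    iv = 0.0
    for b in keys:
        pos_b = sum(1 for x, y in zip(bins, labels) if x == b and y == 1)
        neg_b = sum(1 for x, y in zip(bins, labels) if x == b and y == 0)
        # Additive smoothing avoids log(0) for bins that hold a single class.
        p = (pos_b + eps) / (total_pos + eps * len(keys))
        q = (neg_b + eps) / (total_neg + eps * len(keys))
        iv += (p - q) * math.log(p / q)
    return iv


def fitness(scores, bins, labels, w_auc: float = 0.5, w_iv: float = 0.5) -> float:
    """Assumed aggregation: a weighted sum of AUC and IV."""
    return w_auc * auc(scores, labels) + w_iv * information_value(bins, labels)
```

Because IV is unbounded while AUC lies in [0, 1], a practical aggregation would probably rescale the two before summing; the plain weighted sum is kept here only for brevity.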
In the embodiments of this specification, the event types include a payment category and an operation category. Payment events include transaction events, transfer events, and code-scanning events; operation events include login events, password-change events, registration events, and browsing events.
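The two event categories and their member events can be captured in a small lookup table. The identifiers below (EventCategory, EVENT_CATEGORIES, and the lower-case event names) are illustrative choices for this sketch and do not come from the text.

```python
from enum import Enum
from typing import Dict, List


class EventCategory(Enum):
    PAYMENT = "payment"
    OPERATION = "operation"


# Event type -> category, following the seven events listed in the text.
EVENT_CATEGORIES: Dict[str, EventCategory] = {
    "transaction": EventCategory.PAYMENT,
    "transfer": EventCategory.PAYMENT,
    "code_scan": EventCategory.PAYMENT,
    "login": EventCategory.OPERATION,
    "password_change": EventCategory.OPERATION,
    "registration": EventCategory.OPERATION,
    "browse": EventCategory.OPERATION,
}


def category_of(event_type: str) -> EventCategory:
    """Look up the category of a known event type (KeyError if unknown)."""
    return EVENT_CATEGORIES[event_type]


def events_in(category: EventCategory) -> List[str]:
    """All event types of one category, in alphabetical order."""
    return sorted(e for e, c in EVENT_CATEGORIES.items() if c is category)
```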
In the embodiments of this specification, the gridding module 401 is configured to perform gridding processing on the target data by random division, generating one or more grids covering the target data.
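One possible reading of gridding by random division is a random partition of the records, in which every record lands in exactly one grid so that the grids together cover the target data. The sketch below follows that reading; treating a grid as a list of records, and the function name random_grids, are assumptions, since the text does not fix the representation of a grid.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def random_grids(records: Sequence[T], n_grids: int, rng: random.Random) -> List[List[T]]:
    """Randomly partition `records` into `n_grids` non-empty grids."""
    if not 1 <= n_grids <= len(records):
        raise ValueError("n_grids must be between 1 and the number of records")
    shuffled = list(records)
    rng.shuffle(shuffled)
    # Choose n_grids - 1 distinct cut points strictly inside the shuffled list.
    cuts = sorted(rng.sample(range(1, len(shuffled)), n_grids - 1))
    bounds = [0] + cuts + [len(shuffled)]
    return [shuffled[lo:hi] for lo, hi in zip(bounds, bounds[1:])]
```

Passing a seeded random.Random instance makes the division reproducible, which is convenient when derived indicators need to be traced back to the grids they came from.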
The embodiments of this specification provide a data derivation device. Gridding processing is performed on target data to generate one or more grids covering the target data. Single-dimension information and multi-dimension information can then be generated from the information included in each grid, and the base indicator corresponding to the target data can be determined from the single-dimension information, the multi-dimension information, and the time period and event type included in each grid. Finally, derivation processing can be performed on the base indicator by an indicator derivation algorithm to obtain the derived indicator corresponding to the target data. In this way, the target data is discretized and automatically divided into grids, and the resulting grids are listed. At the same time, single-dimension information and multi-dimension information are generated from the grids and used to construct the base indicator. Because the base indicator is determined not only by single-dimension information but also by multi-dimension information, it covers a broader range of content and provides a data basis for subsequent processing. In addition, the indicator derivation algorithm aggregates the elements of the base indicator to generate the derived indicator corresponding to the target data, which expands the indicators available for the target data to a certain extent and makes the features corresponding to those indicators finer-grained and more comprehensive. This improves the accuracy of subsequent processing; for example, the resulting risk prevention and control rules can be more accurate.
Embodiment five
The above is the data derivation device provided by the embodiments of this specification. Based on the same idea, the embodiments of this specification further provide data derivation equipment, as shown in Figure 5.
The data derivation equipment may be the server or the terminal device provided in the above embodiments.
The data derivation equipment may differ considerably depending on configuration or performance. It may include one or more processors 501 and a memory 502, and the memory 502 may store one or more application programs or data. The memory 502 may provide transient or persistent storage. An application program stored in the memory 502 may include one or more modules (not shown in the figure), and each module may include a series of computer-executable instructions for the data derivation equipment. Further, the processor 501 may be configured to communicate with the memory 502 and to execute, on the data derivation equipment, the series of computer-executable instructions in the memory 502. The data derivation equipment may also include one or more power supplies 503, one or more wired or wireless network interfaces 504, one or more input/output interfaces 505, and one or more keyboards 506.
Specifically, in this embodiment, the data derivation equipment includes a memory and one or more programs. The one or more programs are stored in the memory and may include one or more modules, and each module may include a series of computer-executable instructions for the data derivation equipment. The one or more programs are configured to be executed by one or more processors and include computer-executable instructions for:
Performing gridding processing on target data to generate one or more grids covering the target data;
Generating dimension information according to the information included in each grid, the dimension information including single-dimension information and multi-dimension information;
Determining the base indicator corresponding to the target data according to the dimension information and the time period and event type included in each grid;
Performing derivation processing on the base indicator by means of an indicator derivation algorithm to obtain the derived indicator corresponding to the target data.
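The second and third steps above can be illustrated by enumerating dimension combinations and crossing them with time periods and event types. This is a sketch under assumptions: the field names, the "count[...]" naming scheme, and the choice of a plain Cartesian product are hypothetical, and the text does not state how a base indicator is computed from these three ingredients.

```python
from itertools import combinations, product
from typing import List, Sequence, Tuple

Dimension = Tuple[str, ...]


def dimension_info(fields: Sequence[str]) -> Tuple[List[Dimension], List[Dimension]]:
    """Return (single-dimension, multi-dimension) information for the given fields."""
    single = [(f,) for f in fields]
    multi = [c for r in range(2, len(fields) + 1) for c in combinations(fields, r)]
    return single, multi


def base_indicators(dimensions: Sequence[Dimension],
                    periods: Sequence[str],
                    event_types: Sequence[str]) -> List[str]:
    """Name one candidate base indicator per (dimension, period, event type) triple."""
    return [
        f"count[{'+'.join(d)}|{p}|{e}]"
        for d, p, e in product(dimensions, periods, event_types)
    ]
```

For example, with the fields user, device, and ip there are three single dimensions and four multi-dimension combinations, so two time periods and two event types yield 28 candidate base indicators.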
In the embodiments of this specification, the indicator derivation algorithm includes a genetic algorithm, a random walk algorithm, and a brute-force derivation algorithm.
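The text names a brute-force derivation algorithm without describing it. One plausible reading, sketched below purely as an assumption, is an exhaustive search that combines every pair of base indicators with every available operator and keeps the highest-scoring candidate. The operator set OPS, the scoring callable, and the name brute_force_derive are hypothetical.

```python
import operator
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# Assumed set of element-wise combination operators.
OPS: Dict[str, Callable[[float, float], float]] = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}


def brute_force_derive(
    base: Dict[str, List[float]],
    score: Callable[[List[float]], float],
) -> Tuple[str, List[float], float]:
    """Try every (pair of base indicators, operator) and return the best
    candidate as (name, values, score). Requires at least two base indicators."""
    best: Tuple[str, List[float], float] = ("", [], float("-inf"))
    for (name_a, a), (name_b, b) in combinations(base.items(), 2):
        for op_name, op in OPS.items():
            values = [op(x, y) for x, y in zip(a, b)]
            s = score(values)
            if s > best[2]:
                best = (f"{op_name}({name_a},{name_b})", values, s)
    return best
```

The search cost grows with the square of the number of base indicators times the number of operators, which is one reason a heuristic search such as a genetic algorithm may be preferred when many base indicators exist.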
In the embodiments of this specification, performing derivation processing on the base indicator by means of the indicator derivation algorithm to obtain the derived indicator corresponding to the target data includes:
Judging whether the base indicator satisfies a first selection condition corresponding to the indicator derivation algorithm;
If the condition is not satisfied, performing derivation processing on the base indicator at least once, and after each derivation judging whether the indicator obtained by that derivation satisfies the first selection condition, until the indicator obtained by the derivation is judged to satisfy the first selection condition or the number of derivations performed reaches a predetermined count threshold; and determining the indicator obtained by the last derivation as the derived indicator corresponding to the target data.
In the embodiments of this specification, judging whether the base indicator satisfies the first selection condition corresponding to the indicator derivation algorithm includes:
Encoding the base indicator to obtain basic data;
Performing a fitness calculation on the basic data to obtain first data;
Judging whether the first data satisfies the first selection condition;
If the first data satisfies the first selection condition, the base indicator satisfies the first selection condition;
If the first data does not satisfy the first selection condition, the base indicator does not satisfy the first selection condition;
Performing derivation processing on the base indicator includes:
Performing crossover processing and/or mutation processing on the basic data corresponding to the base indicator.
In the embodiments of this specification, the fitness is obtained by aggregating a plurality of desired indicators.
In the embodiments of this specification, the plurality of desired indicators includes at least two of: the area under the receiver operating characteristic (ROC) curve (AUC), the information value (IV), and the embedding layer (Embedding).
In the embodiments of this specification, the event types include a payment category and an operation category. Payment events include transaction events, transfer events, and code-scanning events; operation events include login events, password-change events, registration events, and browsing events.
In the embodiments of this specification, performing gridding processing on the target data to generate one or more grids covering the target data includes:
Performing gridding processing on the target data by random division to generate one or more grids covering the target data.
The embodiments of this specification provide data derivation equipment. Gridding processing is performed on target data to generate one or more grids covering the target data. Single-dimension information and multi-dimension information can then be generated from the information included in each grid, and the base indicator corresponding to the target data can be determined from the single-dimension information, the multi-dimension information, and the time period and event type included in each grid. Finally, derivation processing can be performed on the base indicator by an indicator derivation algorithm to obtain the derived indicator corresponding to the target data. In this way, the target data is discretized and automatically divided into grids, and the resulting grids are listed. At the same time, single-dimension information and multi-dimension information are generated from the grids and used to construct the base indicator. Because the base indicator is determined not only by single-dimension information but also by multi-dimension information, it covers a broader range of content and provides a data basis for subsequent processing. In addition, the indicator derivation algorithm aggregates the elements of the base indicator to generate the derived indicator corresponding to the target data, which expands the indicators available for the target data to a certain extent and makes the features corresponding to those indicators finer-grained and more comprehensive. This improves the accuracy of subsequent processing; for example, the resulting risk prevention and control rules can be more accurate.
The foregoing describes specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired result. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In the 1990s, an improvement to a technology could be clearly classified as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or a software improvement (an improvement to a method flow). With the development of technology, however, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compilers used in program development. The source code to be compiled must also be written in a specific programming language, referred to as a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can be readily obtained simply by describing the method flow in one of the above hardware description languages and programming it into an integrated circuit.
A controller can be implemented in any suitable manner. For example, a controller can take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller implements the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the devices included in it for implementing various functions can also be regarded as structures within the hardware component. Alternatively, the devices for implementing various functions can be regarded both as software modules implementing a method and as structures within the hardware component.
The systems, devices, modules, or units described in the above embodiments can be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer can be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For ease of description, the above device is described in terms of various units divided by function. Of course, when one or more embodiments of this specification are implemented, the functions of the units can be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of this specification can be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of this specification can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of this specification can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The embodiments of this specification are described with reference to flowcharts and/or block diagrams of the method, the equipment (system), and the computer program product according to the embodiments of this specification. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more processes of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more processes of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more processes of a flowchart and/or one or more blocks of a block diagram.
In a typical configuration, a computing device includes one or more processors (CPU), an input/output interface, a network interface, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, commodity, or equipment that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or equipment. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or equipment that includes the element.
Those skilled in the art will understand that the embodiments of this specification can be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of this specification can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of this specification can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.
One or more embodiments of this specification can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. One or more embodiments of this specification can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant details, see the description of the method embodiment.
The foregoing is merely embodiments of this specification and is not intended to limit this specification. For those skilled in the art, this specification may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of this specification shall fall within the scope of the claims of this specification.

Claims (17)

1. A data derivation method, the method comprising:
Performing gridding processing on target data to generate one or more grids covering the target data;
Generating dimension information according to the information included in each grid, the dimension information including single-dimension information and multi-dimension information;
Determining the base indicator corresponding to the target data according to the dimension information and the time period and event type included in each grid;
Performing derivation processing on the base indicator by means of an indicator derivation algorithm to obtain the derived indicator corresponding to the target data.
2. The method according to claim 1, wherein the indicator derivation algorithm includes a genetic algorithm, a random walk algorithm, and a brute-force derivation algorithm.
3. The method according to claim 2, wherein performing derivation processing on the base indicator by means of the indicator derivation algorithm to obtain the derived indicator corresponding to the target data comprises:
Judging whether the base indicator satisfies a first selection condition corresponding to the indicator derivation algorithm;
If the condition is not satisfied, performing derivation processing on the base indicator at least once, and after each derivation judging whether the indicator obtained by that derivation satisfies the first selection condition, until the indicator obtained by the derivation is judged to satisfy the first selection condition or the number of derivations performed reaches a predetermined count threshold; and determining the indicator obtained by the last derivation as the derived indicator corresponding to the target data.
4. The method according to claim 3, wherein judging whether the base indicator satisfies the first selection condition corresponding to the indicator derivation algorithm comprises:
Encoding the base indicator to obtain basic data;
Performing a fitness calculation on the basic data to obtain first data;
Judging whether the first data satisfies the first selection condition;
If the first data satisfies the first selection condition, the base indicator satisfies the first selection condition;
If the first data does not satisfy the first selection condition, the base indicator does not satisfy the first selection condition;
And wherein performing derivation processing on the base indicator comprises:
Performing crossover processing and/or mutation processing on the basic data corresponding to the base indicator.
5. The method according to claim 4, wherein the fitness is obtained by aggregating a plurality of desired indicators.
6. The method according to claim 5, wherein the plurality of desired indicators includes at least two of: the area under the receiver operating characteristic (ROC) curve (AUC), the information value (IV), and the embedding layer (Embedding).
7. The method according to claim 1, wherein the event types include a payment category and an operation category, payment events include transaction events, transfer events, and code-scanning events, and operation events include login events, password-change events, registration events, and browsing events.
8. The method according to claim 1, wherein performing gridding processing on the target data to generate one or more grids covering the target data comprises:
Performing gridding processing on the target data by random division to generate one or more grids covering the target data.
9. A data derivation device, the device comprising:
A gridding module, configured to perform gridding processing on target data to generate one or more grids covering the target data;
A dimension generation module, configured to generate dimension information according to the information included in each grid, the dimension information including single-dimension information and multi-dimension information;
A base indicator determination module, configured to determine the base indicator corresponding to the target data according to the dimension information and the time period and event type included in each grid;
An indicator derivation module, configured to perform derivation processing on the base indicator by means of an indicator derivation algorithm to obtain the derived indicator corresponding to the target data.
10. The device according to claim 9, wherein the indicator derivation algorithm includes a genetic algorithm, a random walk algorithm, and a brute-force derivation algorithm.
11. The device according to claim 10, wherein the indicator derivation module comprises:
A judging unit, configured to judge whether the base indicator satisfies a first selection condition corresponding to the indicator derivation algorithm;
An indicator derivation unit, configured to: if the condition is not satisfied, perform derivation processing on the base indicator at least once, and after each derivation judge whether the indicator obtained by that derivation satisfies the first selection condition, until the indicator obtained by the derivation is judged to satisfy the first selection condition or the number of derivations performed reaches a predetermined count threshold; and determine the indicator obtained by the last derivation as the derived indicator corresponding to the target data.
12. The device according to claim 11, wherein the judging unit is configured to:
Encode the base indicator to obtain basic data;
Perform a fitness calculation on the basic data to obtain first data;
Judge whether the first data satisfies the first selection condition;
If the first data satisfies the first selection condition, the base indicator satisfies the first selection condition;
If the first data does not satisfy the first selection condition, the base indicator does not satisfy the first selection condition;
And the indicator derivation unit is configured to perform crossover processing and/or mutation processing on the basic data corresponding to the base indicator.
13. The device according to claim 12, wherein the fitness is obtained by aggregating a plurality of desired indicators.
14. The device according to claim 13, wherein the plurality of desired indicators includes at least two of: the area under the receiver operating characteristic (ROC) curve (AUC), the information value (IV), and the embedding layer (Embedding).
15. The device according to claim 9, wherein the event types include a payment category and an operation category, payment events include transaction events, transfer events, and code-scanning events, and operation events include login events, password-change events, registration events, and browsing events.
16. The device according to claim 9, wherein the gridding module is configured to perform gridding processing on the target data by random division to generate one or more grids covering the target data.
17. Data derivation equipment, the data derivation equipment comprising:
A processor; and
A memory arranged to store computer-executable instructions which, when executed, cause the processor to:
Perform gridding processing on target data to generate one or more grids covering the target data;
Generate dimension information according to the information included in each grid, the dimension information including single-dimension information and multi-dimension information;
Determine the base indicator corresponding to the target data according to the dimension information and the time period and event type included in each grid;
Perform derivation processing on the base indicator by means of an indicator derivation algorithm to obtain the derived indicator corresponding to the target data.
CN201810630668.2A 2018-06-19 2018-06-19 Data derivation method, device and equipment Active CN108921693B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810630668.2A CN108921693B (en) 2018-06-19 2018-06-19 Data derivation method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810630668.2A CN108921693B (en) 2018-06-19 2018-06-19 Data derivation method, device and equipment

Publications (2)

Publication Number Publication Date
CN108921693A true CN108921693A (en) 2018-11-30
CN108921693B CN108921693B (en) 2022-04-29

Family

ID=64419394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810630668.2A Active CN108921693B (en) 2018-06-19 2018-06-19 Data derivation method, device and equipment

Country Status (1)

Country Link
CN (1) CN108921693B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111967600A (en) * 2020-08-18 2020-11-20 北京睿知图远科技有限公司 Feature derivation system and method based on genetic algorithm in wind control scene
CN112100307A (en) * 2020-09-25 2020-12-18 北京奇艺世纪科技有限公司 Data processing method, path searching processing method and device and electronic equipment
CN112685600A (en) * 2021-03-19 2021-04-20 北京达佳互联信息技术有限公司 Data processing method and device, electronic equipment and storage medium
CN113990068A (en) * 2021-10-27 2022-01-28 阿波罗智联(北京)科技有限公司 Traffic data processing method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150074025A1 (en) * 2013-09-11 2015-03-12 National Tsing Hua University Multi-objective semiconductor product capacity planning system and method thereof
CN105956744A (en) * 2016-04-20 2016-09-21 淮南矿业(集团)有限责任公司 Multi-mine raw coal blending analysis system and coal blending analysis method thereof
CN106446345A (en) * 2016-08-30 2017-02-22 国网江苏省电力公司 Distribution network operation indicator processing method based on segmented geographic region
CN106529754A (en) * 2016-06-27 2017-03-22 江苏智通交通科技有限公司 Taxi operation condition assessment method based on big data analysis
CN107705199A (en) * 2017-08-07 2018-02-16 阿里巴巴集团控股有限公司 Method and device for generating feature calculation code
CN107862602A (en) * 2017-11-23 2018-03-30 安趣盈(上海)投资咨询有限公司 Credit decision-making method and system based on multi-dimensional indicator calculation, self-learning and group-segmentation model application

Also Published As

Publication number Publication date
CN108921693B (en) 2022-04-29

Similar Documents

Publication Publication Date Title
CN110348462B (en) Image feature determination and visual question and answer method, device, equipment and medium
CN108921693A (en) Data derivation method, device and equipment
Zhou et al. Bankruptcy prediction using SVM models with a new approach to combine features selection and parameter optimisation
WO2019157946A1 (en) Anti-money laundering method, apparatus, and device
CN108460523A (en) Risk control rule generation method and device
KR101508361B1 (en) Method for prediction of future stock price using analysis of aggregate market value of listed stock
CN109086961A (en) Information risk monitoring method and device
Fisichella et al. Can deep learning improve technical analysis of forex data to predict future price movements?
EP4200761A1 (en) Systems and methods for next basket recommendation with dynamic attributes modeling
CN110110012A (en) User expectancy evaluation method, device, electronic equipment and readable medium
CN109615504A (en) Products Show method, apparatus, electronic equipment and computer readable storage medium
CN108846097A (en) User interest tag representation method, article recommendation method, device and equipment
CN112035549B (en) Data mining method, device, computer equipment and storage medium
CN111191814A (en) Electricity price prediction method, system and computer readable storage medium
CN109345285A (en) Activity delivery method, device and equipment
CN113837635A (en) Risk detection processing method, device and equipment
Bernardo et al. A genetic type-2 fuzzy logic based system for financial applications modelling and prediction
CN115545103A (en) Abnormal data identification method, label identification method and abnormal data identification device
Hsu et al. Dynamically optimizing parameters in support vector regression: An application of electricity load forecasting
CN109767333A (en) Select based method, device, electronic equipment and computer readable storage medium
Roumboutsos et al. Public private partnerships in transport: Case study structure
CN112801784A (en) Bitcoin address mining method and device for digital currency exchange
CN108665142A (en) Rule recommendation method, apparatus and equipment
CN110825929B (en) Service permission recommendation method and device
CN116485391A (en) Payment recommendation processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: P.O. Box 847, fourth floor, Capital Building, Grand Cayman, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

GR01 Patent grant