Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions are described clearly and completely below in conjunction with specific embodiments of the application and the corresponding drawings. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments in the application, without creative effort, fall within the scope of protection of the application.
The algorithm evaluation method according to the present invention is described in detail below with reference to Fig. 1.
As shown in Fig. 1, in step S110, a training data set corresponding to an algorithm is extracted from a source data set. It should be noted that the source data set refers to data sets stored under the same path in a predetermined storage format. The source data set may be stored in a single memory, or may be stored across multiple memories in a distributed manner as required. In addition, the source data set includes data sets obtained from multiple types of databases. After a data set is obtained from these databases, it is stored in the predetermined storage format. In this format, the name of the data warehouse in which the data set and the source data set reside (that is, the name of the data warehouse created as required by the algorithm) is stored in correspondence with the data table name (which corresponds to the data set), so that the data table can be obtained directly through a URL of the data set composed of the data warehouse name and the data table name, and the data set can be queried quickly by using that URL. The multiple types of databases may include a MySQL database, an HBase database and an ODPS database. MySQL is an open-source relational database; HBase is a distributed storage system for unstructured data; ODPS (Open Data Processing Service) is a data storage and analysis platform built on the cloud computing platform for which Alibaba Group holds fully independent intellectual property rights, and is suitable for offline processing of massive data (TB/PB level) with low real-time requirements. Optionally, data sets may be extracted from the MySQL, HBase and ODPS databases by means of SQL (Structured Query Language), and the extracted data sets constitute the source data set.
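The storage format described above can be sketched in Python as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for each data warehouse, the `warehouse/table` locator format is hypothetical (the application does not fix a URL syntax), and the names `WAREHOUSES`, `dataset_url`, `store_dataset` and `query_dataset` are invented for the example.

```python
import sqlite3

# Assumption: each data warehouse is modelled as one in-memory SQLite database.
WAREHOUSES: dict[str, sqlite3.Connection] = {}


def dataset_url(warehouse: str, table: str) -> str:
    """Compose a locator from the warehouse name and the table name (hypothetical format)."""
    return f"{warehouse}/{table}"


def store_dataset(warehouse: str, table: str, columns: list[str], rows: list[tuple]) -> str:
    """Store rows under (warehouse, table) and return the locator of the data set."""
    conn = WAREHOUSES.get(warehouse)
    if conn is None:
        conn = WAREHOUSES[warehouse] = sqlite3.connect(":memory:")
    col_defs = ", ".join(f'"{c}"' for c in columns)
    marks = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)
    conn.commit()
    return dataset_url(warehouse, table)


def query_dataset(url: str) -> list[tuple]:
    """Resolve a locator back to its warehouse and table, then read the table."""
    warehouse, table = url.split("/", 1)
    conn = WAREHOUSES[warehouse]
    return conn.execute(f'SELECT * FROM "{table}" ORDER BY rowid').fetchall()
```

Because the locator carries both names, a caller that knows only the locator can reach the table without any further lookup, which is the property the storage format is meant to provide.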
The source data set is stored in the form of data tables in which each column corresponds to a feature; this format makes it easy to extract the required feature data by column. A data table is a unit of data storage. Logically it is a two-dimensional structure made up of rows and columns: each row represents a record, and each column represents an attribute, that is, a field with a single data type and name. A record may contain one or more columns, and the names and types of the columns constitute the schema of the table. A data warehouse may contain multiple tables. Specifically, the data of the source data set may be organized into various types of data tables according to the feature of each column, so that when an algorithm requires data for particular features, only the data corresponding to those features need be extracted.
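Column-wise extraction of features can be illustrated with a short Python sketch. The schema, the column names and the function `extract_features` are assumptions made for the example and do not come from the application.

```python
# Hypothetical table schema: column name -> data type.
SCHEMA = {"merchant_id": str, "store_id": str, "trade_time": str, "amount": float}


def extract_features(rows, schema, features):
    """Return only the requested feature columns of each record."""
    missing = [name for name in features if name not in schema]
    if missing:
        raise KeyError(f"unknown feature columns: {missing}")
    return [{name: row[name] for name in features} for row in rows]


# Each row is one record; each key is one column (feature).
rows = [
    {"merchant_id": "m1", "store_id": "s1", "trade_time": "2017-01-01", "amount": 12.0},
    {"merchant_id": "m2", "store_id": "s9", "trade_time": "2017-01-01", "amount": 30.5},
]
training_rows = extract_features(rows, SCHEMA, ["merchant_id", "amount"])
```

Checking the requested names against the schema first means a misspelled feature fails immediately instead of producing a partly empty training set.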
In an optional embodiment, after a data set is extracted from the source data set, data cleaning is performed on the data set. Data cleaning is a process of reducing errors and inconsistencies in data; its main task is to detect faulty records in the transaction data and to delete or correct them. For example, a data set may be extracted from a database by writing SQL, data cleaning is then performed on the data set, and the cleaned data set is saved.
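A minimal cleaning pass might look like the Python sketch below. The specific rules (drop records with missing fields, drop duplicates by `trade_id`, normalise `amount` to a number) and the field names are assumptions chosen for illustration; the application does not prescribe particular cleaning rules.

```python
def clean(rows):
    """Delete incomplete or duplicate records and correct the type of the amount field."""
    seen = set()
    cleaned = []
    for row in rows:
        trade_id = row.get("trade_id")
        if trade_id is None or row.get("amount") is None:
            continue  # delete: record is incomplete
        try:
            amount = float(row["amount"])  # correct: amounts may arrive as text
        except (TypeError, ValueError):
            continue  # delete: amount cannot be interpreted as a number
        if trade_id in seen:
            continue  # delete: duplicate of an earlier record
        seen.add(trade_id)
        cleaned.append({**row, "amount": amount})
    return cleaned
```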
In an optional embodiment, the algorithm corresponding to an application scenario may be determined before step S110 is performed. Application scenarios include abnormal-data monitoring, transaction prediction, data mining and the like. They may be preset as required and associated with the corresponding algorithms, for example by storing each application scenario together with its algorithm in a relation table. Once the application scenario is determined, the corresponding algorithm can then be started. Further scenarios and their algorithms may be added as required; for example, a data analysis scenario and a corresponding analysis algorithm may be added. Because different algorithms use different training sets, once the algorithm is determined, the training data set corresponding to it is determined according to the algorithm. For example, for an algorithm that performs anomaly monitoring or transaction prediction on a trading platform, the training data set is transaction data.
Then, in step S120, the training data set is processed by the algorithm to generate a result data set. Next, in step S130, the result data set is labeled according to feedback information to generate a labeled data set. The feedback information is information fed back on the computation result. For example, in the case of an abnormal-data monitoring algorithm, the feedback information includes information fed back by a merchant (for example, that a transaction is abnormal) or information obtained by tracking the data following an anomaly prompt. The result data set may be labeled in many ways. For instance, if the feedback information shows that the algorithm result is correct, the result data is given a "+" label, and if it shows that the algorithm result is incorrect, the result data is given a "-" label, which yields the labeled data set. Alternatively, transaction data for which the algorithm result is correct may be labeled "true" and transaction data for which it is wrong may be labeled "false", as required. It should be noted that the labels serve only to distinguish the various cases within the result data set.
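The labeling of step S130 can be sketched as follows. The record layout, the representation of feedback as a mapping from `trade_id` to a boolean, and the decision to leave out results that have received no feedback are assumptions of this example.

```python
def label_results(results, feedback):
    """Attach "+" or "-" to each result according to the feedback on it.

    results:  list of dicts, each with a "trade_id" key
    feedback: dict mapping trade_id -> True if the algorithm result was
              confirmed correct, False if it was found to be incorrect
    """
    labeled = []
    for result in results:
        verdict = feedback.get(result["trade_id"])
        if verdict is None:
            continue  # assumption: results without feedback stay unlabeled
        labeled.append({**result, "label": "+" if verdict else "-"})
    return labeled
```

Swapping the two label strings for "true" and "false" would give the alternative labeling mentioned above without changing the logic.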
In step S140, the labeled data set is stored as a validation data set in the same storage manner as the source data set. The validation data set is a data set used to verify the algorithm. Specifically, the labeled data set may be converted into a data table with the same format as the data tables of the source data set, and the converted data table is stored as the validation data set in the data warehouse in which the source data set resides. The name of the data table corresponding to the validation data set may differ from the name of the data table corresponding to the training data set, so that the method according to the present invention can call different data sets by data table name. Because the validation data set is a data set that has been verified, the algorithm can be optimized by iteratively training it on the validation data set.
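Step S140 can be illustrated with SQLite standing in for the data warehouse. The sketch copies the column layout of the training table, appends a `label` column, and stores the labeled rows under a different table name in the same database. The use of SQLite, the extra `label` column and the function name `store_validation_set` are assumptions of the example.

```python
import sqlite3


def store_validation_set(conn, train_table, valid_table, labeled_rows):
    """Store labeled rows as a new table laid out like the training table."""
    # Read the column names of the training table so the new table matches its format.
    cols = [info[1] for info in conn.execute(f'PRAGMA table_info("{train_table}")')]
    all_cols = cols + ["label"]  # assumption: the label is kept in one extra column
    col_defs = ", ".join(f'"{c}"' for c in all_cols)
    marks = ", ".join("?" for _ in all_cols)
    conn.execute(f'CREATE TABLE "{valid_table}" ({col_defs})')
    conn.executemany(
        f'INSERT INTO "{valid_table}" VALUES ({marks})',
        [tuple(row[c] for c in all_cols) for row in labeled_rows],
    )
    conn.commit()
    return all_cols
```

Keeping both tables in one database means the training and validation data sets are distinguished only by table name, as the description requires.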
In an optional embodiment, the data in the validation data set may be called to evaluate the performance of the algorithm. Specifically, using the name of the data warehouse in which the validation data set resides and the data table of the validation data set, the relevant data in the validation data set can be called with program instructions (for example, SQL statements executable by ODPS) to obtain an evaluation result. For example, when the algorithm is evaluated by accuracy, the data labeled "+" in the validation data set may be compared with all of the transaction data to obtain the accuracy. It should be noted that different algorithms have different evaluation indexes; algorithms may be associated with evaluation indexes in advance, so that when an algorithm is selected, the evaluation index corresponding to it can be called.
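The accuracy evaluation can be sketched with a single SQL query. SQLite is used here in place of ODPS, and the table layout (a `label` column holding "+" or "-") and the function name `accuracy` are assumptions of the example.

```python
import sqlite3


def accuracy(conn, valid_table):
    """Share of rows labeled "+" among all rows of the validation table."""
    correct, total = conn.execute(
        f'SELECT SUM(label = ?), COUNT(*) FROM "{valid_table}"', ("+",)
    ).fetchone()
    if not total:
        return None  # no validation data yet
    return correct / total
```

In symbols, with $N_{+}$ the number of records labeled "+" and $N$ the total number of records, the sketch computes $N_{+} / N$.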
With the data processing method of the present invention, the result data set is labeled using feedback information, and the labeled data set is used as a validation data set to iteratively train the algorithm, so that the algorithm is optimized and its performance improved. Further, the data processing method of the present invention can evaluate the algorithm using the validation data set, so that the performance of the algorithm is assessed intuitively and quantitatively. In addition, the data processing method of the present invention can evaluate different algorithms under different scenarios and is therefore highly compatible.
Fig. 2 shows a flowchart of a data processing method according to another exemplary embodiment of the present invention. As shown in Fig. 2, the application scenario is store monitoring. According to this application scenario, the algorithm is determined to be an abnormal-data monitoring algorithm.
Then the corresponding training data set is extracted according to the algorithm and processed by the algorithm to generate a result data set, which includes the abnormal data found by the algorithm. The result data set is then labeled using feedback information. Specifically, the result data may be processed using merchant feedback: when the merchant feedback indicates that an abnormal situation exists, the corresponding data set is labeled "+". For example, the transaction corresponding to the time reported by the merchant is determined, and the transaction data related to that transaction is labeled "-". Alternatively, after the result data set is obtained, the transactions related to the result data set are tracked and, according to the tracking result, transaction data corresponding to normal transactions is labeled "+" and transaction data corresponding to abnormal transactions is labeled "-". The labeled data set is then converted into a validation data set in the manner of the source data set. For example, where the source data set is stored as data tables in a data warehouse created in ODPS according to the algorithm, the labeled data set is stored, following the ODPS storage manner, as a data table in that same data warehouse. The training data set can thus be processed according to the warehouse name and the name of the data table corresponding to the labeled data set; for example, the training data set can be called, queried or changed through SQL instructions.
Next, as shown in Fig. 2, the data in the validation data set is called with SQL instructions to evaluate the algorithm. It should be understood that the evaluation indexes are preset by developers or users according to the algorithm, and the indexes may differ from one algorithm to another. For example, when the algorithm is evaluated by false alarm rate, the data labeled "-" in the validation data set represents false alarms, and the false alarm rate is obtained by comparing the number of transactions labeled "-" with the total number of transactions. There is also the case in which a transaction is abnormal but was not detected; such data is obtained through merchant feedback and may be labeled "else" (for the purpose of example only). When the algorithm is evaluated by miss rate, the miss rate is obtained by comparing the number of transactions labeled "else" with the total number of transactions.
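Both ratios described above can be computed from the label column alone. The following Python sketch assumes the labels "+", "-" and "else" as used in this example; the function name `label_rates` and the returned key names are hypothetical.

```python
from collections import Counter


def label_rates(labels):
    """Compute false alarm rate and miss rate from a list of labels.

    "-"    marks a false alarm (the algorithm reported an anomaly wrongly)
    "else" marks a missed anomaly (reported by the merchant, not detected)
    """
    total = len(labels)
    if total == 0:
        return None  # nothing to evaluate
    counts = Counter(labels)
    return {
        "false_alarm_rate": counts["-"] / total,
        "miss_rate": counts["else"] / total,
    }
```

For example, with eight labeled transactions of which two are "-" and one is "else", the sketch gives a false alarm rate of 2/8 and a miss rate of 1/8.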
In addition, the same application scenario may correspond to different algorithms, depending on the association table of application scenarios and algorithms. Where the same application scenario corresponds to different algorithms, the required algorithm must also be determined once the application scenario has been selected.
To make the inventive concept of the present invention clearer, a framework diagram of a monitoring system that uses the data processing method according to the present invention on a monitoring platform is described below with reference to Fig. 3.
As shown in Fig. 3, rule monitoring and intelligent monitoring can be performed on the trading activity on the monitoring platform. Rule monitoring means monitoring trading activity using a combination of monitoring rules, for example: single-day merchant transaction amount > 0 and transaction amount <= transaction baseline * 2. The business data (also called "transaction data") includes merchant transaction data, store transaction data and the like.
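A combination of monitoring rules can be sketched as a list of predicates that must all hold. The record layout (`amount`, `baseline`) and the function names are assumptions; how a failed rule is handled, for example by raising an alert, is not specified in the description and is left out here.

```python
def amount_is_positive(day):
    """Rule 1: the single-day merchant transaction amount is greater than 0."""
    return day["amount"] > 0


def amount_within_baseline(day, factor=2.0):
    """Rule 2: the transaction amount does not exceed the baseline times the factor."""
    return day["amount"] <= day["baseline"] * factor


RULES = [amount_is_positive, amount_within_baseline]


def passes_rules(day, rules=RULES):
    """Return True only if every monitoring rule in the combination holds."""
    return all(rule(day) for rule in rules)
```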
In addition, intelligent monitoring of the transaction data may be selected. As shown in Fig. 3, where the transaction data has been labeled, the system used for intelligent monitoring may include an application module, a management module and an optimization module. The application module may provide an algorithm suited to the input scenario, judge the output result of the algorithm, and provide alarm responses and the like. The management module may manage the application scenarios, the algorithms and the data involved in their training models. The optimization module may train, iterate and evaluate the selected algorithm.
Fig. 4 shows a block diagram of a data processing apparatus according to an exemplary embodiment of the present invention. The data processing apparatus includes an extraction module 410, a generation module 420, a mark module 430 and a memory module 440. Those skilled in the art will appreciate that Fig. 4 shows only the components related to this exemplary embodiment, and that the data processing apparatus may also include general-purpose components other than those shown in Fig. 4.
The extraction module 410 extracts a training data set corresponding to an algorithm from a source data set, where the source data is obtained by performing data cleaning on data sets extracted from multiple types of databases. After a data set is obtained from these databases, it is stored in a predetermined storage format. In this format, the name of the data warehouse in which the data set and the source data set reside (that is, the name of the data warehouse created as required by the algorithm) is stored in correspondence with the data table name (which corresponds to the data set), so that the data table can be obtained directly through a URL of the data set composed of the data warehouse name and the data table name, and the data set can be queried quickly by using that URL.
Optionally, before the extraction module 410 performs the extraction operation, the data processing apparatus may use a determining module (not shown) to determine the algorithm corresponding to the selected application scenario, so that the extraction module 410 can extract the training data set corresponding to that algorithm from the source data set. In addition, the data processing apparatus also includes a memory module (not shown), which may store, in advance and in association with each other, the application scenarios and the algorithms corresponding to them.
The generation module 420 processes the training data set by the algorithm to generate a result data set. The mark module 430 then labels the result data set according to feedback information to generate a labeled data set. The feedback information is information fed back on the computation result. For example, in the case of an abnormal-data monitoring algorithm, the feedback information includes information fed back by a merchant (for example, that a transaction is abnormal) or information obtained by tracking the data following an anomaly prompt. The result data set may be labeled in many ways. For instance, if the feedback information shows that the algorithm result is correct, the result data is given a "+" label, and if it shows that the algorithm result is incorrect, the result data is given a "-" label, which yields the labeled data set. Alternatively, transaction data for which the algorithm result is correct may be labeled "true" and transaction data for which it is wrong may be labeled "false", as required. It should be noted that the labels serve only to distinguish the various cases within the result data set.
The memory module 440 stores the labeled data set as a validation data set in the same storage manner as the source data set. Specifically, the memory module 440 converts the labeled data set into a data table with the same format as the data tables of the source data set, and stores the converted data table as the validation data set in the data warehouse in which the source data set resides. The name of the data table corresponding to the validation data set may differ from the name of the data table corresponding to the training data set, so that different data sets can be called by data table name. Because the validation data set is a data set that has been verified, the algorithm can be optimized by iteratively training it on the validation data set.
Optionally, the data processing apparatus also includes an evaluation module (not shown), which can call the data in the validation data set to evaluate the performance of the algorithm. Specifically, using the name of the data warehouse in which the validation data set resides and the data table of the validation data set, the relevant data in the validation data set can be called with program instructions (for example, SQL statements executable by ODPS) to obtain an evaluation result. For example, when the algorithm is evaluated by accuracy, the data labeled "+" in the validation data set may be compared with all of the transaction data to obtain the accuracy. It should be noted that different algorithms have different evaluation indexes; algorithms may be associated with evaluation indexes in advance, so that when an algorithm is selected, the evaluation index corresponding to it can be called.
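The way the modules of Fig. 4 cooperate can be sketched as a small Python class in which each module is a callable supplied from outside. The class name, the field names and the choice to model modules as plain functions are assumptions of this example and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Rows = list[dict]


@dataclass
class DataProcessingApparatus:
    extract: Callable[[], Rows]        # role of extraction module 410
    generate: Callable[[Rows], Rows]   # role of generation module 420
    mark: Callable[[Rows], Rows]       # role of mark module 430
    store: Callable[[Rows], None]      # role of memory module 440
    evaluate: Optional[Callable[[Rows], float]] = None  # optional evaluation module

    def run(self):
        """Run extraction, generation, labeling and storage; evaluate if configured."""
        training = self.extract()
        results = self.generate(training)
        labeled = self.mark(results)
        self.store(labeled)
        return self.evaluate(labeled) if self.evaluate else None
```

A usage sketch with trivial stand-in modules:

```python
stored = []
apparatus = DataProcessingApparatus(
    extract=lambda: [{"trade_id": "t1"}, {"trade_id": "t2"}],
    generate=lambda rows: [{**r, "flagged": True} for r in rows],
    mark=lambda rows: [{**r, "label": "+"} for r in rows],
    store=stored.extend,
    evaluate=lambda rows: sum(r["label"] == "+" for r in rows) / len(rows),
)
score = apparatus.run()
```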
With the data processing apparatus of the present invention, the result data set is labeled using feedback information, and the labeled data set is used as a validation data set to iteratively train the algorithm, so that the algorithm is optimized and its performance improved. Further, the data processing apparatus of the present invention can evaluate the algorithm using the validation data set, so that the performance of the algorithm is assessed intuitively and quantitatively. In addition, the data processing apparatus of the present invention can evaluate different algorithms under different scenarios and is therefore highly compatible.
The systems, devices, modules or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementing device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described in terms of separate units divided by function. Of course, when the present application is implemented, the functions of the units may be realized in one or more pieces of software and/or hardware.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface and memory.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise" and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, commodity or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, commodity or device that includes the element.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for the relevant parts, refer to the corresponding description of the method embodiment.
The foregoing describes only embodiments of the present application and is not intended to limit the application. Those skilled in the art may make various modifications and variations to the application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the application shall fall within the scope of the claims of the application.