CN107016028A - Data processing method and its equipment - Google Patents

Data processing method and its equipment

Info

Publication number
CN107016028A
Authority
CN
China
Prior art keywords
data
algorithm
collection
source
data collection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611140090.XA
Other languages
Chinese (zh)
Other versions
CN107016028B (en)
Inventor
吴娅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201611140090.XA
Publication of CN107016028A
Application granted
Publication of CN107016028B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data processing method and a device therefor are disclosed. The method includes: extracting a training data set corresponding to an algorithm from a source data set; processing the training data set with the algorithm to generate a result data set; labeling the result data set according to feedback information to generate a labeled data set; and storing the labeled data set as a validation data set in the same storage manner as the source data set. By labeling the result data set with feedback information and using the labeled data set as a validation data set to iteratively train the algorithm, the method optimizes the algorithm and improves its performance.

Description

Data processing method and its equipment
Technical field
The present application relates to the technical field of computer software, and in particular to a data processing method and a device therefor.
Background
With the rapid development of the Internet, various forms of online transactions continue to emerge. For purposes such as security and prediction, transaction data can be processed using historical transaction data and a predetermined training model, and the corresponding algorithm can be verified based on the results. For example, an anomaly alarm algorithm can monitor transaction data in real time and generate alarm data from the monitoring results. The corresponding alarm data is then labeled ("marked" for short) according to merchant feedback or data tracking: if the alarm is correct it is marked "+", and if the alarm is wrong it is marked "-". In this way the transaction data is marked.
As can be seen, existing methods neither make use of nor manage the data after it has been marked.
Summary of the invention
The main object of the present invention is to provide a data processing method and a device therefor, intended to solve the problem described above.
An embodiment of the present application provides a data processing method. The method includes: extracting a training data set corresponding to an algorithm from a source data set; processing the training data set with the algorithm to generate a result data set; labeling the result data set according to feedback information to generate a labeled data set; and storing the labeled data set as a validation data set in the same storage manner as the source data set.
Another embodiment of the present application provides a data processing device. The device includes: an extraction module that extracts a training data set corresponding to an algorithm from a source data set; a first generation module that processes the training data set with the algorithm to generate a result data set; a second generation module that labels the result data set according to feedback information to generate a labeled data set; and a storage module that stores the labeled data set as a validation data set in the same storage manner as the source data set.
At least one of the above technical solutions adopted in the embodiments of the present application labels the result data set using feedback information and uses the labeled data set as a validation data set to iteratively train the algorithm, thereby optimizing the algorithm and improving its performance.
Brief description of the drawings
The accompanying drawings described herein are provided for a further understanding of the present application and constitute a part of it. The exemplary embodiments of the application and their descriptions serve to explain the application and do not unduly limit it. In the drawings:
Fig. 1 shows a flowchart of a data processing method according to an exemplary embodiment of the present invention;
Fig. 2 shows a flowchart of a data processing method according to another exemplary embodiment of the present invention;
Fig. 3 shows a framework diagram of a monitoring system that uses the data processing method according to the present invention on a monitoring platform;
Fig. 4 shows a block diagram of a data processing device according to an exemplary embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present application clearer, the technical solutions of the application are described clearly and completely below with reference to specific embodiments of the application and the corresponding drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application, without creative effort, fall within the scope of protection of the present application.
The algorithm evaluation method according to the present invention is described in detail below with reference to Fig. 1.
As shown in Fig. 1, in step S110, a training data set corresponding to an algorithm is extracted from a source data set. It should be noted that the source data set refers to data sets stored under the same path in a predetermined storage format. The source data set may be stored in a single memory or, as required, in multiple memories in a distributed manner. In addition, the source data set includes data sets obtained from multiple types of databases. After a data set is obtained from these databases, it is stored in the predetermined storage format: the data set is stored in association with the name of the warehouse in which the source data set resides (that is, the name of the data warehouse created as required by the algorithm) and a data table name (corresponding to the data set). A URL for the data set can then be composed from the data warehouse name and the data table name, so that the data table can be obtained directly and the data set can be queried quickly through the URL. The multiple types of databases mentioned above may include MySQL, HBase, and ODPS databases: MySQL is an open-source relational database; HBase is a distributed storage system for unstructured data; and ODPS (Open Data Processing Service) is a data storage and analysis platform built on a cloud computing platform for which Alibaba Group holds fully independent intellectual property rights, suitable for offline processing of massive data (TB/PB scale) with low real-time requirements. Optionally, data sets can be extracted from the MySQL, HBase, and ODPS databases using SQL (Structured Query Language), and the extracted data sets make up the source data set.
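The storage convention in step S110 can be illustrated with a minimal Python sketch. The text only states that a URL is composed from the data warehouse name and the data table name; the `warehouse://` scheme, the function names, and the in-memory dictionary standing in for the warehouses are all assumptions made for illustration.

```python
# Hypothetical sketch: the URL scheme and every name below are assumptions,
# not the format actually used by the described system.
from typing import Dict, List

Row = Dict[str, object]

# In-memory stand-in for the data warehouses: URL -> rows of one data table.
_tables: Dict[str, List[Row]] = {}


def dataset_url(warehouse: str, table: str) -> str:
    """Compose a data-set URL from the warehouse name and the table name."""
    if not warehouse or not table:
        raise ValueError("warehouse and table names must be non-empty")
    return f"warehouse://{warehouse}/{table}"


def store_dataset(warehouse: str, table: str, rows: List[Row]) -> str:
    """Store rows under the composed URL and return that URL."""
    url = dataset_url(warehouse, table)
    _tables[url] = list(rows)
    return url


def load_dataset(url: str) -> List[Row]:
    """Look a data set up directly by its URL."""
    return list(_tables[url])
```

Because the URL is derived only from the two names, any component that knows the warehouse and table names can locate the data set without a separate lookup step.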
The source data set is stored in the form of data tables in which each column corresponds to a feature; this format makes it easy to extract the required feature data by column on demand. A data table is a unit of data storage. Logically it is a two-dimensional structure of rows and columns: each row represents a record, and each column represents an attribute, that is, a field with a single data type and name. A record may contain one or more columns, and the names and types of the columns constitute the schema of the table. A data warehouse may contain multiple tables. Specifically, the data of the source data set can be organized into various types of data tables according to the feature of each column, and when an algorithm needs the data of particular features, only the data corresponding to those features need be extracted.
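Column-wise extraction of features can be sketched as follows. This is only an illustration: SQLite (from the Python standard library) stands in for the data warehouse, and the function name `extract_features` is hypothetical.

```python
# Hypothetical sketch: SQLite stands in for the data warehouse.
import sqlite3
from typing import List, Sequence, Tuple


def extract_features(conn: sqlite3.Connection, table: str,
                     features: Sequence[str]) -> List[Tuple]:
    """Return only the requested feature columns of a data table."""
    # Check the names against the table schema before building the query,
    # because identifiers cannot be passed as SQL parameters.
    schema = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    unknown = [f for f in features if f not in schema]
    if unknown:
        raise KeyError(f"unknown columns: {unknown}")
    cols = ", ".join(f'"{f}"' for f in features)
    return conn.execute(f'SELECT {cols} FROM "{table}"').fetchall()
```

An algorithm that only needs, say, the merchant and amount features would pass those two column names and never read the remaining columns.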
In an optional embodiment, after a data set is extracted from the source data set, data cleansing is performed on it. Data cleansing is a process of reducing errors and inconsistencies in data; its main task is to detect problematic transaction data and then delete or correct it. For example, a data set can be extracted from a database by writing SQL, data cleansing can then be performed on that data set, and the cleansed data set is saved.
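A minimal sketch of such a cleansing pass is given below. The text does not specify any cleansing rules, so the three rules used here (drop records with missing key fields, drop duplicates, convert textual amounts to numbers) and the field names `id` and `amount` are assumptions chosen only to show the "detect, then delete or correct" pattern.

```python
# Hypothetical sketch: the rules and field names are illustrative assumptions.
from typing import Dict, Iterable, List

Row = Dict[str, object]


def clean_transactions(rows: Iterable[Row]) -> List[Row]:
    """Delete or correct problematic records."""
    seen_ids = set()
    cleaned: List[Row] = []
    for row in rows:
        tx_id = row.get("id")
        amount = row.get("amount")
        # Delete: a record missing a key field cannot be repaired.
        if tx_id is None or amount is None:
            continue
        # Delete: duplicate record (same transaction id seen before).
        if tx_id in seen_ids:
            continue
        # Correct: an amount that arrives as text is converted to a number.
        try:
            amount = float(amount)
        except (TypeError, ValueError):
            continue  # not repairable, so the record is deleted
        seen_ids.add(tx_id)
        cleaned.append({**row, "amount": amount})
    return cleaned
```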
In an optional embodiment, an algorithm corresponding to the application scenario can be determined before step S110 is performed. Application scenarios include abnormal-data monitoring, transaction prediction, data mining, and the like. Application scenarios can be preset as needed and associated with their respective algorithms; for example, each application scenario and its algorithm can be stored in corresponding entries of a relation table. Once the application scenario has been determined, the corresponding algorithm can be started. Scenarios and their algorithms can also be added as needed; for example, a data analysis scenario and a corresponding analysis algorithm can be added. Because different algorithms require different training sets, once the algorithm has been determined, the training data set corresponding to it is determined according to the algorithm. For example, for an algorithm that performs anomaly monitoring or transaction prediction on a transaction platform, the training data set is transaction data.
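The relation table between application scenarios and algorithms can be sketched as a simple registry. All names here are hypothetical, and the toy threshold algorithm is a placeholder rather than the monitoring algorithm of the described system.

```python
# Hypothetical sketch: a dictionary plays the role of the relation table.
from typing import Callable, Dict, List

Algorithm = Callable[[List[float]], List[bool]]

# Relation table: application scenario -> algorithm.
_scenario_algorithms: Dict[str, Algorithm] = {}


def register(scenario: str, algorithm: Algorithm) -> None:
    """Associate an application scenario with its algorithm."""
    _scenario_algorithms[scenario] = algorithm


def algorithm_for(scenario: str) -> Algorithm:
    """Return the algorithm associated with the given scenario."""
    try:
        return _scenario_algorithms[scenario]
    except KeyError:
        raise LookupError(f"no algorithm registered for {scenario!r}") from None


def flag_large_amounts(amounts: List[float]) -> List[bool]:
    """Toy anomaly monitor: flag amounts above a made-up threshold."""
    return [a > 1000.0 for a in amounts]


register("abnormal_data_monitoring", flag_large_amounts)
```

Adding a new scenario, such as data analysis, then amounts to one more `register` call with the corresponding algorithm.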
Then, in step S120, the training data set is processed by the algorithm to generate a result data set. Next, in step S130, the result data set is labeled according to feedback information to generate a labeled data set; the feedback information refers to information fed back on the calculation result. For example, in the case of an abnormal-data monitoring algorithm, the feedback information includes information fed back by a merchant (for example, that a transaction is abnormal) or information obtained by tracking data according to an anomaly prompt. There are many ways to label the result data set. For example, if the algorithm result is determined to be correct according to the feedback information, the result data set is given a "+" label; if the algorithm result is determined to be incorrect, the result data is given a "-" label; the labeled data set is obtained in this way. Alternatively, as needed, transaction data for which the algorithm result is correct can be labeled "true" and transaction data for which the algorithm result is wrong can be labeled "false". It should be noted that labeling the result data set serves only to distinguish the various cases within the result data set.
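The labeling in step S130 can be sketched as follows. The sketch assumes, purely for illustration, that feedback arrives as a set of transaction ids whose alarms were confirmed as correct; the field names `id` and `label` are likewise hypothetical.

```python
# Hypothetical sketch: feedback is modeled as a set of confirmed ids.
from typing import Dict, List, Set

Row = Dict[str, object]


def label_results(results: List[Row], confirmed_ids: Set[int]) -> List[Row]:
    """Attach "+" to results the feedback confirms and "-" to the rest."""
    labeled = []
    for row in results:
        label = "+" if row["id"] in confirmed_ids else "-"
        labeled.append({**row, "label": label})
    return labeled
```

Swapping the two string constants for "true" and "false" gives the alternative labeling mentioned in the text; the labels only need to distinguish the cases.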
In step S140, the labeled data set is stored as a validation data set in the same storage manner as the source data set. The validation data set is a data set used to verify the algorithm. Specifically, the labeled data set can be converted into a data table in the same format as the data tables of the source data set, and the converted data table is stored as the validation data set in the data warehouse where the source data set resides. The name of the data table corresponding to the validation data set may differ from the name of the data table corresponding to the training data set, so the method according to the present invention can call different data sets by data table name. Because the validation data set is a data set that has been verified, the algorithm can be optimized by iteratively training it on the validation data set.
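Step S140 can be sketched with SQLite standing in for the data warehouse. The sketch assumes that "the same format" means reusing the source table's columns and appending a `label` column; that reading, the column name, and the function name are assumptions rather than details given in the text.

```python
# Hypothetical sketch: SQLite stands in for the data warehouse, and the
# extra "label" column is an assumption about the table format.
import sqlite3
from typing import Iterable, Tuple


def store_validation_set(conn: sqlite3.Connection, source_table: str,
                         validation_table: str,
                         labeled_rows: Iterable[Tuple]) -> None:
    """Create a table shaped like the source table plus a label, then fill it."""
    # Read column names and declared types from the source table's schema.
    columns = [(row[1], row[2]) for row in
               conn.execute(f'PRAGMA table_info("{source_table}")')]
    if not columns:
        raise KeyError(f"unknown source table: {source_table}")
    column_defs = ", ".join(f'"{name}" {ctype}' for name, ctype in columns)
    conn.execute(
        f'CREATE TABLE "{validation_table}" ({column_defs}, "label" TEXT)')
    placeholders = ", ".join("?" for _ in range(len(columns) + 1))
    conn.executemany(
        f'INSERT INTO "{validation_table}" VALUES ({placeholders})',
        labeled_rows)
    conn.commit()
```

Since the validation table lives in the same database under a different table name, later steps can select either data set simply by name.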
In an optional embodiment, the data in the validation data set can be called to evaluate the performance of the algorithm. Specifically, using the name of the warehouse where the validation data set resides and the data table of the validation data set, a program instruction (for example, an SQL statement executable by ODPS) can call the relevant data in the validation data set to obtain an evaluation result. For example, when the algorithm is evaluated by accuracy, the data labeled "+" in the validation data set can be compared with all of the transaction data to obtain the accuracy. It should be noted that different algorithms have different evaluation indices. An algorithm can be associated with its evaluation index in advance, so that when a given algorithm is selected, the evaluation index corresponding to it can be called.
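The accuracy evaluation described above can be sketched as a single SQL query. SQLite again stands in for ODPS, and the table layout (a `label` column holding "+" or "-") is an assumption; accuracy is computed here as the share of records labeled "+" among all records, following the comparison described in the text.

```python
# Hypothetical sketch: SQLite stands in for ODPS; the "label" column
# is an assumed layout for the validation table.
import sqlite3


def accuracy(conn: sqlite3.Connection, validation_table: str) -> float:
    """Share of records labeled "+" among all records in the table."""
    query = (
        "SELECT SUM(CASE WHEN label = '+' THEN 1 ELSE 0 END), COUNT(*) "
        f'FROM "{validation_table}"'
    )
    plus, total = conn.execute(query).fetchone()
    if total == 0:
        raise ValueError("validation data set is empty")
    return plus / total
```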
The data processing method of the present invention labels the result data set using feedback information and uses the labeled data set as a validation data set to iteratively train the algorithm, thereby optimizing the algorithm and improving its performance. Further, the method can evaluate the algorithm using the validation data set, so that the performance of the algorithm can be assessed intuitively and quantitatively. In addition, the method can evaluate different algorithms in different scenarios and is therefore highly compatible.
Fig. 2 shows a flowchart of a data processing method according to another exemplary embodiment of the present invention. As shown in Fig. 2, the application scenario is store monitoring. According to the application scenario, the algorithm is determined to be an abnormal-data monitoring algorithm.
The corresponding training data set is then extracted according to the algorithm and processed by the algorithm to generate a result data set, which includes the abnormal data found by the algorithm. The result data set is then labeled according to feedback information. Specifically, the result data can be handled according to merchant feedback: when the merchant reports that an abnormal condition exists, the corresponding data set is labeled "+". For example, the transaction corresponding to the time reported by the merchant is identified, and the transaction data related to that transaction is labeled "-". Alternatively, after the result data set is obtained, the transactions related to the result data set are tracked and, according to the tracking result, the transaction data corresponding to normal transactions is labeled "+" and the transaction data corresponding to abnormal transactions is labeled "-". The labeled data set is then converted into a validation data set in the manner of the source data set. For example, where the source data set is stored in the form of data tables in a data warehouse created in ODPS according to the algorithm, the labeled data set is stored, following the ODPS storage manner, in the form of a data table in that same data warehouse. The training data set can thus be processed according to the warehouse name and the name of the data table corresponding to the labeled data set; for example, the training data set can be called, queried, or modified through SQL instructions.
Next, as shown in Fig. 2, the data in the validation data set is called with SQL instructions to evaluate the algorithm. It should be understood that the evaluation index is preset by developers or users according to the algorithm, and the indices may differ between algorithms. For example, when the algorithm is evaluated by false alarm rate, the data labeled "-" in the validation data set represents false alarms, so the false alarm rate is obtained by comparing the number of transactions labeled "-" with the total number of transactions. There is also the case in which a transaction is abnormal but is not detected; such data is obtained through merchant feedback, and this transaction data can be labeled "else" (for illustration only). When evaluating by miss rate, the miss rate is obtained by comparing the number of transactions labeled "else" with the total number of transactions.
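The two indices described above are simple ratios, which a minimal sketch makes concrete. The function and key names are hypothetical; the labels "-" and "else" follow the text.

```python
# Hypothetical sketch: both rates are label counts divided by the total.
from collections import Counter
from typing import Dict, Iterable


def label_rates(labels: Iterable[str]) -> Dict[str, float]:
    """Compute the false alarm rate ("-") and the miss rate ("else")."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labeled transactions")
    return {
        "false_alarm_rate": counts["-"] / total,
        "miss_rate": counts["else"] / total,
    }
```

For eight labeled transactions of which two are "-" and one is "else", the false alarm rate is 2/8 = 0.25 and the miss rate is 1/8 = 0.125.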
In addition, the same application scenario may correspond to different algorithms, depending on the association table between application scenarios and algorithms. Where one application scenario corresponds to several algorithms, the required algorithm must also be determined once the application scenario has been selected.
To make the inventive concept of the present invention clearer, a framework diagram of a monitoring system that uses the data processing method according to the present invention on a monitoring platform is described below with reference to Fig. 3.
As shown in Fig. 3, rule monitoring and intelligent monitoring can be performed on the transactions on the monitoring platform. Rule monitoring refers to monitoring transactions using a combination of monitoring rules, for example: single-day merchant transaction amount > 0 and transaction amount <= transaction baseline * 2. The business data (also called "transaction data") includes merchant transaction data, store transaction data, and so on.
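The example rule can be written directly as a predicate. The text does not say whether the rule describes the normal range or the alarm condition; this sketch assumes it describes the normal range, so amounts outside it would be reported. The function and parameter names are hypothetical.

```python
# Hypothetical sketch: assumes the rule describes normal transactions.
def within_rule(daily_amount: float, baseline: float) -> bool:
    """True if 0 < daily_amount <= baseline * 2, as in the example rule."""
    return daily_amount > 0 and daily_amount <= baseline * 2
```

With a baseline of 100, a daily amount of 150 satisfies the rule, while amounts of 0 or 201 do not.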
Alternatively, intelligent monitoring of the transaction data can be selected. As shown in Fig. 3, where the transaction data has been labeled, the system used for intelligent monitoring may include an application module, a management module, and an optimization module. The application module may provide an algorithm suited to the input scenario, determine the output of the algorithm, provide alarm responses, and so on. The management module may manage the application scenarios, the algorithms, and the data involved in their training models. The optimization module may train, iterate on, and evaluate the selected algorithm.
Fig. 4 shows a block diagram of a data processing device according to an exemplary embodiment of the present invention. The data processing device includes an extraction module 410, a first generation module 420, a second generation module 430, and a storage module 440. Those skilled in the art will understand that the data processing device in Fig. 4 shows only the components related to this exemplary embodiment and may also include general-purpose components other than those shown in Fig. 4.
The extraction module 410 extracts a training data set corresponding to an algorithm from a source data set, where the source data set is the data set obtained by extracting data sets from multiple types of databases and performing data cleansing on them. After a data set is obtained from these databases, it is stored in a predetermined storage format: the data set is stored in association with the name of the warehouse in which the source data set resides (that is, the name of the data warehouse created as required by the algorithm) and a data table name (corresponding to the data set). A URL for the data set can then be composed from the data warehouse name and the data table name, so that the data table can be obtained directly and the data set can be queried quickly through the URL.
Optionally, before the extraction module 410 performs the extraction operation, the data processing device can use a determining module (not shown) to determine the algorithm corresponding to the selected application scenario, so that the extraction module 410 can extract the training data set corresponding to that algorithm from the source data set. In addition, the data processing device also includes a storage module (not shown), which can store application scenarios in association with their corresponding algorithms in advance.
The first generation module 420 processes the training data set with the algorithm to generate a result data set. The second generation module 430 then labels the result data set according to feedback information to generate a labeled data set. The feedback information refers to information fed back on the calculation result. For example, in the case of an abnormal-data monitoring algorithm, the feedback information includes information fed back by a merchant (for example, that a transaction is abnormal) or information obtained by tracking data according to an anomaly prompt. There are many ways to label the result data set. For example, if the algorithm result is determined to be correct according to the feedback information, the result data set is given a "+" label; if the algorithm result is determined to be incorrect, the result data is given a "-" label; the labeled data set is obtained in this way. Alternatively, as needed, transaction data for which the algorithm result is correct can be labeled "true" and transaction data for which the algorithm result is wrong can be labeled "false". It should be noted that labeling the result data set serves only to distinguish the various cases within the result data set.
The storage module 440 stores the labeled data set as a validation data set in the same storage manner as the source data set. Specifically, the storage module 440 converts the labeled data set into a data table in the same format as the data tables of the source data set, and stores the converted data table as the validation data set in the data warehouse where the source data set resides. The name of the data table corresponding to the validation data set may differ from the name of the data table corresponding to the training data set, so different data sets can be called by data table name. Because the validation data set is a data set that has been verified, the algorithm can be optimized by iteratively training it on the validation data set.
Optionally, the data processing device also includes an evaluation module (not shown), which can call the data in the validation data set to evaluate the performance of the algorithm. Specifically, using the name of the warehouse where the validation data set resides and the data table of the validation data set, a program instruction (for example, an SQL statement executable by ODPS) can call the relevant data in the validation data set to obtain an evaluation result. For example, when the algorithm is evaluated by accuracy, the data labeled "+" in the validation data set can be compared with all of the transaction data to obtain the accuracy. It should be noted that different algorithms have different evaluation indices. An algorithm can be associated with its evaluation index in advance, so that when a given algorithm is selected, the evaluation index corresponding to it can be called.
The data processing device of the present invention labels the result data set using feedback information and uses the labeled data set as a validation data set to iteratively train the algorithm, thereby optimizing the algorithm and improving its performance. Further, the device can evaluate the algorithm using the validation data set, so that the performance of the algorithm can be assessed intuitively and quantitatively. In addition, the device can evaluate different algorithms in different scenarios and is therefore highly compatible.
The systems, devices, modules, or units described in the above embodiments can be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above device is described in terms of various units divided by function. Of course, when the present application is implemented, the functions of the units can be realized in one or more pieces of software and/or hardware.
Those skilled in the art will understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, computing device includes one or more processors (CPU), input/output interface, net Network interface and internal memory.
Computer-readable medium includes permanent and non-permanent, removable and non-removable media can be by any method Or technology come realize information store.Information can be computer-readable instruction, data structure, the module of program or other data. The example of the storage medium of computer includes, but are not limited to phase transition internal memory (PRAM), static RAM (SRAM), moved State random access memory (DRAM), other kinds of random access memory (RAM), read-only storage (ROM), electric erasable Programmable read only memory (EEPROM), fast flash memory bank or other memory techniques, read-only optical disc read-only storage (CD-ROM), Digital versatile disc (DVD) or other optical storages, magnetic cassette tape, the storage of tape magnetic rigid disk or other magnetic storage apparatus Or any other non-transmission medium, the information that can be accessed by a computing device available for storage.Define, calculate according to herein Machine computer-readable recording medium does not include temporary computer readable media (transitory media), such as data-signal and carrier wave of modulation.
It should also be noted that the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, commodity, or device that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, commodity, or device that comprises the element.
The application may be described in the general context of computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described progressively. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for related parts, refer to the description of the method embodiment.
The foregoing is merely embodiments of the application and is not intended to limit the application. For those skilled in the art, the application may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the application shall fall within the scope of the claims of the application.

Claims (14)

1. A data processing method, characterized by comprising:
extracting, from a source data set, a training data set corresponding to an algorithm;
processing the training data set by means of the algorithm to generate a result data set;
labeling the result data set according to feedback information to generate a labeled data set; and
storing the labeled data set as a validation data set in accordance with the storage mode of the source data set.
2. The method according to claim 1, characterized in that, before extracting the training data set corresponding to the algorithm from the source data set, the method further comprises: determining the algorithm corresponding to a selected application scenario.
3. The method according to claim 2, characterized in that, before determining the algorithm corresponding to the selected application scenario, the method further comprises: storing in advance application scenarios in association with the algorithms corresponding to those application scenarios.
4. The method according to any one of claims 1 to 3, characterized in that the source data set is a data set obtained by performing data cleaning on data sets extracted from multiple types of databases.
5. The method according to claim 1, characterized in that the source data set is stored in the form of a data table in which each column corresponds to a feature.
6. The method according to claim 1, characterized in that, after generating the labeled data set, the method further comprises: calling data in the validation data set to evaluate the performance of the algorithm.
7. The method according to claim 1, wherein the step of storing the labeled data set as the validation data set in accordance with the storage mode of the source data set comprises:
converting the labeled data set into a data table having the same form as the data table of the source data set; and
storing the converted data table as the validation data set in the data warehouse where the source data set is located.
8. A data processing device, characterized by comprising:
an extraction module configured to extract, from a source data set, a training data set corresponding to an algorithm;
a first generation module configured to process the training data set by means of the algorithm to generate a result data set;
a second generation module configured to label the result data set according to feedback information to generate a labeled data set; and
a storage module configured to store the labeled data set as a validation data set in accordance with the storage mode of the source data set.
9. The device according to claim 8, characterized by further comprising: a determining module configured to determine the algorithm corresponding to a selected application scenario, so that the extraction module extracts the training data set corresponding to that algorithm from the source data set.
10. The device according to claim 9, characterized by further comprising: a storage module configured to store in advance application scenarios in association with the algorithms corresponding to those application scenarios.
11. The device according to any one of claims 8 to 10, characterized in that the source data set is a data set obtained by performing data cleaning on data sets extracted from multiple types of databases.
12. The device according to claim 8, characterized in that the source data set is stored in the form of a data table in which each column corresponds to a feature.
13. The device according to claim 8, characterized by further comprising: an evaluation module configured to call data in the validation data set to evaluate the performance of the algorithm.
14. The device according to claim 8, characterized in that the storage module converts the labeled data set into a data table having the same form as the data table of the source data set, and stores the converted data table as the validation data set in the data warehouse where the source data set is located.
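The method claims can be read as a small pipeline. The following is only an illustrative sketch, not the implementation described in the patent: the table contents, the threshold "algorithm", the scenario name, the in-memory `warehouse` dictionary, and every function name are assumptions introduced for this example.

```python
# Hypothetical source table stored column-per-feature (the layout in claim 5).
SOURCE_TABLE = {
    "user_id": [1, 2, 3, 4],
    "amount": [20.0, 950.0, 15.0, 700.0],
    "country": ["CN", "US", "CN", "DE"],
}


def flag_large_amounts(training, limit=500.0):
    """Stand-in algorithm: flag rows whose amount exceeds an assumed limit."""
    return [amount > limit for amount in training["amount"]]


# Claims 2-3: application scenarios stored in association with their algorithms.
# Each entry pairs an algorithm with the feature columns it needs (assumed shape).
ALGORITHM_BY_SCENARIO = {
    "large_payment_review": (flag_large_amounts, ["user_id", "amount"]),
}


def extract_training_set(source, features):
    """Claim 1, step 1: keep only the columns the chosen algorithm uses."""
    return {name: list(source[name]) for name in features}


def label_results(results, feedback):
    """Claim 1, step 3: label each result row that has feedback; others are skipped."""
    return [
        {"row": i, "predicted": predicted, "label": feedback[i]}
        for i, predicted in enumerate(results)
        if i in feedback
    ]


def to_source_format(training, labeled):
    """Claim 7: rebuild a column-per-feature table matching the source layout."""
    rows = [entry["row"] for entry in labeled]
    table = {name: [column[i] for i in rows] for name, column in training.items()}
    table["predicted"] = [entry["predicted"] for entry in labeled]
    table["label"] = [entry["label"] for entry in labeled]
    return table


def evaluate(validation):
    """Claim 6: accuracy of the algorithm on the (non-empty) validation data set."""
    matches = [p == l for p, l in zip(validation["predicted"], validation["label"])]
    return sum(matches) / len(matches)


# The dictionary stands in for the data warehouse holding the source data set.
warehouse = {"source": SOURCE_TABLE}
algorithm, features = ALGORITHM_BY_SCENARIO["large_payment_review"]
training = extract_training_set(warehouse["source"], features)
results = algorithm(training)                         # claim 1, step 2
feedback = {0: False, 1: True, 2: False, 3: False}    # assumed reviewer feedback
labeled = label_results(results, feedback)
warehouse["validation"] = to_source_format(training, labeled)  # claim 1, step 4
accuracy = evaluate(warehouse["validation"])
print(accuracy)  # 0.75: three of the four predictions match the feedback
```

Keeping the validation table in the same column layout and the same store as the source table means that the extraction step could, under these assumptions, read validation data exactly as it reads source data.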
CN201611140090.XA 2016-12-12 2016-12-12 Data processing method and apparatus thereof Active CN107016028B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611140090.XA CN107016028B (en) 2016-12-12 2016-12-12 Data processing method and apparatus thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611140090.XA CN107016028B (en) 2016-12-12 2016-12-12 Data processing method and apparatus thereof

Publications (2)

Publication Number Publication Date
CN107016028A true CN107016028A (en) 2017-08-04
CN107016028B CN107016028B (en) 2020-07-14

Family

ID=59439591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611140090.XA Active CN107016028B (en) 2016-12-12 2016-12-12 Data processing method and apparatus thereof

Country Status (1)

Country Link
CN (1) CN107016028B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108198268A (en) * 2017-12-19 2018-06-22 江苏极熵物联科技有限公司 Production equipment data calibration method
CN109213656A (en) * 2018-07-23 2019-01-15 武汉智领云科技有限公司 Interactive big data intelligent anomaly detection system and method
CN110954110A (en) * 2019-12-10 2020-04-03 西安电子科技大学 X-ray pulsar navigation processing system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120221510A1 (en) * 2010-03-31 2012-08-30 International Business Machines Corporation Method and system for validating data
CN105120217A (en) * 2015-08-21 2015-12-02 上海小蚁科技有限公司 Intelligent camera motion detection alarm system and method based on big data analysis and user feedback
CN105279382A (en) * 2015-11-10 2016-01-27 成都数联易康科技有限公司 Medical insurance abnormal data on-line intelligent detection method
CN106097346A (en) * 2016-06-13 2016-11-09 中国科学技术大学 Self-learning video fire detection method

Also Published As

Publication number Publication date
CN107016028B (en) 2020-07-14

Similar Documents

Publication Publication Date Title
US9774681B2 (en) Cloud process for rapid data investigation and data integrity analysis
CN107957957A (en) Method and device for acquiring test cases
CN107038484A (en) Method and apparatus for handling service request
US20210112101A1 (en) Data set and algorithm validation, bias characterization, and valuation
EP3227799A2 (en) Auto-encoder enhanced self-diagnostic components for model monitoring
US20160154840A1 (en) Avoid double counting of mapped database data
CN104778123B (en) Method and device for detecting system performance
US9304991B2 (en) Method and apparatus for using monitoring intent to match business processes or monitoring templates
CN107248052A (en) Method, apparatus and system for determining commodity inventory information
CN109472609A (en) Risk control reason determination method and device
CN107330776A (en) Method and device for detecting bookkeeping and abnormal bookkeeping details
CN107633015A (en) Data processing method, device and equipment
CN111260368A (en) Account transaction risk judgment method and device and electronic equipment
US20230126764A1 (en) Mixed quantum-classical method for fraud detection with quantum feature selection
CN107016028A (en) Data processing method and its equipment
CN107330572A (en) Risk control method, apparatus and system
CN109711849B (en) Ethereum address profile generation method and device, electronic equipment and storage medium
US20110191143A1 (en) Method and Apparatus for Specifying Monitoring Intent of a Business Process or Monitoring Template
CN110458581B (en) Method and device for identifying business turnover abnormality of commercial tenant
Mitheran et al. Introducing self-attention to target attentive graph neural networks
CN113094414B (en) Method and device for generating circulation map
CN106997350A (en) Data processing method and device
US20240152818A1 (en) Methods for mitigation of algorithmic bias discrimination, proxy discrimination and disparate impact
Li et al. Fault diagnosis of PLC-based discrete event systems using Petri nets
CN108228560A (en) Method and device for determining data type

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Patentee after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Patentee before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Patentee after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Patentee before: Alibaba Group Holding Ltd.