CN108960561A - Risk control model processing method, device and equipment based on imbalanced data - Google Patents

Info

Publication number
CN108960561A
Authority
CN
China
Prior art keywords
sample
processed
sample data
data
white
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810417845.9A
Other languages
Chinese (zh)
Inventor
肖凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201810417845.9A
Publication of CN108960561A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features

Abstract

The embodiments of this specification disclose a risk control model processing method, device and equipment based on imbalanced data. For sample data to be processed that exhibits a data imbalance problem, the more numerous class of sample data can be divided into multiple data sets. On this basis, each data set, together with the undivided sample data, can be used to build a corresponding number of scoring models, and the scoring models are then used to score the divided sample data. The score reflects the correlation between the divided sample data and the undivided sample data, so samples can be drawn based on the scores; the sampling is applied only to the divided portion of the sample data. Finally, a risk control model is built from the sampled data and the undivided sample data, and is deployed.

Description

Risk control model processing method, device and equipment based on imbalanced data
Technical field
This application relates to the field of computer technology, and in particular to a risk control model processing method, device and equipment based on imbalanced data.
Background
Data imbalance is currently a fairly common phenomenon in risk control scenarios. It typically means that the numbers of black and white samples differ greatly (for example, the ratio of black samples to white samples may be only about one in a thousand). This degrades the performance of machine learning algorithms and, in turn, the recognition accuracy of risk identification models.
In the prior art, random sampling is generally used to reduce the influence of imbalanced data by narrowing the difference in quantity between white and black samples, and multiple risk identification models are built and deployed on that basis.
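As a point of reference, the random-sampling baseline described above could be sketched as follows. This is a minimal illustration rather than code from the patent; the function name `random_undersample` and the toy data are assumptions.

```python
import random

def random_undersample(white, black, seed=0):
    """Keep a uniformly random subset of white samples, equal in size to the black set.

    Hypothetical helper for illustration only: the source says merely that the
    prior art uses random sampling to narrow the gap between the two classes.
    """
    rng = random.Random(seed)
    k = min(len(white), len(black))
    return rng.sample(list(white), k), list(black)

# Toy data: 1000 white samples versus 10 black samples.
white = [("white", i) for i in range(1000)]
black = [("black", i) for i in range(10)]
kept_white, kept_black = random_undersample(white, black)
print(len(kept_white), len(kept_black))
```

In this baseline every white sample is equally likely to be kept, regardless of how closely it resembles a black sample.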
Given the prior art, a more effective way of processing risk control models is needed.
Summary of the invention
The embodiments of this specification provide a risk control model processing method, device and equipment based on imbalanced data, so as to provide a more effective way of processing risk control models.
A risk control model processing method based on imbalanced data provided by the embodiments of this specification comprises:
Obtaining sample data to be processed that contains imbalanced samples;
Dividing the sample data to be processed to obtain multiple sample data sets to be processed;
Building scoring models from the multiple sample data sets to be processed obtained by the division, and scoring the samples to be processed, wherein the score of a sample to be processed characterizes the correlation between the imbalanced samples to be processed;
Sampling the samples to be processed according to the scores, and building and deploying a risk control model based on the sampling result.
A risk control model processing device based on imbalanced data, also provided by the embodiments of this specification, comprises:
An obtaining module, which obtains sample data to be processed that contains imbalanced samples;
A division module, which divides the sample data to be processed to obtain multiple sample data sets to be processed;
A scoring module, which builds scoring models from the multiple sample data sets to be processed obtained by the division and scores the samples to be processed, wherein the score of a sample to be processed characterizes the correlation between the imbalanced samples to be processed;
A building and deployment module, which samples the samples to be processed according to the scores, and builds and deploys a risk control model based on the sampling result.
Risk control model processing equipment based on imbalanced data, also provided by the embodiments of this specification, comprises:
A memory, which stores a risk control model processing program based on imbalanced data;
A processor, which calls the risk control model processing program based on imbalanced data stored in the memory and executes:
Obtaining sample data to be processed that contains imbalanced samples;
Dividing the sample data to be processed to obtain multiple sample data sets to be processed;
Building scoring models from the multiple sample data sets to be processed obtained by the division, and scoring the samples to be processed, wherein the score of a sample to be processed characterizes the correlation between the imbalanced samples to be processed;
Sampling the samples to be processed according to the scores, and building and deploying a risk control model based on the sampling result.
By adopting at least one of the above technical solutions, the embodiments of this specification can achieve the following beneficial effects:
For sample data to be processed that exhibits a data imbalance problem, the more numerous class of sample data can be divided into multiple data sets. On this basis, each data set, together with the undivided sample data, can be used to build a corresponding number of scoring models, which are then used to score the divided sample data. The score reflects the correlation between the divided sample data and the undivided sample data, so samples can be drawn based on the scores; the sampling is applied only to the divided portion of the sample data. Finally, a risk control model is built from the sampled data and the undivided sample data, and is deployed.
With the above method in the embodiments of this specification, the score-based approach can more effectively select, from the more numerous sample data, those samples that correlate more strongly with the less numerous sample data. The samples selected in this way can improve the risk model and eliminate the data imbalance between the sample data. Moreover, the multiple scoring models are never deployed in this process; only the risk model is ultimately deployed, which reduces the cost of model deployment.
Brief description of the drawings
The drawings described here provide a further understanding of this application and constitute a part of it. The illustrative embodiments of this application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the architecture on which the risk control model processing method based on imbalanced data provided by the embodiments of this specification is based;
Fig. 2 shows the risk control model processing procedure based on imbalanced data provided by the embodiments of this specification;
Fig. 3 shows the execution process, provided by the embodiments of this specification, in a scenario where black and white samples are imbalanced;
Fig. 4 is a schematic structural diagram of the risk control model processing device based on imbalanced data provided by the embodiments of this specification.
Detailed description
To make the purposes, technical solutions and advantages of this application clearer, the technical solutions of this application are described clearly and completely below with reference to specific embodiments of this application and the corresponding drawings. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort shall fall within the protection scope of this application.
It should be noted that the data imbalance problem arises in the course of classifying data: after classification, the numbers of samples contained in the different data sets differ greatly. Besides the imbalance between black and white sample counts in risk control scenarios, the method described in the embodiments of this specification is equally applicable to data imbalance problems in other binary classification scenarios. The following description focuses on the black and white sample imbalance problem in risk control scenarios.
Generally, in one or more embodiments of this specification, training samples can be determined from historical business events that have already occurred or from existing business entities. Specifically, a black sample can be regarded as a high-risk business event or business entity, and a white sample as a normal business event or business entity. For example, a fraudulent transaction can be regarded as a black sample and a normal transaction as a white sample; likewise, a risky account can be regarded as a black sample and a normal account as a white sample.
A historical business event can be regarded as a business operation that has already been executed and has produced a corresponding result, such as paying, placing an order, transferring money, entering a lottery, or voting. The business result mentioned here may include success, failure, permission restriction, and so on; whether a business event is a white sample or a black sample can then be determined from its business result.
A business entity can be regarded as the party that initiates a business operation. In the embodiments of this specification, business entities may include, but are not limited to, an account, the user themselves, a user's terminal, a server, and so on. Users here include, but are not limited to, individual users, enterprise users, merchants, and service providers.
Of course, in practical applications, black samples and white samples can be defined according to actual needs, which should not be construed as a limitation on this application.
The risk control model processing method based on imbalanced data described in the embodiments of this specification can use the architecture shown in Fig. 1.
In Fig. 1, the processing equipment can obtain sample data (including business events that have occurred, or identifiers of business entities). The black samples and white samples in this data can be determined in advance by a corresponding identification model or by manual labeling. The processing equipment can then build and deploy a model based on the black and white samples.
Generally, the processing equipment can be regarded as business equipment that provides business services, such as a transaction server that provides transaction services, a server that provides an order-placing function, or a server running a lottery algorithm. Of course, in practical applications the processing equipment is not limited to servers; it may also be a mobile phone, tablet computer, computer, or similar device.
Where the processing equipment is a server, it may use an architecture such as a server cluster, distributed servers, or a single server. Which architecture to use depends on the needs of the practical application and is not specifically limited here.
In addition to the processing equipment shown in Fig. 1, the risk control model processing method based on imbalanced data described in the embodiments of this specification may also be executed by a non-hardware entity such as an application program or service. Again, this depends on the needs of the practical application and should not be construed as a limitation on this application.
It should be noted that the predetermined black and white samples are usually imbalanced, with white samples typically in the great majority, which is why the method in the embodiments of this specification needs to be executed.
The technical solutions in the embodiments of this specification are described in detail below.
The embodiments of this specification provide a risk control model processing method based on imbalanced data which, as shown in Fig. 2, may specifically include the following steps:
Step S201: obtain sample data to be processed that contains imbalanced samples.
As described above, the sample data to be processed may come from corresponding business equipment, such as a business database, or a server or terminal that participates in the business; it may also come from business occurring in real time. This generally depends on the specific business, and the process of obtaining the sample data to be processed is not described further here.
It can be understood that the black sample data and white sample data in the obtained sample data to be processed have already been determined, and that the black and white sample data are imbalanced. The subsequent steps are executed to weaken or eliminate this imbalance as far as possible.
Step S203: divide the sample data to be processed to obtain multiple sample data sets to be processed.
In the embodiments of this specification, the division is usually applied to the more numerous class of sample data; the less numerous class is usually not divided. After the division, a corresponding number of sample data sets to be processed is obtained.
It should be noted that, in practical applications, the number of divisions can be set according to actual needs and is not specifically limited here.
Step S205: build scoring models from the multiple sample data sets to be processed obtained by the division, and score the samples to be processed.
The score characterizes the correlation between the imbalanced data.
As mentioned above, in the embodiments of this specification the sample data to be processed generally contains two classes of sample data (for example, black and white samples). Within this data, and particularly within the more numerous class, individual samples differ in how strongly they correlate with the less numerous class.
For example, where historical transactions serve as the sample data to be processed, the risky transactions among them can be treated as black samples, and the other transactions that were not judged to be risky as white samples. Some transactions among the white samples may in fact be risky transactions that were classified as white simply because they were not identified. These unidentified risky transactions nevertheless share certain characteristics with the black samples (that is, they are correlated). A corresponding scoring model can therefore be built to quantify this correlation.
Note that, in the embodiments of this specification, the scoring models are not deployed or published; they are used only to score the sample data to be processed.
Step S207: sample the samples to be processed according to the scores, and build and deploy a risk control model based on the sampling result.
After each sample to be processed has been scored, sampling can be performed based on the score values to build the corresponding risk control model.
As noted above, only one model is ultimately deployed and published, namely the risk control model described in this step S207.
Through the above steps, for sample data to be processed that exhibits a data imbalance problem, the more numerous class of sample data can be divided into multiple data sets. On this basis, each data set, together with the undivided sample data, can be used to build a corresponding number of scoring models, which are then used to score the divided sample data. The score reflects the correlation between the divided sample data and the undivided sample data, so samples can be drawn based on the scores; the sampling is applied only to the divided portion of the sample data. Finally, a risk control model is built from the sampled data and the undivided sample data, and is deployed.
With the above method in the embodiments of this specification, the score-based approach can more effectively select, from the more numerous sample data, those samples that correlate more strongly with the less numerous sample data. The samples selected in this way can improve the risk model and eliminate the data imbalance between the sample data. Moreover, the multiple scoring models are never deployed in this process; only the risk model is ultimately deployed, which reduces the cost of model deployment.
The above is now illustrated using black and white samples in a practical application scenario.
In practical applications, the number of white samples is usually far greater than the number of black samples, so in the embodiments of this specification the white samples can be divided. That is, dividing the sample data to be processed to obtain multiple sample data sets to be processed may be: dividing the white samples according to a set division number to obtain the set number of white sample sets.
The number of white samples contained in each white sample set may be the same or different, and is not specifically limited here.
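The division of white samples could be sketched as below. This is only one possible reading: the source does not say how samples are assigned to sets, so the shuffle followed by round-robin slicing, as well as the names `divide_white_samples` and `num_sets`, are assumptions made for illustration.

```python
import random

def divide_white_samples(white_samples, num_sets, seed=0):
    """Split white samples into `num_sets` disjoint sets (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(white_samples)
    rng.shuffle(shuffled)
    # Round-robin slicing: every sample lands in exactly one set,
    # and set sizes differ by at most one.
    return [shuffled[i::num_sets] for i in range(num_sets)]

white_samples = list(range(10))
white_sets = divide_white_samples(white_samples, num_sets=3)
print([len(s) for s in white_sets])  # prints [4, 3, 3]
```

Equal-sized sets are not required by the text; any partition with the configured number of sets would fit the description.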
After the multiple white sample sets have been obtained, each white sample set can be combined with the full set of black samples to build a scoring model. It follows that the number of scoring models obtained equals the number of white sample sets.
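A minimal sketch of building one scoring model per white sample set is given below. The source does not name a model type; the small logistic regression trained by stochastic gradient descent, the label convention (0 for white, 1 for black), and all function names are assumptions chosen so that the example runs with the standard library only.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_scoring_model(white_set, black_samples, epochs=200, lr=0.1):
    """Fit a logistic regression by SGD (assumed model type); 0 = white, 1 = black.

    Returns a function mapping a feature tuple to a score between 0 and 1,
    where a higher score means the sample looks more like a black sample.
    """
    data = [(x, 0.0) for x in white_set] + [(x, 1.0) for x in black_samples]
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
            grad = p - y  # gradient of the log loss with respect to the logit
            w = [wj - lr * grad * xj for wj, xj in zip(w, x)]
            b -= lr * grad
    weights, bias = list(w), b
    return lambda x: sigmoid(bias + sum(wj * xj for wj, xj in zip(weights, x)))

def build_scoring_models(white_sets, black_samples):
    """One scoring model per white sample set, each trained with all black samples."""
    return [train_scoring_model(ws, black_samples) for ws in white_sets]

# Toy 2-D features: white samples near the origin, black samples further out.
white = [(0.1 * i, 0.1 * (i % 5)) for i in range(30)]
black = [(4.0 + 0.1 * i, 4.0 + 0.1 * i) for i in range(5)]
white_sets = [white[i::3] for i in range(3)]
models = build_scoring_models(white_sets, black)
print(len(models))  # one model per white sample set
```

Because each model sees only one white sample set next to all black samples, the class ratio inside each training run is far less skewed than in the full data.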
The scoring models can then be used to score the sample data. In the embodiments of this specification, every scoring model scores every white sample; that is, if there are M scoring models, each white sample has M score values after scoring. The process of scoring the samples to be processed may therefore be: scoring each white sample with each of the scoring models that were built and, for each white sample, aggregating the score values from the multiple scoring models to obtain an aggregate score.
In other words, the M scores of each white sample are aggregated. There are many ways to aggregate, such as summation or weighting; one feasible implementation uses summation. It should be noted that the size of the aggregate score reflects how close a white sample is to the black samples: the larger the score, the closer the white sample lies to the black sample region, and the harder it is to train on.
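The summation variant of the aggregation could look like the sketch below. The scoring models here are hypothetical stand-in functions, not trained models, and the name `aggregate_scores` is an assumption; the point is only that each white sample receives M scores that are added together.

```python
def aggregate_scores(white_samples, scoring_models):
    """Sum, for each white sample, the scores given by every scoring model."""
    return [sum(model(x) for model in scoring_models) for x in white_samples]

# Hypothetical stand-in models (M = 3): each maps a 1-D sample to a score.
scoring_models = [
    lambda x: x[0],
    lambda x: 0.5 * x[0],
    lambda x: x[0] ** 2,
]
white_samples = [(0.2,), (0.8,)]
totals = aggregate_scores(white_samples, scoring_models)
print(totals)
```

A weighted aggregation, which the text also allows, would replace the plain sum with a sum of per-model weights times scores.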
Next, the white samples can be sampled based on each white sample's aggregate score. Specifically, the embodiments of this specification may use weighted sampling; that is, sampling the samples to be processed according to the scores may be: determining the weight of each white sample based on its aggregate score, and performing weighted sampling on the white samples to obtain the sampled white sample data.
In practical applications, the aggregate score of a white sample can be used directly as its weight; the larger the weight, the more likely the sample is to be drawn.
After the above process, the number of sampled white samples is roughly the same as the number of black samples, which eliminates the data imbalance between black and white samples.
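The weighted sampling step could be sketched as follows. The source says only that sampling is weighted by the aggregate score; drawing without replacement, and doing so with the Efraimidis-Spirakis key method, are assumptions of this sketch, as is the name `weighted_sample`.

```python
import math
import random

def weighted_sample(items, weights, k, seed=0):
    """Draw up to k distinct items, favouring items with larger weights.

    Each item with positive weight w gets the key log(u) / w, with u uniform
    in (0, 1]; the k largest keys win. Items with zero weight are never drawn.
    """
    rng = random.Random(seed)
    keyed = []
    for item, w in zip(items, weights):
        if w > 0:
            u = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u) is defined
            keyed.append((math.log(u) / w, item))
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in keyed[:k]]

white = ["a", "b", "c", "d"]
aggregate = [0.0, 0.0, 1.5, 2.5]  # aggregate scores used directly as weights
picked = weighted_sample(white, aggregate, k=2)
print(sorted(picked))  # prints ['c', 'd']: zero-weight samples are never drawn
```

Setting `k` to the number of black samples yields the roughly balanced white set described above.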
Finally, a risk model can be built from the sampled white sample data and the black sample data, and deployed.
As shown in Fig. 3, the actual execution process in the above scenario may specifically include the following steps:
Step S301: divide the white sample data to obtain M groups of white sample sets.
Step S303: build M scoring models, each from one group of white samples and the full set of black sample data.
Step S305: score each white sample with the M scoring models, and aggregate the scores of each white sample.
Step S307: perform weighted sampling on the white sample data based on the aggregate scores.
Step S309: build a risk model from the sampled white sample data and the black sample data.
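Steps S301 to S309 can be strung together as in the sketch below, which stops at producing the balanced training set and leaves the final risk model training out. Everything specific here is an assumption: the distance-to-centroid scorer is a deliberately simple stand-in for a real scoring model, the sampling is done without replacement, and the names `train_centroid_scorer` and `build_training_set` are hypothetical. It requires Python 3.8 or later for `math.dist`.

```python
import math
import random

def train_centroid_scorer(white_set, black):
    """Stand-in scoring model: higher score = closer to the black centroid."""
    dim = len(black[0])
    cw = [sum(x[j] for x in white_set) / len(white_set) for j in range(dim)]
    cb = [sum(x[j] for x in black) / len(black) for j in range(dim)]

    def score(x):
        dw, db = math.dist(x, cw), math.dist(x, cb)
        return dw / (dw + db) if dw + db > 0 else 0.5

    return score

def build_training_set(white, black, m, seed=0):
    rng = random.Random(seed)
    # S301: divide the white samples into m sets.
    shuffled = list(white)
    rng.shuffle(shuffled)
    white_sets = [shuffled[i::m] for i in range(m)]
    # S303: one scorer per white set, each using all black samples.
    scorers = [train_centroid_scorer(ws, black) for ws in white_sets]
    # S305: score every white sample with every scorer and sum the scores.
    totals = [sum(s(x) for s in scorers) for x in white]
    # S307: weighted sampling without replacement, weight = aggregate score.
    keyed = [(math.log(1.0 - rng.random()) / w, i)
             for i, w in enumerate(totals) if w > 0]
    keyed.sort(reverse=True)
    sampled_white = [white[i] for _, i in keyed[:len(black)]]
    # S309 input: balanced data on which the single risk model would be trained.
    features = sampled_white + list(black)
    labels = [0] * len(sampled_white) + [1] * len(black)
    return features, labels

data_rng = random.Random(1)
white = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
black = [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(10)]
features, labels = build_training_set(white, black, m=4)
print(len(features), sum(labels))  # prints 20 10
```

Only the model trained on `features` and `labels` would be deployed; the scorers are discarded once the sampling is done, which matches the deployment-cost argument made earlier.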
The above is the data processing method provided by the embodiments of this specification. Based on the same idea, the embodiments of this specification also provide a risk control model processing device based on imbalanced data. As shown in Fig. 4, the device includes:
An obtaining module 401, which obtains sample data to be processed that contains imbalanced samples;
A division module 402, which divides the sample data to be processed to obtain multiple sample data sets to be processed;
A scoring module 403, which builds scoring models from the multiple sample data sets to be processed obtained by the division and scores the samples to be processed, wherein the score of a sample to be processed characterizes the correlation between the imbalanced samples to be processed;
A building and deployment module 404, which samples the samples to be processed according to the scores, and builds and deploys a risk control model based on the sampling result.
Further, the obtaining module 401 obtains the to-be-processed, predetermined black sample data and white sample data.
The division module 402 divides the white sample data according to a set division number to obtain the set number of white sample sets.
The scoring module 403, for each white sample set obtained by the division, builds a scoring model from that white sample set and the full set of black sample data;
The number of scoring models built equals the number of white sample sets obtained by the division.
The scoring module 403 scores each white sample with each of the scoring models that were built and, for each white sample, aggregates the score values from the multiple scoring models to obtain an aggregate score.
The building and deployment module 404 determines the weight of each white sample based on the aggregate score, and performs weighted sampling on the white sample data to obtain the sampled white sample data.
The building and deployment module 404 builds the risk control model from the sampled white sample data and the black sample data, and deploys it.
Based on the device shown in Fig. 4, the embodiments of this specification also provide risk control model processing equipment based on imbalanced data (for example, a server or a computer), comprising:
A memory, which stores a risk control model processing program based on imbalanced data;
A processor, which calls the risk control model processing program based on imbalanced data stored in the memory and executes:
Obtaining sample data to be processed that contains imbalanced samples;
Dividing the sample data to be processed to obtain multiple sample data sets to be processed;
Building scoring models from the multiple sample data sets to be processed obtained by the division, and scoring the samples to be processed, wherein the score of a sample to be processed characterizes the correlation between the imbalanced samples to be processed;
Sampling the samples to be processed according to the scores, and building and deploying a risk control model based on the sampling result.
The embodiments in this specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device, equipment and medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding parts of the method embodiments, which are not repeated here.
The foregoing describes specific embodiments of this specification. Other embodiments fall within the scope of the appended claims. In some cases, the actions, steps or modules recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired result. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). However, with the development of technology, improvements to many of today's method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must be written in a specific programming language called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can easily be obtained simply by lightly logic-programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
A controller can be implemented in any suitable manner. For example, a controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing a controller purely as computer-readable program code, the method steps can be logic-programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the devices it contains for implementing various functions can be regarded as structures within the hardware component. Indeed, the devices for implementing various functions can be regarded both as software modules implementing a method and as structures within a hardware component.
The systems, devices, modules or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer. Specifically, the computer may be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above device is described with its functions divided into separate units. Of course, when implementing this application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. The present invention may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-permanent (volatile) memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise" and any other variants are intended to cover non-exclusive inclusion, so that a process, method, commodity or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, commodity or device that comprises the element.
Those skilled in the art will understand that embodiments of the present application may be provided as a method, a system, or a computer program product. The application may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
The application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner: identical or similar parts of the embodiments may be referred to across embodiments, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for the relevant details, refer to the description of the method embodiment.
The above is merely an embodiment of the present application and is not intended to limit it. For those skilled in the art, various modifications and changes may be made to this application. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the application shall fall within the scope of its claims.

Claims (15)

1. A risk control model processing method based on unbalanced data, comprising:
obtaining sample data to be processed that contains unbalanced samples;
dividing the sample data to be processed to obtain multiple sample data sets to be processed;
constructing scoring models from the multiple sample data sets obtained by the division, and scoring the samples to be processed, wherein the score of a sample to be processed characterizes the correlation between the unbalanced samples to be processed;
sampling the samples to be processed according to the scores, and constructing and deploying a risk control model based on the sampling result.
2. The method of claim 1, wherein obtaining the sample data to be processed specifically comprises: obtaining predetermined black sample data and white sample data to be processed.
3. The method of claim 2, wherein dividing the sample data to be processed to obtain multiple sample data sets to be processed specifically comprises:
dividing the white sample data according to a set number of divisions to obtain the set number of white sample sets.
4. The method of claim 2, wherein constructing scoring models from the multiple sample data sets obtained by the division specifically comprises:
for any white sample set obtained by the division, constructing a scoring model from that white sample set and the full set of black sample data;
wherein the number of scoring models constructed equals the number of white sample sets obtained by the division.
5. The method of claim 2, wherein scoring the samples to be processed specifically comprises:
scoring each item of white sample data with each of the constructed scoring models;
for each item of white sample data, computing statistics over the scores given by the multiple scoring models to obtain a summary score.
6. The method of claim 5, wherein sampling the samples to be processed according to the scores specifically comprises:
determining a weight for each item of white sample data based on the summary score;
performing weighted sampling on the white sample data to obtain sampled white sample data.
7. The method of claim 6, wherein constructing and deploying a risk control model based on the sampling result specifically comprises:
constructing a risk control model based on the sampled white sample data and the black sample data, and deploying it.
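The partitioning and scoring steps of claims 3 to 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claims do not name a model type, so a nearest-centroid scorer stands in for the scoring model; the summary score is assumed to be the mean of the per-model scores; and the names `partition`, `fit_scorer` and `summary_scores`, as well as the synthetic data, are hypothetical.

```python
import math
import random

def partition(white, k, seed=0):
    """Shuffle the white samples and deal them into k near-equal subsets."""
    if not 1 <= k <= len(white):
        raise ValueError("k must be between 1 and the number of white samples")
    rng = random.Random(seed)
    shuffled = list(white)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_scorer(white_subset, black_all):
    """Stand-in scoring model (assumption): a nearest-centroid scorer.

    The returned function maps a feature vector to a value in [0, 1] that
    grows as the vector gets closer to the black centroid.
    """
    c_white = centroid(white_subset)
    c_black = centroid(black_all)

    def score(x):
        d_white = math.dist(x, c_white)
        d_black = math.dist(x, c_black)
        total = d_white + d_black
        return d_white / total if total > 0 else 0.5

    return score

def summary_scores(white, black, k, seed=0):
    """Train one scorer per white subset, each with ALL black samples,
    then average the k scores of every white sample (assumed statistic)."""
    subsets = partition(white, k, seed)
    scorers = [fit_scorer(subset, black) for subset in subsets]
    scores = [sum(f(x) for f in scorers) / k for x in white]
    return subsets, scorers, scores

# Synthetic, made-up data: many white samples, few black samples.
rng = random.Random(42)
white = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
black = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(10)]
subsets, scorers, scores = summary_scores(white, black, k=4)
```

Because every scorer is trained on one white subset plus the full black set, each model sees a far less skewed class ratio than the raw data, and the number of scorers equals the number of white subsets, as claim 4 requires.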
8. A risk control model processing device based on unbalanced data, comprising:
an obtaining module, which obtains sample data to be processed that contains unbalanced samples;
a division module, which divides the sample data to be processed to obtain multiple sample data sets to be processed;
a scoring module, which constructs scoring models from the multiple sample data sets obtained by the division and scores the samples to be processed, wherein the score of a sample to be processed characterizes the correlation between the unbalanced samples to be processed;
a construction and deployment module, which samples the samples to be processed according to the scores, and constructs and deploys a risk control model based on the sampling result.
9. The device of claim 8, wherein the obtaining module obtains predetermined black sample data and white sample data to be processed.
10. The device of claim 9, wherein the division module divides the white sample data according to a set number of divisions to obtain the set number of white sample sets.
11. The device of claim 9, wherein the scoring module, for any white sample set obtained by the division, constructs a scoring model from that white sample set and the full set of black sample data;
wherein the number of scoring models constructed equals the number of white sample sets obtained by the division.
12. The device of claim 9, wherein the scoring module scores each item of white sample data with each of the constructed scoring models and, for each item of white sample data, computes statistics over the scores given by the multiple scoring models to obtain a summary score.
13. The device of claim 12, wherein the construction and deployment module determines a weight for each item of white sample data based on the summary score, and performs weighted sampling on the white sample data to obtain sampled white sample data.
14. The device of claim 13, wherein the construction and deployment module constructs a risk control model based on the sampled white sample data and the black sample data, and deploys it.
15. Risk control model processing equipment based on unbalanced data, comprising:
a memory, which stores a risk control model processing program based on unbalanced data;
a processor, which calls the risk control model processing program based on unbalanced data stored in the memory and executes:
obtaining sample data to be processed that contains unbalanced samples;
dividing the sample data to be processed to obtain multiple sample data sets to be processed;
constructing scoring models from the multiple sample data sets obtained by the division, and scoring the samples to be processed, wherein the score of a sample to be processed characterizes the correlation between the unbalanced samples to be processed;
sampling the samples to be processed according to the scores, and constructing and deploying a risk control model based on the sampling result.
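The weighted sampling step that the claims recite (weights derived from the summary score, followed by assembling the training set from sampled white data plus all black data) might look like the sketch below. Several points are assumptions rather than anything stated in the claims: the weight is taken to be the summary score itself, sampling is done without replacement using Efraimidis-Spirakis random keys, and `weighted_sample`, the sample IDs and the score values are all hypothetical.

```python
import random

def weighted_sample(items, weights, n, seed=0):
    """Draw n distinct items, favouring larger weights.

    Uses Efraimidis-Spirakis keys: each item gets key u ** (1 / w) with
    u uniform in [0, 1), and the n items with the largest keys are kept.
    """
    if len(items) != len(weights):
        raise ValueError("items and weights must have the same length")
    if not 0 <= n <= len(items):
        raise ValueError("n must be between 0 and len(items)")
    rng = random.Random(seed)
    keyed = []
    for item, w in zip(items, weights):
        w = max(w, 1e-12)  # floor so zero weights do not divide by zero
        keyed.append((rng.random() ** (1.0 / w), item))
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in keyed[:n]]

# Made-up summary scores for ten white samples (IDs 0..9).
white_ids = list(range(10))
summary = [0.05, 0.10, 0.90, 0.20, 0.80, 0.15, 0.70, 0.05, 0.60, 0.10]
black_ids = [100, 101]

# Assumption: weight = summary score, so white samples that the scoring
# models rate closer to black samples are more likely to be kept.
sampled = weighted_sample(white_ids, summary, n=4, seed=1)

# Training set for the final risk control model:
# sampled white samples (label 0) plus all black samples (label 1).
training_set = [(i, 0) for i in sampled] + [(i, 1) for i in black_ids]
```

Sampling without replacement keeps each white sample at most once in the training set; if repeated draws were acceptable, `random.choices` with its `weights` argument would be a simpler alternative.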
CN201810417845.9A 2018-05-04 2018-05-04 A kind of air control model treatment method, device and equipment based on unbalanced data Pending CN108960561A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810417845.9A CN108960561A (en) 2018-05-04 2018-05-04 A kind of air control model treatment method, device and equipment based on unbalanced data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810417845.9A CN108960561A (en) 2018-05-04 2018-05-04 A kind of air control model treatment method, device and equipment based on unbalanced data

Publications (1)

Publication Number Publication Date
CN108960561A true CN108960561A (en) 2018-12-07

Family

ID=64498917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810417845.9A Pending CN108960561A (en) 2018-05-04 2018-05-04 A kind of air control model treatment method, device and equipment based on unbalanced data

Country Status (1)

Country Link
CN (1) CN108960561A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740750A (en) * 2018-12-17 2019-05-10 北京深极智能科技有限公司 Method of data capture and device
CN109903165A (en) * 2018-12-14 2019-06-18 阿里巴巴集团控股有限公司 A kind of model merging method and device
CN111242195A (en) * 2020-01-06 2020-06-05 支付宝(杭州)信息技术有限公司 Model, insurance wind control model training method and device and electronic equipment
CN111581197A (en) * 2020-04-30 2020-08-25 中国工商银行股份有限公司 Method and device for sampling and checking data table in data set

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101980202A (en) * 2010-11-04 2011-02-23 西安电子科技大学 Semi-supervised classification method of unbalance data
CN103049759A (en) * 2012-12-14 2013-04-17 上海邮政科学研究院 Postal code recognition method for postal sorting system
CN103106365A (en) * 2013-01-25 2013-05-15 北京工业大学 Detection method for malicious application software on mobile terminal
CN104503874A (en) * 2014-12-29 2015-04-08 南京大学 Hard disk failure prediction method for cloud computing platform
CN105760889A (en) * 2016-03-01 2016-07-13 中国科学技术大学 Efficient imbalanced data set classification method
CN106651373A (en) * 2016-12-02 2017-05-10 中国银联股份有限公司 Method and device for establishing mixed fraudulent trading detection classifier
CN106778853A (en) * 2016-12-07 2017-05-31 中南大学 Unbalanced data sorting technique based on weight cluster and sub- sampling
CN106960387A (en) * 2017-04-28 2017-07-18 浙江工商大学 Individual credit risk appraisal procedure and system
CN107944460A (en) * 2016-10-12 2018-04-20 甘肃农业大学 One kind is applied to class imbalance sorting technique in bioinformatics

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
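Meng Xiaolong: "Research on Recommendation Technology Based on Machine Learning", China Master's Theses Full-text Database, Information Science and Technology *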

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109903165A (en) * 2018-12-14 2019-06-18 阿里巴巴集团控股有限公司 A kind of model merging method and device
CN109903165B (en) * 2018-12-14 2020-10-16 阿里巴巴集团控股有限公司 Model merging method and device
TWI718690B (en) * 2018-12-14 2021-02-11 開曼群島商創新先進技術有限公司 Model merging method and device
CN109740750A (en) * 2018-12-17 2019-05-10 北京深极智能科技有限公司 Method of data capture and device
CN111242195A (en) * 2020-01-06 2020-06-05 支付宝(杭州)信息技术有限公司 Model, insurance wind control model training method and device and electronic equipment
CN111242195B (en) * 2020-01-06 2023-06-20 蚂蚁胜信(上海)信息技术有限公司 Model, insurance wind control model training method and device and electronic equipment
CN111581197A (en) * 2020-04-30 2020-08-25 中国工商银行股份有限公司 Method and device for sampling and checking data table in data set
CN111581197B (en) * 2020-04-30 2023-06-13 中国工商银行股份有限公司 Method and device for sampling and checking data table in data set

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201028

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201028

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.