CN107679734A - Method and system for classification prediction of unlabeled data - Google Patents


Info

Publication number
CN107679734A
Authority
CN
China
Prior art keywords
data
business content
business
characteristic index
prediction
Legal status
Pending
Application number
CN201710890305.8A
Other languages
Chinese (zh)
Inventor
田斌
王纯斌
赵红军
覃进学
Current Assignee
Chengdu Sefon Software Co Ltd
Original Assignee
Chengdu Sefon Software Co Ltd
Priority date
2017-09-27
Filing date
2017-09-27
Publication date
Application filed by Chengdu Sefon Software Co Ltd
Priority to CN201710890305.8A
Publication of CN107679734A
Legal status: Pending


Classifications

    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G06F18/23 Clustering techniques
    • G06F18/24155 Bayesian classification
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G06Q10/06395 Quality analysis or management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Marketing (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and system for classification prediction of unlabeled data. The method makes use of historical data without imposing any hard requirement that the historical data be labeled, and it incorporates an empirical-information database accumulated during business operations, so that it can subsequently be optimized continuously with real data and automatically improve prediction accuracy. The method includes: inputting business process data to obtain multiple business scenario data in the business process; inputting business content data and grouping it according to the business scenario data; constructing characteristic indexes of the business content data according to the business scenario data and the corresponding business content database; cleaning the characteristic indexes of the business content data; clustering the business content data whose characteristic indexes have been cleaned and determining the center of each class; calculating the sampling weight of each class; and sampling the business content data according to the sampling weights and marking the sampled data with prediction labels.

Description

Method and system for classification prediction of unlabeled data
Technical field
The present invention relates to the technical field of data classification prediction, and in particular to a method and system for classification prediction of unlabeled data.
Background art
Risk detection is a common quality inspection method and is widely used in business analysis across industries to detect potential risks in a business so that they can be found and controlled in advance. For ordinary enterprises or regulatory authorities, risk detection is mainly carried out in one of three ways: first, quality inspectors check every inspected object one by one to find risks in the inspected products; second, the inspected objects are randomly sampled to find risks in the inspected products; third, the probability that each inspected object carries risk is predicted from the basic product information data and historical data, and the high-risk inspected products are then actually sampled and inspected.
Of the three risk detection modes described above, the first checks the full volume of data and suits products with few inspection items and low technical difficulty; it is often used to inspect products produced by the enterprise itself (which are characterized by a single product type and simple technology). The second is used in scenarios similar to the first and is unsuitable when there are many product categories or the products are technically complex; it can estimate the proportion of qualified (normal) inspected products, but it lets a certain proportion of risky products go undetected. The third mainly relies on existing information systems: by modeling historical data (in effect, constructing a classifier), it finds risk probabilities from the characteristic data of the inspected products. As long as the historical data are labeled, it applies to many classes of products, finds rules entirely from the data, involves fewer technical details, and has a wide range of applications.
In government regulatory agencies, the supervised entities span numerous industries and a wide variety of products. For example, customs detection of false trade in imports and exports involves every industry and product that participates in trade. The first two detection modes therefore consume a great deal of manpower and time and are ill-suited. The third mode detects the risk of each inspected product from data and requires historically labeled data; however, for various reasons many systems do not store label data, so this method suffers from low prediction accuracy. Moreover, because it depends heavily on labels already marked in historical data, it cannot be applied to prediction on unlabeled data, nor to business scenarios of anomaly detection.
Summary of the invention
An object of the present invention is at least to overcome the above problems of the prior art by providing a method and system for classification prediction of unlabeled data that makes use of historical data while imposing no hard requirement on historical data labels, and that incorporates the empirical-information database accumulated during business operations, so that it can subsequently be optimized continuously with real data and automatically improve prediction accuracy.
To achieve these objects, the technical solution adopted by the present invention includes the following aspects.
A method for classification prediction of unlabeled data comprises the following steps:
Inputting business process data to obtain multiple business scenario data in the business process; inputting business content data and grouping the business content data according to the business scenario data; constructing characteristic indexes of the business content data according to the business scenario data and the corresponding business content database; cleaning the characteristic indexes of the business content data; clustering the business content data whose characteristic indexes have been cleaned and determining the center of each class; calculating the sampling weight of each class; and sampling the business content data according to the sampling weights and marking the sampled data with prediction labels.
Preferably, constructing the characteristic indexes of the business content data comprises: comparing the input business content data value with the historical business content data values stored in the business content database under the same business scenario, and constructing a positive risk index according to how close the input value is to business content data values in the business content database that produced false trades.
Preferably, cleaning the characteristic indexes of the business content data comprises: handling missing characteristic indexes by assigning neutral characteristic index values to the business content data; normalizing the characteristic index values of the business content data into [0,1] to eliminate dimensional effects; and establishing a characteristic index database to record the weight value of each characteristic index of each business content data item.
Preferably, the method performs clustering with one or more of KMeans, DBSCAN, hierarchical clustering, partition clustering and spectral clustering;
the distance metric used for clustering comprises one or more of Euclidean distance, Minkowski distance and cosine similarity.
Preferably, calculating the sampling weight of each class comprises calculating the sampling weight w_i of each class with a formula (not reproduced in this text) whose inputs are the number of classes j and, for each class i, the square of the distance from the i-th class center to the coordinate origin.
Preferably, the method further comprises: dividing the labeled sampled data into a training set and a test set according to a preset ratio; and expanding the training set with newly input business content data together with its corresponding prediction results and actual results.
Preferably, the method further comprises: constructing a Bayes classifier using the labeled sampled data and testing its prediction results on the test set; if the accuracy is lower than a preset threshold, modifying the characteristic indexes of the business content data and the sampling weights of each class according to the data in the training set, so as to obtain prediction results with higher accuracy.
Preferably, the method further comprises: constructing a Bayes classifier using the labeled sampled data and testing its prediction results on the test set; if the prediction accuracy is greater than or equal to the preset threshold, inputting new business content data, obtaining prediction results, and issuing risk alerts according to the prediction results.
Preferably, the method further comprises repeating one or more of the above steps, wherein some or all of the following differ between repetitions: the construction of the business content data, the number of characteristic indexes, the choice of clustering algorithm, the choice of distance function in clustering, the choice of the number of classes, the formula for computing the class centers, and the proportion of sampled data.
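Repeating the steps with different choices amounts to a search over hyperparameter combinations. A minimal sketch, assuming a plain exhaustive grid search; the search space, its keys and values, and the `evaluate` callback (which would run the method once and return test-set accuracy) are all hypothetical illustrations, not taken from the source.

```python
from itertools import product

# Hypothetical search space over choices that may change between repetitions.
SEARCH_SPACE = {
    "n_classes": [2, 3, 4],
    "distance": ["euclidean", "cosine"],
    "center": ["mean", "median"],
    "sample_share": [0.05, 0.10],
}

def best_configuration(evaluate, space=SEARCH_SPACE):
    """Run the method once per combination and keep the most accurate one.

    `evaluate` is a caller-supplied (hypothetical) function that runs the
    method with the given settings and returns its test-set accuracy.
    Ties keep the first combination encountered.
    """
    keys = list(space)
    best_cfg, best_acc = None, float("-inf")
    for values in product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        acc = evaluate(cfg)
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc
```

An exhaustive grid is only practical for a handful of options; random or staged search would be natural alternatives when the space grows.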
A system for classification prediction of unlabeled data comprises at least one electronic device and one database server connected through a network;
wherein the electronic device comprises at least one processor and a memory communicatively connected to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the foregoing method;
the database server is used to store the business content database and the characteristic index database.
In summary, by adopting the above technical solution, the present invention has at least the following beneficial effects:
(1) The present invention is applicable to most environments in which unlabeled data must be predicted, and is particularly suitable for business scenarios of anomaly (risk) detection; it solves the problem that, in such environments, new data cannot be predicted by direct modeling.
(2) The present invention does not require manual labeling of unlabeled historical data, saving the considerable effort and time of human annotation; modeling and analysis need only the historical data and the prior information of business experts.
(3) The present invention uses the classic Bayesian idea of estimating the posterior from the prior and the likelihood, so data identified after the system goes online can be added directly to the model to improve its prediction accuracy.
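Point (3) rests on Bayes' rule. For reference, the posterior probability of class $c$ given a feature vector $x = (x_1, \dots, x_m)$ is shown first; the second expression is the decision rule of a naive Bayes classifier, one common Bayes classifier that additionally assumes the features are conditionally independent given the class (the source does not state which variant is used):

```latex
P(c \mid x) = \frac{P(c)\,P(x \mid c)}{P(x)}, \qquad
\hat{c} = \arg\max_{c} \; P(c) \prod_{f=1}^{m} P(x_f \mid c)
```

Newly confirmed labels change the counts behind both the prior $P(c)$ and the likelihoods $P(x_f \mid c)$, which is how online data can refine the model.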
Brief description of the drawings
Fig. 1 is a flowchart of a method for classification prediction of unlabeled data according to an embodiment of the present invention.
Fig. 2 is a flowchart of a method for classification prediction of unlabeled data according to another embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a system for classification prediction of unlabeled data according to an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments, to make its objects, technical solutions and advantages clearer. It should be understood that the specific embodiments described herein merely explain the present invention and are not intended to limit it.
Fig. 1 shows a method for classification prediction of unlabeled data according to an embodiment of the present invention. The method is described in detail below using its application to detecting false trade in customs business as an example.
Step 101: Input business process data and obtain multiple business scenario data in the business process
For example, covering the whole customs clearance process, the customs clearance business is divided into multiple business scenarios such as: on-site customs accepting the declaration, centralized document review by the head customs office, on-site customs reviewing the declaration, reviewing inspection result forms, assessing duties and fees, and issuing customs documents.
Step 102: Input business content data and group the business content data according to the business scenario data
Specifically, information such as the goods description on the packing list, invoice, contract, verification-and-write-off form, and bill of lading or waybill, together with the commodity code, freight, insurance premium, number of packages, weight, size and value of goods, can be classified by business scenario, so that each item of clearance business content data corresponds to one or more business scenarios.
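The grouping in step 102 can be sketched as a field-to-scenario mapping. A minimal sketch, assuming each business content field is assigned to scenarios through a lookup table; the field names, scenario names and the mapping itself are hypothetical illustrations loosely based on the scenarios listed in step 101.

```python
from collections import defaultdict

# Hypothetical mapping from business-content fields to the customs clearance
# scenarios in which they are used; a field may belong to several scenarios.
FIELD_SCENARIOS = {
    "goods_description": ["declaration_acceptance", "document_review"],
    "commodity_code": ["document_review", "duty_assessment"],
    "freight": ["duty_assessment"],
    "value_of_goods": ["duty_assessment", "document_review"],
    "weight": ["inspection_review"],
}

def group_by_scenario(record: dict) -> dict:
    """Group one clearance record's fields by business scenario."""
    groups = defaultdict(dict)
    for field, value in record.items():
        # Fields without a known scenario are simply ignored here.
        for scenario in FIELD_SCENARIOS.get(field, []):
            groups[scenario][field] = value
    return dict(groups)

print(group_by_scenario({"commodity_code": "8471", "freight": 120.0, "weight": 35.5}))
```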
Step 103: Construct the characteristic indexes of the business content data according to the business scenario data and the corresponding business content database
Here, the currently input business content data value can be compared with the historical business content data values stored in the business content database under the same business scenario. The closer the current value is to business content data values in the database that produced false trades, the more likely it is to produce a false trade, and a positive risk index is constructed accordingly: the larger the index value, the more likely a false trade. In addition, when the index value exceeds a preset threshold, the business content data is given a characteristic index label such as "false trade".
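The source does not give a formula for "closeness" in step 103. A minimal sketch, assuming one numeric field per scenario and an illustrative scoring rule: the minimum absolute distance to any historical false-trade value, scaled by the range of all historical values, is mapped to 1 / (1 + d), so larger scores mean higher risk. The rule, the function names and the 0.8 threshold are hypothetical.

```python
def positive_risk_index(value, fraud_values, all_values):
    """Score in (0, 1]: higher when `value` is closer to known false trades.

    ASSUMED scoring rule (not from the source): d is the minimum absolute
    distance to a historical false-trade value, divided by the range of all
    historical values in the same scenario; the score is 1 / (1 + d).
    """
    if not fraud_values:
        return 0.0  # no known false trades in this scenario: no evidence of risk
    span = (max(all_values) - min(all_values)) or 1.0
    d = min(abs(value - f) for f in fraud_values) / span
    return 1.0 / (1.0 + d)

def flag(score, threshold=0.8):
    """Attach a 'false_trade' label when the index exceeds a (hypothetical) threshold."""
    return "false_trade" if score > threshold else None

history = [100.0, 120.0, 95.0, 400.0, 410.0]  # declared values, same scenario
fraud = [400.0, 410.0]                         # values that produced false trades
score = positive_risk_index(405.0, fraud, history)
print(round(score, 3), flag(score))
```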
In a preferred embodiment, the method of the present invention can equally be applied to other specific environments that hold a large amount of business data but, for various reasons, have not stored in the business system the result labels obtained by analyzing the characteristic indexes of the business content data, and in which intelligent decision support is subsequently desired. For the characteristic index labels of the business content data, a binary classification problem with a clear business meaning is preferred, such as normal vs. abnormal, qualified vs. unqualified, or normal vs. false. The direction in which each index influences the prediction result (positive or negative) is determined according to the business content database. The feature data may be taken either after or before removing dimensional effects, but the characteristic indexes must be discretized.
Step 104: Clean the characteristic indexes of the business content data
This step covers two aspects: first, handling missing characteristic indexes; second, eliminating dimensional effects between characteristic indexes. Missing indexes are handled by assigning neutral characteristic index values to the business content data. Dimensional effects can be eliminated by normalization, for example by normalizing the characteristic index values of the business content data into [0,1], after which the indexes are weighted according to their importance. Specifically, a characteristic index database can be established to record the weight value of each characteristic index of each business content data item.
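Step 104 can be sketched in a few lines. A minimal sketch, assuming `None` marks a missing index, min-max scaling per column, 0.5 (the midpoint of [0, 1]) as the "neutral" fill value, and per-column importance weights supplied by the caller in place of the characteristic index database; all of these choices are assumptions.

```python
def clean_indicators(rows, importance, neutral=0.5):
    """Min-max normalise each column into [0, 1], fill gaps, then weight.

    Assumptions: None marks a missing indicator; the neutral fill is 0.5,
    the midpoint of the normalised range; `importance` holds one weight per
    column (standing in for the characteristic index database).
    """
    n_cols = len(importance)
    cleaned = [[0.0] * n_cols for _ in rows]
    for c in range(n_cols):
        present = [r[c] for r in rows if r[c] is not None]
        lo, hi = (min(present), max(present)) if present else (0.0, 0.0)
        span = hi - lo
        for i, r in enumerate(rows):
            if r[c] is None:
                x = neutral
            elif span == 0:
                x = 0.0  # a constant column carries no information
            else:
                x = (r[c] - lo) / span
            cleaned[i][c] = x * importance[c]
    return cleaned

print(clean_indicators([[10.0, 2.0], [None, 4.0], [30.0, None]], importance=[1.0, 0.5]))
```

Filling after scaling keeps the neutral value on the same [0, 1] scale as the observed values.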
Step 105: Cluster the business content data whose characteristic indexes have been cleaned, and determine the center of each class
For example, clustering algorithms such as KMeans, DBSCAN, hierarchical clustering, partition clustering and spectral clustering can be used to determine the class centers; alternatively, a class center can be taken as the mean or median of the coordinates of the business content data in that class. The distance metric used in clustering can be any of various metrics, such as Euclidean distance, Minkowski distance or cosine similarity.
In the same embodiment, class centers can be determined separately with several clustering algorithms to improve accuracy. The quality of a clustering result can be assessed in two ways: one uses a distance-based evaluation metric such as the silhouette coefficient; the other, in combination with step 210 below, samples data and relies on human experience to judge the risk of the sampled data.
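To illustrate how the distance metric and the center rule are independent choices, here is a small Lloyd-style k-means in the standard library with both pluggable. It is a sketch, not the patent's implementation: the function names are illustrative, and pairing cosine distance or a median center with k-means assignment is a variant, not classical k-means.

```python
import math
import random
import statistics

def euclidean(a, b):
    return math.dist(a, b)

def minkowski(p):
    """Return a Minkowski distance function of order p."""
    def d(a, b):
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)
    return d

def cosine_distance(a, b):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    na, nb = math.hypot(*a), math.hypot(*b)
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (na * nb)

def kmeans(points, k, distance=euclidean, center=statistics.fmean,
           iters=100, seed=0):
    """Lloyd-style k-means with a pluggable distance and per-coordinate center.

    center=statistics.fmean gives mean centers; statistics.median gives the
    median centers mentioned in the text. Points should be tuples.
    """
    centers = random.Random(seed).sample(points, k)
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest current center.
        labels = [min(range(k), key=lambda j: distance(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:  # keep an empty cluster's previous center
                new_centers.append(centers[j])
                continue
            new_centers.append(tuple(center(col) for col in zip(*members)))
        if new_centers == centers:  # assignments are stable: converged
            break
        centers = new_centers
    return labels, centers

pts = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
print(kmeans(pts, k=2, center=statistics.median))
```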
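The silhouette coefficient mentioned above has a standard definition: for each point, s = (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. A minimal standard-library sketch; the function name and the convention of scoring singleton clusters as 0 are choices made here, not taken from the source.

```python
import math

def silhouette(points, labels, distance=math.dist):
    """Mean silhouette coefficient over all points (higher is better, max 1).

    a: mean distance to the other members of the point's own cluster;
    b: smallest mean distance to the members of any other cluster.
    Points in singleton clusters score 0, a common convention.
    """
    labels = list(labels)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(distance(p, q) for q in own) / len(own)
        b = min(
            sum(distance(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```

Comparing this score across algorithms, metrics, or class counts is one way to pick among the alternative clusterings described above.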
Step 106: Calculate the sampling weight of each class
Specifically, because the features constructed in step 103 are positive risk indexes (the larger the index, the greater the risk), the probability of false-trade risk may differ from class to class. Since all index data are normalized so that their minimum is 0, the farther a class center lies from the origin, the greater the risk of false trade in that class should be, and the larger its sampling weight should be. The sampling weight of each class can be calculated with a formula (not reproduced in this text) whose inputs are the number of classes j and, for each class i, the square of the distance from the i-th class center to the coordinate origin.
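Because the exact formula is not reproduced here, the sketch below offers two plausible normalizations of the squared center-to-origin distances $d_i^2$, both of them assumptions: "sum" gives $w_i = d_i^2 / \sum_{k=1}^{j} d_k^2$ (weights add up to 1), and "max" gives $w_i = d_i^2 / \max_k d_k^2$ (the riskiest class gets weight 1). Either way, a class whose center is farther from the origin gets a larger weight, as step 106 requires.

```python
def class_weights(centers, normalise="sum"):
    """Sampling weight per class from the squared distance of its center to the origin.

    ASSUMPTION: the patent's exact formula is not reproduced in this text.
    'sum': w_i = d_i^2 / sum_k d_k^2; 'max': w_i = d_i^2 / max_k d_k^2.
    """
    if normalise not in ("sum", "max"):
        raise ValueError("normalise must be 'sum' or 'max'")
    sq = [sum(x * x for x in c) for c in centers]  # d_i^2 for each class
    denom = sum(sq) if normalise == "sum" else max(sq)
    if denom == 0:
        return [1.0 / len(centers)] * len(centers)  # every center at the origin
    return [s / denom for s in sq]

print(class_weights([(0.1, 0.2), (0.6, 0.8), (0.3, 0.4)]))
```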
Step 107: Sample the business content data according to the sampling weights and mark the sampled data with prediction labels
Specifically, the proportion of the total samples that must be sampled can be determined from the historical false-trade ratio in the business content database. Each input business content data item is then sampled according to the weights from step 106 (for example, by sampling data whose sampling weight value exceeds 0.5). The sampled data are labeled as false trades and the unsampled data as normal trades, which yields a preliminary prediction of false trades that is stored in the business content database. In this step, the number of records to be sampled from a class may exceed the total number of samples in that class; in that case, all data of that class are sampled directly.
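A minimal sketch of step 107, assuming the total sample size is the historical false-trade ratio times the number of records, and that this total is split across classes in proportion to their weights. The allocation rule, rounding, seed and names are illustrative; rounding means the realized total can differ slightly from the target. A class asked for more records than it holds is sampled in full, as described above.

```python
import random

def weighted_sample_labels(class_of, weights, false_ratio, seed=0):
    """Label each record 'false' or 'normal' by class-weighted sampling.

    class_of: class index of each record; weights: sampling weight per class;
    false_ratio: historical share of false trades, ASSUMED here to set the
    total number of records to sample.
    """
    rng = random.Random(seed)
    total = round(false_ratio * len(class_of))
    wsum = sum(weights) or 1.0
    members = {}
    for idx, c in enumerate(class_of):
        members.setdefault(c, []).append(idx)
    labels = ["normal"] * len(class_of)
    for c, idxs in members.items():
        quota = round(total * weights[c] / wsum)
        # Quota larger than the class: take the whole class.
        picked = idxs if quota >= len(idxs) else rng.sample(idxs, quota)
        for idx in picked:
            labels[idx] = "false"
    return labels
```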
Fig. 2 shows a method for classification prediction of unlabeled data according to another embodiment of the present invention, which adds the following steps to the above embodiment. Steps 201 to 207 correspond to steps 101 to 107 respectively. The differences are that step 203, after constructing the characteristic indexes of the business content data according to the business scenario data and the corresponding business content database, also modifies them, and that step 206, after calculating the sampling weights of each class, also modifies them.
Step 208: Build/expand the training set and test set
Specifically, the labeled sampled data can be divided into a training set and a test set according to a preset ratio (for example, 8:2). Further, the training set can be expanded with newly input business content data together with its corresponding prediction results and actual results.
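A minimal sketch of step 208, assuming labeled records are stored as (features, label) pairs and that only records with a known actual result are added to the training set, labeled with that actual result. The source also mentions prediction results; this sketch leaves them out. Names and the seed are illustrative.

```python
import random

def split_train_test(records, train_share=0.8, seed=0):
    """Shuffle labeled records and split them (8:2 by default) into train/test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_share))
    return shuffled[:cut], shuffled[cut:]

def expand_training_set(train, new_records, actual):
    """Append new records whose actual (inspected) result is known.

    ASSUMPTION: the actual result is used as the label; records that were
    never checked (actual is None) are skipped.
    """
    for x, truth in zip(new_records, actual):
        if truth is not None:
            train.append((x, truth))
    return train
```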
Step 209: Construct a Bayes classifier and evaluate its prediction performance on the test set
Here, a Bayes classifier can be constructed from the labeled sampled data and its predictions tested on the test set. If the prediction accuracy is greater than or equal to a preset threshold (for example, 75%), step 210 is performed. If the accuracy is lower than the threshold, steps 203 and 206 are performed again, modifying the characteristic indexes of the business content data and the sampling weights of each class according to the data in the training set, so as to obtain predictions with higher accuracy. The feature data in the training set may be either data after characteristic index cleaning or data without feature processing (with missing values filled in); the feature data are then discretized, the Bayes classifier is built, and its prediction performance is checked on the test set.
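Step 209 does not specify which Bayes classifier is used. A minimal sketch, assuming a categorical naive Bayes with add-one (Laplace) smoothing over features discretized into 4 equal-width bins on [0, 1]. The bin count, toy data and names are illustrative; the 0.75 threshold is the example value from the text.

```python
import math
from collections import Counter, defaultdict

def discretise(row, bins=4):
    """Map features assumed to lie in [0, 1] onto equal-width bin indices."""
    return tuple(min(int(x * bins), bins - 1) for x in row)

class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.classes = Counter(y)
        self.n = len(y)
        self.counts = defaultdict(Counter)  # (class, feature) -> value counts
        self.values = defaultdict(set)      # feature -> distinct values seen
        for row, c in zip(X, y):
            for f, v in enumerate(row):
                self.counts[(c, f)][v] += 1
                self.values[f].add(v)
        return self

    def predict(self, row):
        best, best_lp = None, -math.inf
        for c, nc in self.classes.items():
            lp = math.log(nc / self.n)       # log prior P(c)
            for f, v in enumerate(row):      # + sum of log P(x_f | c)
                k = len(self.values[f]) or 1
                lp += math.log((self.counts[(c, f)][v] + 1) / (nc + k))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

def accuracy(model, X, y):
    return sum(model.predict(r) == t for r, t in zip(X, y)) / len(y)

THRESHOLD = 0.75
train_X = [discretise(r) for r in [(0.9, 0.8), (0.8, 0.95), (0.1, 0.2), (0.2, 0.1)]]
train_y = ["false", "false", "normal", "normal"]
test_X = [discretise(r) for r in [(0.85, 0.9), (0.15, 0.05)]]
test_y = ["false", "normal"]
model = NaiveBayes().fit(train_X, train_y)
acc = accuracy(model, test_X, test_y)
next_step = "step 210" if acc >= THRESHOLD else "revisit steps 203 and 206"
print(acc, next_step)
```

Working in log space avoids underflow when many feature likelihoods are multiplied.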
Step 210: Input new business content data, obtain prediction results, and issue risk alerts according to the prediction results
Specifically, when, under a specific business scenario, the output for data predicted as false in the prediction results for the new business content data exceeds a threshold, a false-trade risk warning is issued through the human-computer interaction interface to remind customs staff to carry out a manual inspection. The inspection result is stored in the business content database, and the training set is expanded through step 208 to further improve prediction accuracy. In addition, one or more of the above steps can be repeated, and different choices can be made on each repetition for the construction of the business data or the number of extracted features, the clustering algorithm, the distance function used in clustering, the number of classes, the formula for the class centers, the proportion of sampled data and so on, so as to find a hyperparameter combination with higher prediction accuracy. The method can also use the detected abnormal data to correct the prior information, thereby improving prediction accuracy.
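A minimal sketch of the alert-and-feedback loop in step 210, assuming each prediction carries a record id, its business scenario and a false-trade score, and that alert thresholds are configured per scenario with a fallback default. The tuple layout, threshold values and function names are hypothetical; the database and training set are stand-in Python containers.

```python
def risk_alerts(predictions, thresholds, default_threshold=0.5):
    """Return ids of records whose false-trade score exceeds their scenario's threshold.

    predictions: (record_id, scenario, p_false) tuples.
    thresholds: per-scenario alert thresholds (hypothetical configuration).
    """
    alerts = []
    for record_id, scenario, p_false in predictions:
        if p_false > thresholds.get(scenario, default_threshold):
            alerts.append(record_id)  # to be checked manually by customs staff
    return alerts

def record_inspection(database, training_set, record_id, features, is_false):
    """Store a manual inspection result and feed it back as a labeled example."""
    label = "false" if is_false else "normal"
    database[record_id] = label
    training_set.append((features, label))
```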
Fig. 3 shows a system for classification prediction of unlabeled data according to an embodiment of the present invention, comprising at least one electronic device 310 and one database server 330 connected through a network 320.
The electronic device 310 comprises at least one processor 311 and a memory 312 communicatively connected to the at least one processor; the memory 312 stores instructions executable by the at least one processor 311, and the instructions, when executed by the at least one processor 311, enable the at least one processor 311 to perform the method disclosed in any of the foregoing embodiments. The database server 330 is used to store the business content database 331 and the characteristic index database 332.
The above embodiments first extract or construct, according to the business, characteristic indexes that are meaningful for the classification categories, and clean the feature data. They then cluster on the cleaned characteristic indexes to obtain the center of each class, determine the sampling weight of the data from the magnitude of each class center's coordinates, sample each class by weight and treat the sampled data as abnormal data to serve as prior knowledge, and finally construct a Bayes classifier from the now-labeled data to classify new data and detect business risk. In a real business system, true labels can be determined by inspecting high-risk data and stored in the database, correcting the prior probabilities so that the model adjusts its accuracy automatically. The present invention is applicable to most environments in which unlabeled data must be predicted; it solves the problem that new data cannot be predicted by direct modeling in such environments, and it makes good use of the prior information of business experts.
The above is only a detailed description of specific embodiments of the present invention and does not limit the present invention. Various substitutions, modifications and improvements made by those skilled in the relevant art without departing from the principle and scope of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A method for classification prediction of unlabeled data, characterized in that the method comprises the following steps:
inputting business process data to obtain multiple business scenario data in the business process; inputting business content data and grouping the business content data according to the business scenario data; constructing characteristic indexes of the business content data according to the business scenario data and the corresponding business content database; cleaning the characteristic indexes of the business content data; clustering the business content data whose characteristic indexes have been cleaned and determining the center of each class; calculating the sampling weight of each class; and sampling the business content data according to the sampling weights and marking the sampled data with prediction labels.
2. The method according to claim 1, characterized in that constructing the characteristic indexes of the business content data comprises: comparing the input business content data value with the historical business content data values stored in the business content database under the same business scenario, and constructing a positive risk index according to how close the input value is to business content data values in the business content database that produced false trades.
3. The method according to claim 1, characterized in that cleaning the characteristic indexes of the business content data comprises: handling missing characteristic indexes by assigning neutral characteristic index values to the business content data; normalizing the characteristic index values of the business content data into [0,1] to eliminate dimensional effects; and establishing a characteristic index database to record the weight value of each characteristic index of each business content data item.
4. The method according to claim 1, characterized in that the method performs clustering with one or more of KMeans, DBSCAN, hierarchical clustering, partition clustering and spectral clustering;
the distance metric used for clustering comprises one or more of Euclidean distance, Minkowski distance and cosine similarity.
5. The method according to claim 1, characterized in that calculating the sampling weight of each class comprises calculating the sampling weight w_i of each class with a formula (not reproduced in this text) whose inputs are the number of classes j and, for each class i, the square of the distance from the i-th class center to the coordinate origin.
6. according to the method for claim 1, it is characterised in that methods described further comprises:By labeled sampling Data are divided into training set and test set according to default ratio;According to the new business tine data of input and its corresponding prediction As a result with actual result data, training set is expanded.
7. according to the method for claim 6, it is characterised in that methods described further comprises:Adopted using labeled Sample data construct Bayes classifier, and prediction result is tested on test set, if accuracy rate is less than predetermined threshold value, root The characteristic index of business tine data, and all kinds of sample weights are changed according to the data in training set, it is accurate to obtain The higher prediction result of rate.
8. The method according to claim 6, characterized in that the method further comprises: constructing a Bayes classifier from the labeled sampled data and testing its prediction results on the test set; if the prediction accuracy is greater than or equal to the predetermined threshold, inputting new business content data, obtaining a prediction result, and issuing a risk alert according to the prediction result.
9. The method according to claim 1, characterized in that the method further comprises: repeating one or more of the steps, wherein, across the repeated executions, some or all of the following may differ: the construction of the business content data, the number of characteristic indices, the choice of clustering algorithm, the choice of distance function for clustering, the choice of the number of classes, the formula for computing class centers, and the ratio of sampled data.
10. A system for classification prediction of unlabeled data, characterized in that the system comprises at least one electronic device and one database server connected through a network;
wherein the electronic device comprises at least one processor and a memory communicatively connected to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method of any one of claims 1 to 9;
and the database server is used to store the business content database and the characteristic index database.
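The claims leave most formulas and parameters open, and claim 5's weight formula is not reproduced in this text. As a rough illustration only, the Python sketch below walks through claims 2 to 8 under stated assumptions: a 1 / (1 + gap) form for the positive risk index, min-max scaling with a neutral value of 0.5, plain k-means as the clustering step, cluster assignments reused as the labels for the Bayes classifier, and Gaussian naive Bayes as the particular Bayes classifier. Every function name (`risk_forward_index`, `min_max_normalize`, `kmeans`, `GaussianNaiveBayes` and so on) is hypothetical, and the sample weights w_i themselves are not computed because their formula is missing.

```python
import math
import random
from collections import defaultdict


def risk_forward_index(value, fraud_history):
    # Claim 2 (assumed form): the closer `value` is to a historical value that
    # produced a false transaction, the closer the index is to 1.
    return 1.0 / (1.0 + min(abs(value - f) for f in fraud_history))

def min_max_normalize(rows, neutral=0.5):
    # Claim 3: scale each column into [0, 1]; missing values (None) and
    # constant columns get the assumed neutral value. Each column needs a value.
    out = [list(r) for r in rows]
    for j in range(len(rows[0])):
        present = [r[j] for r in rows if r[j] is not None]
        lo, span = min(present), max(present) - min(present)
        for r in out:
            r[j] = neutral if r[j] is None or span == 0 else (r[j] - lo) / span
    return out

# Claim 4: interchangeable distance measures.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def minkowski(a, b, p=3):
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def cosine_distance(a, b):
    # 1 - cosine similarity; zero vectors are treated as maximally distant.
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb) if na and nb else 1.0

def kmeans(points, k, dist=euclidean, iters=100, seed=0):
    # Plain k-means: assign each point to its nearest center, move each center
    # to the mean of its members, stop when the centers no longer change.
    centers = [list(p) for p in random.Random(seed).sample(points, k)]
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def squared_center_norms(centers):
    # Claim 5: squared distance from each class center to the origin; the
    # weight formula that uses these values is not reproduced here.
    return [sum(x * x for x in c) for c in centers]

def train_test_split(samples, ratio=0.75, seed=0):
    # Claim 6: shuffle the labeled samples, then split them by a preset ratio.
    data = list(samples)
    random.Random(seed).shuffle(data)
    cut = int(len(data) * ratio)
    return data[:cut], data[cut:]

class GaussianNaiveBayes:
    # Claims 7-8: Gaussian naive Bayes, one possible kind of Bayes classifier.
    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.stats = {}
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # A small constant keeps every variance positive (assumed smoothing).
            var = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9 for c, m in zip(cols, means)]
            self.stats[label] = (math.log(len(rows) / len(X)), means, var)
        return self

    def predict(self, x):
        # Pick the label with the largest log prior + Gaussian log-likelihood.
        def log_posterior(label):
            prior, means, var = self.stats[label]
            return prior + sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                               for xi, m, v in zip(x, means, var))
        return max(self.stats, key=log_posterior)

def accuracy(model, X, y):
    # Share of samples whose predicted label matches the reference label.
    return sum(model.predict(x) == t for x, t in zip(X, y)) / len(y)
```

For example, normalizing the rows `[[1.0, 2.0], [1.5, None], [0.5, 1.0], [9.0, 10.0], [10.0, 9.5], [9.5, 11.0]]`, clustering them with `kmeans(X, k=2)` and fitting `GaussianNaiveBayes` on the resulting labels separates the first three rows from the last three. Comparing `accuracy(...)` against a chosen threshold then decides between claim 7 (revise the indices and weights) and claim 8 (predict and raise risk alerts). Passing `minkowski` or `cosine_distance` as `dist` follows claim 4's alternatives, although moving centers to the member mean is strictly the natural update only for Euclidean distance.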
CN201710890305.8A 2017-09-27 2017-09-27 It is a kind of to be used for the method and system without label data classification prediction Pending CN107679734A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710890305.8A CN107679734A (en) 2017-09-27 2017-09-27 It is a kind of to be used for the method and system without label data classification prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710890305.8A CN107679734A (en) 2017-09-27 2017-09-27 It is a kind of to be used for the method and system without label data classification prediction

Publications (1)

Publication Number Publication Date
CN107679734A (en) 2018-02-09

Family

ID=61137558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710890305.8A Pending CN107679734A (en) 2017-09-27 2017-09-27 It is a kind of to be used for the method and system without label data classification prediction

Country Status (1)

Country Link
CN (1) CN107679734A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108933785A (en) * 2018-06-29 2018-12-04 平安科技(深圳)有限公司 Network risks monitoring method, device, computer equipment and storage medium
CN109034209A (en) * 2018-07-03 2018-12-18 阿里巴巴集团控股有限公司 The training method and device of the real-time identification model of active risk
CN109086975A (en) * 2018-07-10 2018-12-25 阿里巴巴集团控股有限公司 A kind of recognition methods of transaction risk and device
CN109472293A (en) * 2018-10-12 2019-03-15 国家电网有限公司 A kind of grid equipment file data error correction method based on machine learning
CN110175113A (en) * 2019-04-18 2019-08-27 阿里巴巴集团控股有限公司 Business scenario determines method and apparatus
CN110297909A (en) * 2019-07-05 2019-10-01 中国工商银行股份有限公司 A kind of classification method and device of no label corpus
CN110750681A (en) * 2018-07-05 2020-02-04 武汉斗鱼网络科技有限公司 Account similarity calculation method, storage medium, electronic device and system
CN110807024A (en) * 2019-10-12 2020-02-18 广州市申迪计算机系统有限公司 Dynamic threshold anomaly detection method and system, storage medium and intelligent device
CN111192126A (en) * 2019-12-27 2020-05-22 航天信息股份有限公司 Invoice false-proof method and system based on big data analysis
CN114332500A (en) * 2021-09-14 2022-04-12 腾讯科技(深圳)有限公司 Image processing model training method and device, computer equipment and storage medium
CN114741673A (en) * 2022-06-13 2022-07-12 深圳竹云科技股份有限公司 Behavior risk detection method, clustering model construction method and device

Citations (7)

Publication number Priority date Publication date Assignee Title
CN104331502A (en) * 2014-11-19 2015-02-04 亚信科技(南京)有限公司 Identifying method for courier data for courier surrounding crowd marketing
CN106446230A (en) * 2016-10-08 2017-02-22 国云科技股份有限公司 Method for optimizing word classification in machine learning text
CN106503438A (en) * 2016-10-20 2017-03-15 上海科瓴医疗科技有限公司 A kind of H RFM user modeling method and system for pharmacy member analysis
CN106778042A (en) * 2017-01-26 2017-05-31 中电科软件信息服务有限公司 Cardio-cerebral vascular disease patient similarity analysis method and system
CN106974660A (en) * 2017-04-20 2017-07-25 重庆邮电大学 The method that blood oxygen feature in being detected based on cerebration realizes sex determination
CN107085729A (en) * 2017-03-13 2017-08-22 西安电子科技大学 A kind of personnel's testing result modification method based on Bayesian inference
CN107093005A (en) * 2017-03-24 2017-08-25 北明软件有限公司 The method that tax handling service hall's automatic classification is realized based on big data mining algorithm

Cited By (16)

Publication number Priority date Publication date Assignee Title
CN108933785A (en) * 2018-06-29 2018-12-04 平安科技(深圳)有限公司 Network risks monitoring method, device, computer equipment and storage medium
CN109034209B (en) * 2018-07-03 2021-07-30 创新先进技术有限公司 Training method and device for active risk real-time recognition model
CN109034209A (en) * 2018-07-03 2018-12-18 阿里巴巴集团控股有限公司 The training method and device of the real-time identification model of active risk
CN110750681A (en) * 2018-07-05 2020-02-04 武汉斗鱼网络科技有限公司 Account similarity calculation method, storage medium, electronic device and system
CN109086975A (en) * 2018-07-10 2018-12-25 阿里巴巴集团控股有限公司 A kind of recognition methods of transaction risk and device
CN109472293A (en) * 2018-10-12 2019-03-15 国家电网有限公司 A kind of grid equipment file data error correction method based on machine learning
CN110175113A (en) * 2019-04-18 2019-08-27 阿里巴巴集团控股有限公司 Business scenario determines method and apparatus
CN110175113B (en) * 2019-04-18 2023-07-14 创新先进技术有限公司 Service scene determination method and device
CN110297909A (en) * 2019-07-05 2019-10-01 中国工商银行股份有限公司 A kind of classification method and device of no label corpus
CN110297909B (en) * 2019-07-05 2021-07-02 中国工商银行股份有限公司 Method and device for classifying unlabeled corpora
CN110807024B (en) * 2019-10-12 2022-04-19 广州市申迪计算机系统有限公司 Dynamic threshold anomaly detection method and system, storage medium and intelligent device
CN110807024A (en) * 2019-10-12 2020-02-18 广州市申迪计算机系统有限公司 Dynamic threshold anomaly detection method and system, storage medium and intelligent device
CN111192126A (en) * 2019-12-27 2020-05-22 航天信息股份有限公司 Invoice false-proof method and system based on big data analysis
CN114332500A (en) * 2021-09-14 2022-04-12 腾讯科技(深圳)有限公司 Image processing model training method and device, computer equipment and storage medium
CN114741673A (en) * 2022-06-13 2022-07-12 深圳竹云科技股份有限公司 Behavior risk detection method, clustering model construction method and device
CN114741673B (en) * 2022-06-13 2022-08-26 深圳竹云科技股份有限公司 Behavior risk detection method, clustering model construction method and device

Similar Documents

Publication Publication Date Title
CN107679734A (en) It is a kind of to be used for the method and system without label data classification prediction
CN111178456B (en) Abnormal index detection method and device, computer equipment and storage medium
Ali et al. An overview of control charts for high‐quality processes
CN108520357B (en) Method and device for judging line loss abnormality reason and server
WO2021052031A1 (en) Statistical interquartile range-based commodity inventory risk early warning method and system, and computer readable storage medium
CN110852856B (en) Invoice false invoice identification method based on dynamic network representation
CN104520806B (en) Abnormality detection for cloud monitoring
CN107454105B (en) Multidimensional network security assessment method based on AHP and grey correlation
CN103370722B (en) The system and method that actual volatility is predicted by small echo and nonlinear kinetics
WO2020062702A9 (en) Method and device for sending text messages, computer device and storage medium
CN112700324A (en) User loan default prediction method based on combination of Catboost and restricted Boltzmann machine
CN111709668A (en) Power grid equipment parameter risk identification method and device based on data mining technology
CN111340086A (en) Method, system, medium and terminal for processing label-free data
CN114266289A (en) Complex equipment health state assessment method
CN115563477B (en) Harmonic data identification method, device, computer equipment and storage medium
WO2023134188A1 (en) Index determination method and apparatus, and electronic device and computer-readable medium
CN109784352A (en) A kind of method and apparatus for assessing disaggregated model
CN112949735A (en) Liquid hazardous chemical substance volatile concentration abnormity discovery method based on outlier data mining
CN114676749A (en) Power distribution network operation data abnormity judgment method based on data mining
CN115617784A (en) Data processing system and processing method for informationized power distribution
CN115545103A (en) Abnormal data identification method, label identification method and abnormal data identification device
WO2023029065A1 (en) Method and apparatus for evaluating data set quality, computer device, and storage medium
CN114118793A (en) Local exchange risk early warning method, device and equipment
CN114219003A (en) Training method and device of sample generation model and electronic equipment
CN113987240B (en) Customs inspection sample tracing method and system based on knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20180209