CN107679734A - Method and system for classification prediction on unlabeled data - Google Patents
- Publication number
- CN107679734A (application CN201710890305.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- business content
- business
- feature indicator
- prediction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06395—Quality analysis or management
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Theoretical Computer Science (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- General Physics & Mathematics (AREA)
- Educational Administration (AREA)
- Development Economics (AREA)
- Data Mining & Analysis (AREA)
- Operations Research (AREA)
- Game Theory and Decision Science (AREA)
- General Business, Economics & Management (AREA)
- Tourism & Hospitality (AREA)
- Quality & Reliability (AREA)
- Marketing (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method and system for classification prediction on unlabeled data. It makes use of historical data without imposing a hard requirement that the historical data be labeled, and it incorporates the experience database accumulated in the course of business, so that it can subsequently be optimized continuously with real data and thereby improve prediction accuracy automatically. The method includes: inputting business process data and obtaining multiple business scenario data in the business process; inputting business content data and grouping the business content data according to the business scenario data; constructing feature indicators of the business content data according to the business scenario data and the corresponding business content database; cleaning the feature indicators of the business content data; clustering the business content data whose feature indicators have been cleaned, and determining the center of each class; calculating the sampling weight of each class; and sampling the business content data according to the sampling weights and marking the sampled data with prediction labels.
Description
Technical field
The present invention relates to the technical field of data classification prediction, and more particularly to a method and system for classification prediction on unlabeled data.
Background art
Risk detection is a commonly used quality inspection method. It is widely used in business analysis across industries to detect potential risks in a business so that they can be found and controlled in advance. For ordinary enterprises or regulatory departments, risk detection mainly takes three forms. In the first, quality inspectors check the inspected objects one by one to find risks in the inspected products. In the second, the inspected objects are spot-checked to find risks in the inspected products. In the third, the probability that each inspected object is risky is predicted from basic product information and historical data, and the high-risk inspected products are then actually sampled for inspection.
Of the three risk detection approaches described above, the first checks the full data set. It is suitable for products with few inspection items and low technical difficulty, and is often used by an enterprise to inspect its own products (which are typically of a single type and technically simple). The usage scenario of the second approach is similar to that of the first; it is not suitable for products of many categories or of high technical complexity. It can measure the proportion of qualified (normal) inspected products, but it lets a certain proportion of risky inspected products through. The third approach mainly relies on existing information systems: historical data is modeled (in effect, a classifier is constructed), and the risk probability is derived from the feature data of the inspected product. As long as the historical data is labeled, it can be applied to many kinds of products; because the rules are found entirely from the data and few technical details are involved, it has a wide range of application.
Government regulatory bodies supervise entities in many industries with a wide variety of products. For example, customs detection of false import and export trade involves every industry and product that participates in trade. The first two detection approaches therefore require a great deal of manpower and time and are poorly suited. The third approach detects the risk of each inspected product from data and requires historical label data, but for various reasons many systems do not store label data. This approach therefore suffers from low prediction accuracy and, because it depends heavily on labels already present in the historical data, it cannot be applied to environments that require prediction on unlabeled data, nor to business scenarios of anomaly detection.
Summary of the invention
At least one object of the present invention is to overcome the above problems of the prior art by providing a method and system for classification prediction on unlabeled data. It makes use of historical data without imposing a hard requirement that the historical data be labeled, and it incorporates the experience database accumulated in the course of business, so that it can subsequently be optimized continuously with real data and thereby improve prediction accuracy automatically.
To achieve the above object, the technical solution adopted by the present invention includes the following aspects.
A method for classification prediction on unlabeled data, comprising the following steps:
inputting business process data and obtaining multiple business scenario data in the business process; inputting business content data and grouping the business content data according to the business scenario data; constructing feature indicators of the business content data according to the business scenario data and the corresponding business content database; cleaning the feature indicators of the business content data; clustering the business content data whose feature indicators have been cleaned, and determining the center of each class; calculating the sampling weight of each class; and sampling the business content data according to the sampling weights and marking the sampled data with prediction labels.
Preferably, constructing the feature indicators of the business content data includes: comparing the input business content data value with the historical business content data values stored in the business content database for the same business scenario, and constructing a risk-positive indicator according to how close the input value is to the business content data values that produced false trade in the business content database.
Preferably, cleaning the feature indicators of the business content data includes: handling missing feature indicators by assigning a neutral feature indicator value to the business content data; normalizing the feature indicator values of the business content data to [0, 1] to eliminate the effect of differing units; and establishing a feature indicator database to record the weight of each feature indicator of each item of business content data.
Preferably, the method clusters using one or more of KMeans, DBSCAN, hierarchical clustering, partition clustering, and spectral clustering;
the distance measure used for clustering includes one or more of Euclidean distance, Minkowski distance, and cosine similarity.
Preferably, calculating the sampling weight of each class includes: calculating the sampling weight w_i of each class using a formula (the formula image is not reproduced in this text), where j is the number of classes and the quantity entering the formula is the square of the distance from the i-th class center to the coordinate origin.
Preferably, the method further comprises: dividing the labeled sampled data into a training set and a test set according to a preset ratio; and expanding the training set with newly input business content data together with its corresponding prediction results and actual result data.
Preferably, the method further comprises: constructing a Bayes classifier from the labeled sampled data and testing the prediction results on the test set; if the accuracy is below a preset threshold, modifying the feature indicators of the business content data and the sampling weight of each class according to the data in the training set, so as to obtain prediction results of higher accuracy.
Preferably, the method further comprises: constructing a Bayes classifier from the labeled sampled data and testing the prediction results on the test set; if the prediction accuracy is greater than or equal to the preset threshold, inputting new business content data, obtaining the prediction result, and issuing a risk prompt according to the prediction result.
Preferably, the method further comprises: repeating one or more of the steps, wherein some or all of the following differ between repetitions: the construction of the business content data, the number of feature indicators, the choice of clustering algorithm, the choice of distance function in clustering, the choice of the number of classes, the formula for computing the class centers, and the proportion of sampled data.
A system for classification prediction on unlabeled data, comprising at least one electronic device and a database server connected through a network;
wherein the electronic device includes at least one processor and a memory communicatively connected to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can carry out the foregoing method;
the database server is used to store the business content database and the feature indicator database.
In summary, by adopting the above technical solution, the present invention has at least the following advantages:
(1) The invention is applicable to most environments that require prediction on unlabeled data, and is particularly suited to business scenarios of anomaly (risk) detection; it solves the problem that, in such environments, new data cannot be predicted by direct modeling.
(2) The invention does not require manual labeling of unlabeled historical data, saving a great deal of manual labeling effort and time; modeling and analysis can be done using only historical data and the prior knowledge of business experts.
(3) The invention uses the classical Bayesian idea of estimating the posterior from the prior and the likelihood, so data identified after deployment can be added directly to the model to improve its prediction accuracy.
Brief description of the drawings
Fig. 1 is a flow chart of a method for classification prediction on unlabeled data according to an embodiment of the invention.
Fig. 2 is a flow chart of a method for classification prediction on unlabeled data according to another embodiment of the invention.
Fig. 3 is a schematic structural diagram of a system for classification prediction on unlabeled data according to an embodiment of the invention.
Detailed description of embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments, so that its objects, technical solutions, and advantages become clearer. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
Fig. 1 shows a method for classification prediction on unlabeled data according to an embodiment of the invention. The method is described in detail below, taking its application to the detection of false trade in customs business as an example.
Step 101: Input business process data and obtain multiple business scenario data in the business process
For example, for the whole customs clearance process, the clearance business is divided into multiple business scenarios, such as: on-site customs receiving the customs declaration, centralized document review at the head customs office, on-site customs reviewing the declaration, reviewing the inspection result form, determining the collection of duties and fees, and issuing customs documents.
Step 102: Input business content data and group the business content data according to the business scenario data
Specifically, information such as the goods description corresponding to the packing list, invoice, contract, verification and write-off form, and port clearance or release note, together with the commodity code, freight, insurance premium, number of packages, weight, size, and value of goods, can be classified by business scenario, so that each item of clearance business content data corresponds to one or more business scenarios.
Step 103: Construct feature indicators of the business content data according to the business scenario data and the corresponding business content database
Specifically, the business content data value currently input can be compared with the historical business content data values stored in the business content database for the same business scenario. The closer the current value is to a business content data value that produced false trade in the database, the more likely the current value is to produce false trade. A risk-positive indicator is constructed on this basis, that is, the larger the indicator value, the greater the likelihood of false trade. In addition, when the indicator value exceeds a preset threshold, the business content data is tagged with a feature indicator label such as "false trade".
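The comparison in step 103 can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the `scale` parameter, the inverse-distance scoring formula, and the threshold of 0.8 are all assumptions chosen for illustration. It scores a single numeric value by its closeness to the nearest historical false-trade value recorded for the same business scenario.

```python
# Hypothetical threshold above which a record is tagged; the patent only
# says "a preset threshold" and gives no value.
RISK_THRESHOLD = 0.8


def risk_index(value, fraud_values, scale):
    """Return a risk-positive score in [0, 1]; larger means closer to a
    historical false-trade value. The scoring formula is an assumption."""
    if not fraud_values:
        return 0.0  # no historical false-trade values to compare against
    nearest = min(abs(value - f) for f in fraud_values)
    return 1.0 / (1.0 + nearest / scale)


def tag(value, fraud_values, scale, threshold=RISK_THRESHOLD):
    """Return the score and a label if the score exceeds the threshold."""
    score = risk_index(value, fraud_values, scale)
    label = "false_trade" if score > threshold else None
    return score, label
```

A call such as `tag(declared_value, historical_fraud_values, scale=5.0)` would be made once per indicator and per business scenario; how `scale` is chosen is left open here.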
In a preferred embodiment, the method of the invention is equally applicable to other settings in which a large amount of business data exists but, for various reasons, the result labels from analyzing the feature indicators of the business content data were never stored in the business system, and intelligent decision support is desired later. The feature indicator labels of the business content data preferably correspond to a binary classification problem with a clear business meaning, such as normal versus abnormal, qualified versus unqualified, or normal versus false. The direction in which each indicator influences the prediction result (positive or negative) is determined from the business content database. All feature data may be data either before or after the removal of dimensional units, but the feature indicators must be discretized.
Step 104: Clean the feature indicators of the business content data
This step has two parts: first, handling missing feature indicators; second, eliminating the effect of differing units (dimensions) between feature indicators. Handling a missing feature indicator includes assigning a neutral feature indicator value to the business content data. The effect of units can be eliminated by normalization, for example by normalizing the feature indicator values of the business content data to [0, 1], and the indicators are weighted according to their importance. Specifically, a feature indicator database can be established to record the weight of each feature indicator of each item of business content data.
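As a rough illustration of the cleaning step, the following Python sketch fills missing values, applies min-max normalization to [0, 1], and then applies per-indicator weights. The choice of the column median as the "neutral" value, the handling of constant columns, and the function names are assumptions; the patent does not specify them.

```python
from statistics import median


def clean_column(values):
    """Fill missing entries (None) with a neutral value, then min-max
    normalize the column to [0, 1]."""
    present = [v for v in values if v is not None]
    if not present:
        return [0.0] * len(values)  # nothing observed in this column
    neutral = median(present)  # assumption: the median is the neutral value
    filled = [neutral if v is None else v for v in values]
    lo, hi = min(filled), max(filled)
    if hi == lo:
        return [0.0] * len(filled)  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in filled]


def apply_weights(row, weights):
    """Scale each normalized indicator by its importance weight."""
    return [x * w for x, w in zip(row, weights)]
```

In this sketch the weights would be read from the feature indicator database; here they are simply passed in as a list aligned with the indicators of one record.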
Step 105: Cluster the business content data whose feature indicators have been cleaned, and determine the center of each class
For example, clustering algorithms such as KMeans, DBSCAN, hierarchical clustering, partition clustering, and spectral clustering can be used to determine the class centers; a class center can also be determined by taking the mean or the median of the coordinates of the business content data in the class. Various distance measures can be used for clustering, such as Euclidean distance, Minkowski distance, and cosine similarity. Within the same embodiment, several clustering algorithms can each be used to determine class centers in order to improve accuracy. The quality of the clustering result can be assessed in two ways: one is a distance-based test metric such as the silhouette coefficient; the other is to combine step 210 below and judge, based on human experience, the risk of the data that has been sampled.
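One of the options named above, KMeans with Euclidean distance and class centers taken as coordinate means, can be sketched in plain Python as follows. This is only an illustrative implementation of the standard algorithm, not the patent's code; the random initialization, the `seed` parameter, and the iteration cap are assumptions.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=50, seed=0):
    """Return (centers, labels) after at most `iterations` rounds."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centers[c]))
            for p in points
        ]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centers.append(centers[c])  # keep an empty class in place
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers, labels
```

Replacing `euclidean` with another distance function, or the mean with a median, gives the other variants mentioned in the text.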
Step 106: Calculate the sampling weight of each class
Specifically, the features constructed in step 103 are risk-positive indicators, that is, the larger the indicator, the greater the risk. The probability of false-trade risk may therefore differ from class to class. Because the indicator data is all normalized so that the minimum value is 0, the greater the distance from a class center to the origin, the greater the risk of false trade should be, and the greater the sampling weight should be. The sampling weight w_i of each class can be calculated with a formula (the formula image is not reproduced in this text), where j is the number of classes and the quantity entering the formula is the square of the distance from the i-th class center to the coordinate origin.
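Because the formula itself did not survive in this text, the sketch below assumes one plausible form that is consistent with the definitions given: each class's squared distance to the origin divided by the sum of the squared distances over all j classes, so that the weights sum to 1. This normalization is an assumption and may differ from the patent's actual formula.

```python
def sampling_weights(centers):
    """Return one weight per class center, proportional to the squared
    distance from the center to the origin (assumed normalization)."""
    squared = [sum(x * x for x in center) for center in centers]
    total = sum(squared)
    if total == 0:
        # All centers sit at the origin: fall back to equal weights.
        return [1.0 / len(centers)] * len(centers)
    return [s / total for s in squared]
```

Under this assumption a class centered at the origin receives weight 0, which matches the reasoning that classes closer to the origin carry less false-trade risk.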
Step 107: Sample the business content data according to the sampling weights and mark the sampled data with prediction labels
Specifically, the proportion of the total sample count that needs to be sampled can be determined from the historical false-trade ratio in the business content database. Each item of input business content data is then sampled according to the sampling weights from step 106 (for example, data with a sampling weight greater than 0.5 are sampled). The sampled data is marked as false trade and the data not sampled is marked as normal trade, which yields a preliminary false-trade prediction result that is stored in the business content database. In this step the number of items to be sampled from a class may exceed the total number of samples in that class; in that case all data in the class is sampled.
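A minimal sketch of this labeling step is shown below. It assumes that the overall number of records to sample is the historical false-trade ratio times the number of records, and that this total is split across classes in proportion to their sampling weights; the function name, the label strings, and the use of rounding are illustrative assumptions.

```python
import random


def label_by_weighted_sampling(labels, weights, fraud_ratio, seed=0):
    """Mark sampled records 'false_trade' and all others 'normal'.

    labels      -- cluster index of each record
    weights     -- sampling weight of each cluster (assumed to sum to 1)
    fraud_ratio -- historical share of false-trade records
    """
    rng = random.Random(seed)
    total_to_sample = round(fraud_ratio * len(labels))
    predicted = ["normal"] * len(labels)
    for cls, weight in enumerate(weights):
        members = [i for i, lab in enumerate(labels) if lab == cls]
        # If the quota exceeds the class size, take the whole class.
        quota = min(round(weight * total_to_sample), len(members))
        for i in rng.sample(members, quota):
            predicted[i] = "false_trade"
    return predicted
```

The cap on `quota` corresponds to the case described above in which a class has fewer records than the number requested from it.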
Fig. 2 shows a method for classification prediction on unlabeled data according to a further embodiment of the invention. It comprises the following additional steps on top of the above embodiment. Steps 201 to 207 correspond to steps 101 to 107 respectively, with two differences: step 203 also includes modifying the feature indicators of the business content data after they have been constructed from the business scenario data and the corresponding business content database, and step 206 also includes modifying the sampling weights of the classes after they have been calculated.
Step 208: Build or expand the training set and test set
Specifically, the labeled sampled data can be divided into a training set and a test set according to a preset ratio (for example, 8:2). Further, the training set can be expanded with newly input business content data together with its corresponding prediction results and actual result data.
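The split and the later expansion can be sketched as follows. The 8:2 ratio comes from the example in the text; the shuffling, the `seed` parameter, and the representation of a record as a `(features, label)` pair are assumptions.

```python
import random


def split(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and split it into train and test."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def expand_training_set(train, features, actual_label):
    """Append a newly verified record (features plus its actual label)."""
    train.append((features, actual_label))
```

In this sketch only the verified actual label is stored with the new record; keeping the earlier prediction alongside it, as the text suggests, would only require a wider tuple.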
Step 209: Construct a Bayes classifier and check its prediction performance on the test set
A Bayes classifier can be constructed from the labeled sampled data and its prediction results checked on the test set. If the prediction accuracy is greater than or equal to a preset threshold (for example, 75%), step 210 is performed. If the accuracy is below the preset threshold, steps 203 and 206 are performed again, and the feature indicators of the business content data and the sampling weights of the classes are modified according to the data in the training set so as to obtain prediction results of higher accuracy. The feature data in the training set may be data that has gone through feature cleaning, or data that has not undergone feature processing (although missing values must be filled); the feature data is then discretized, the Bayes classifier is built, and the prediction performance is checked on the test set.
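The text does not say which kind of Bayes classifier is used. The sketch below assumes a categorical naive Bayes classifier with add-one (Laplace) smoothing over equal-width discretized features, which is one common choice; the class and function names and the number of bins are hypothetical, while the 75% threshold is the example value from the text.

```python
import math
from collections import Counter, defaultdict

ACCURACY_THRESHOLD = 0.75  # example value given in the text


def discretize(x, bins=4):
    """Map a value in [0, 1] to one of `bins` equal-width buckets."""
    return min(int(x * bins), bins - 1)


class NaiveBayes:
    """Categorical naive Bayes with add-one smoothing (an assumed variant)."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # (class, feature position) -> counts of each observed value
        self.value_counts = defaultdict(Counter)
        self.values = [set() for _ in rows[0]]
        for row, y in zip(rows, labels):
            for j, v in enumerate(row):
                self.value_counts[(y, j)][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, row):
        best, best_score = None, -math.inf
        for c in self.classes:
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(self.class_counts[c] / self.n)
            for j, v in enumerate(row):
                count = self.value_counts[(c, j)][v]
                denom = self.class_counts[c] + len(self.values[j])
                score += math.log((count + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


def accuracy(model, rows, labels):
    """Share of rows whose predicted label equals the given label."""
    hits = sum(model.predict(r) == y for r, y in zip(rows, labels))
    return hits / len(labels)
```

Comparing `accuracy(model, test_rows, test_labels)` with `ACCURACY_THRESHOLD` then decides between proceeding to step 210 and returning to steps 203 and 206.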
Step 210: Input new business content data, obtain the prediction result, and issue a risk prompt according to the prediction result
Specifically, when the amount of data flagged in the prediction result obtained for the new business content data exceeds the threshold for the specific business scenario, a false-trade risk warning is issued through the human-machine interface to remind customs staff to carry out a manual inspection. The inspection result is stored in the business content database, and the training set is expanded through step 208 to further improve prediction accuracy. Moreover, one or more of the above steps can be repeated, and different choices can be made on each repetition for the construction of the business content data, the number of extracted features, the clustering algorithm, the distance function used in clustering, the number of classes, the formula for the class centers, the proportion of sampled data, and so on, in order to arrive at a hyperparameter combination with higher prediction accuracy. The method can also use the detected abnormal data to correct the prior information and thereby improve prediction accuracy.
Fig. 3 shows a system for classification prediction on unlabeled data according to an embodiment of the invention. It comprises at least one electronic device 310 and a database server 330 connected through a network 320.
The electronic device 310 includes at least one processor 311 and a memory 312 communicatively connected to the at least one processor. The memory 312 stores instructions executable by the at least one processor 311, and the instructions are executed by the at least one processor 311 so that the at least one processor 311 can carry out the method disclosed in any of the foregoing embodiments. The database server 330 is used to store the business content database 331 and the feature indicator database 332.
The above embodiments first extract or construct, according to the business, feature indicators that are meaningful for the classification categories and clean the feature data. Clustering is then performed on the cleaned feature indicators to obtain the center of each class, and the sampling weight used for marking data is determined from the magnitude of each class center's coordinates. Each class is then sampled by weight, and the sampled data is treated as abnormal data to serve as the prior. Finally, a Bayes classifier is built from the now-labeled data to classify new data and detect business risk. In an actual business system, true labels can be determined by inspecting the high-risk data and stored in the database to correct the prior probability, so that the model adjusts its accuracy automatically. The invention is applicable to most environments that require prediction on unlabeled data, solves the problem that new data cannot be predicted by direct modeling in such environments, and makes good use of the prior knowledge of business experts.
The above is only a detailed description of specific embodiments of the invention and does not limit the invention. Any replacement, modification, or improvement made by those skilled in the relevant art without departing from the principle and scope of the invention shall fall within the scope of protection of the invention.
Claims (10)
1. A method for classification prediction on unlabeled data, characterized in that the method comprises the following steps:
inputting business process data and obtaining multiple business scenario data in the business process; inputting business content data and grouping the business content data according to the business scenario data; constructing feature indicators of the business content data according to the business scenario data and the corresponding business content database; cleaning the feature indicators of the business content data; clustering the business content data whose feature indicators have been cleaned, and determining the center of each class; calculating the sampling weight of each class; and sampling the business content data according to the sampling weights and marking the sampled data with prediction labels.
2. The method according to claim 1, characterized in that constructing the feature indicators of the business content data includes: comparing the input business content data value with the historical business content data values stored in the business content database for the same business scenario, and constructing a risk-positive indicator according to how close the input value is to the business content data values that produced false trade in the business content database.
3. The method according to claim 1, characterized in that cleaning the feature indicators of the business content data includes: handling missing feature indicators by assigning a neutral feature indicator value to the business content data; normalizing the feature indicator values of the business content data to [0, 1] to eliminate the effect of differing units; and establishing a feature indicator database to record the weight of each feature indicator of each item of business content data.
4. The method according to claim 1, characterized in that the method clusters using one or more of KMeans, DBSCAN, hierarchical clustering, partition clustering, and spectral clustering;
the distance measure used for clustering includes one or more of Euclidean distance, Minkowski distance, and cosine similarity.
5. The method according to claim 1, characterized in that calculating the sampling weight of each class includes: calculating the sampling weight w_i of each class using a formula (the formula image is not reproduced in this text), where j is the number of classes and the quantity entering the formula is the square of the distance from the i-th class center to the coordinate origin.
6. The method according to claim 1, characterized in that the method further comprises: dividing the labeled sampled data into a training set and a test set according to a preset ratio; and expanding the training set with newly input business content data together with its corresponding prediction results and actual result data.
7. The method according to claim 6, characterized in that the method further comprises: constructing a Bayes classifier from the labeled sampled data and testing the prediction results on the test set; if the accuracy is below a preset threshold, modifying the feature indicators of the business content data and the sampling weight of each class according to the data in the training set, so as to obtain prediction results of higher accuracy.
8. The method according to claim 6, characterized in that the method further comprises: constructing a Bayes classifier from the labeled sampled data and testing the prediction results on the test set; if the prediction accuracy is greater than or equal to the preset threshold, inputting new business content data, obtaining the prediction result, and issuing a risk prompt according to the prediction result.
9. The method according to claim 1, characterized in that the method further comprises: repeating one or more of the steps, wherein some or all of the following differ between repetitions: the construction of the business content data, the number of feature indicators, the choice of clustering algorithm, the choice of distance function in clustering, the choice of the number of classes, the formula for computing the class centers, and the proportion of sampled data.
10. A system for classification prediction of unlabeled data, characterized in that the system comprises at least one electronic device and a database server connected via a network;
wherein the electronic device comprises at least one processor and a memory communicatively connected to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method of any one of claims 1 to 9;
the database server is configured to store a business content database and a feature index database.
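The steps recited in claims 5 to 8 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. Several points are assumptions made only for illustration: the weight formula is missing from this text, so `class_weights` simply normalises each squared center-to-origin distance by the sum over all j classes; the Bayes classifier is taken to be a Gaussian naive Bayes model; and all function names, the sample data, the labels `low_risk` and `high_risk`, and the threshold value 0.9 are hypothetical. How the class weights feed into sampling is defined in claim 1 and is not modelled here.

```python
import math
import random

def class_weights(centers):
    """Claim 5 (ASSUMED form): squared distance of each class center to the
    origin, normalised by the sum over all classes. The real formula is not
    reproduced in the source text."""
    sq = [sum(x * x for x in c) for c in centers]
    total = sum(sq)
    return [s / total for s in sq]

def split(samples, train_ratio, seed=0):
    """Claim 6: shuffle labeled samples and split them by a preset ratio."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def fit_gaussian_nb(train):
    """Claims 7-8 (ASSUMED variant): Gaussian naive Bayes. Stores, per label,
    the prior plus the mean and variance of every feature."""
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)
    model = {}
    for label, rows in by_label.items():
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(c) / n for c in cols]
        # A small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in c) / n + 1e-9
                     for c, m in zip(cols, means)]
        model[label] = (n / len(train), means, variances)
    return model

def predict(model, features):
    """Return the label with the highest log posterior."""
    def log_posterior(params):
        prior, means, variances = params
        total = math.log(prior)
        for x, m, v in zip(features, means, variances):
            total += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return total
    return max(model, key=lambda label: log_posterior(model[label]))

def accuracy(model, test):
    """Fraction of test samples whose predicted label matches the true one."""
    return sum(predict(model, f) == y for f, y in test) / len(test)

# Hypothetical class centers and labeled sampled data.
centers = [[3.0, 4.0], [0.0, 5.0], [6.0, 8.0]]
weights = class_weights(centers)

labeled = ([([0.1 * i, 0.2 * i], "low_risk") for i in range(10)]
           + [([5 + 0.1 * i, 6 + 0.2 * i], "high_risk") for i in range(10)])
train, test = split(labeled, train_ratio=0.8)
model = fit_gaussian_nb(train)
acc = accuracy(model, test)

THRESHOLD = 0.9  # hypothetical preset threshold
if acc >= THRESHOLD:
    # Claim 8: predict on new business content data and issue a risk prompt.
    new_features = [5.2, 6.1]
    prediction = predict(model, new_features)
    print("risk prompt:", prediction)
    # Claim 6: once the actual result is known, expand the training set.
    actual = "high_risk"
    train.append((new_features, actual))
    model = fit_gaussian_nb(train)
else:
    # Claim 7: revise the feature indices and class weights, then retrain.
    print("accuracy below threshold; revise feature indices and class weights")
```

In this sketch the threshold check decides between the two branches described in claims 7 and 8; a real system would replace the placeholder in the `else` branch with the re-selection of feature indices and the recalculation of class weights.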
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710890305.8A CN107679734A (en) | 2017-09-27 | 2017-09-27 | Method and system for classification prediction of unlabeled data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710890305.8A CN107679734A (en) | 2017-09-27 | 2017-09-27 | Method and system for classification prediction of unlabeled data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107679734A (en) | 2018-02-09 |
Family
ID=61137558
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710890305.8A Pending CN107679734A (en) | 2017-09-27 | 2017-09-27 | Method and system for classification prediction of unlabeled data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107679734A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104331502A (en) * | 2014-11-19 | 2015-02-04 | 亚信科技(南京)有限公司 | Identifying method for courier data for courier surrounding crowd marketing |
CN106446230A (en) * | 2016-10-08 | 2017-02-22 | 国云科技股份有限公司 | Method for optimizing word classification in machine learning text |
CN106503438A (en) * | 2016-10-20 | 2017-03-15 | 上海科瓴医疗科技有限公司 | A kind of H RFM user modeling method and system for pharmacy member analysis |
CN106778042A (en) * | 2017-01-26 | 2017-05-31 | 中电科软件信息服务有限公司 | Cardio-cerebral vascular disease patient similarity analysis method and system |
CN107085729A (en) * | 2017-03-13 | 2017-08-22 | 西安电子科技大学 | A kind of personnel's testing result modification method based on Bayesian inference |
CN107093005A (en) * | 2017-03-24 | 2017-08-25 | 北明软件有限公司 | The method that tax handling service hall's automatic classification is realized based on big data mining algorithm |
CN106974660A (en) * | 2017-04-20 | 2017-07-25 | 重庆邮电大学 | The method that blood oxygen feature in being detected based on cerebration realizes sex determination |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108933785A (en) * | 2018-06-29 | 2018-12-04 | 平安科技(深圳)有限公司 | Network risks monitoring method, device, computer equipment and storage medium |
CN109034209B (en) * | 2018-07-03 | 2021-07-30 | 创新先进技术有限公司 | Training method and device for active risk real-time recognition model |
CN109034209A (en) * | 2018-07-03 | 2018-12-18 | 阿里巴巴集团控股有限公司 | The training method and device of the real-time identification model of active risk |
CN110750681A (en) * | 2018-07-05 | 2020-02-04 | 武汉斗鱼网络科技有限公司 | Account similarity calculation method, storage medium, electronic device and system |
CN109086975A (en) * | 2018-07-10 | 2018-12-25 | 阿里巴巴集团控股有限公司 | A kind of recognition methods of transaction risk and device |
CN109472293A (en) * | 2018-10-12 | 2019-03-15 | 国家电网有限公司 | A kind of grid equipment file data error correction method based on machine learning |
CN110175113A (en) * | 2019-04-18 | 2019-08-27 | 阿里巴巴集团控股有限公司 | Business scenario determines method and apparatus |
CN110175113B (en) * | 2019-04-18 | 2023-07-14 | 创新先进技术有限公司 | Service scene determination method and device |
CN110297909A (en) * | 2019-07-05 | 2019-10-01 | 中国工商银行股份有限公司 | A kind of classification method and device of no label corpus |
CN110297909B (en) * | 2019-07-05 | 2021-07-02 | 中国工商银行股份有限公司 | Method and device for classifying unlabeled corpora |
CN110807024B (en) * | 2019-10-12 | 2022-04-19 | 广州市申迪计算机系统有限公司 | Dynamic threshold anomaly detection method and system, storage medium and intelligent device |
CN110807024A (en) * | 2019-10-12 | 2020-02-18 | 广州市申迪计算机系统有限公司 | Dynamic threshold anomaly detection method and system, storage medium and intelligent device |
CN111192126A (en) * | 2019-12-27 | 2020-05-22 | 航天信息股份有限公司 | Invoice false-proof method and system based on big data analysis |
CN114332500A (en) * | 2021-09-14 | 2022-04-12 | 腾讯科技(深圳)有限公司 | Image processing model training method and device, computer equipment and storage medium |
CN114741673A (en) * | 2022-06-13 | 2022-07-12 | 深圳竹云科技股份有限公司 | Behavior risk detection method, clustering model construction method and device |
CN114741673B (en) * | 2022-06-13 | 2022-08-26 | 深圳竹云科技股份有限公司 | Behavior risk detection method, clustering model construction method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107679734A (en) | Method and system for classification prediction of unlabeled data | |
CN111178456B (en) | Abnormal index detection method and device, computer equipment and storage medium | |
Ali et al. | An overview of control charts for high‐quality processes | |
CN108520357B (en) | Method and device for judging line loss abnormality reason and server | |
WO2021052031A1 (en) | Statistical interquartile range-based commodity inventory risk early warning method and system, and computer readable storage medium | |
CN110852856B (en) | Invoice false invoice identification method based on dynamic network representation | |
CN104520806B (en) | Abnormality detection for cloud monitoring | |
CN107454105B (en) | Multidimensional network security assessment method based on AHP and grey correlation | |
CN103370722B (en) | The system and method that actual volatility is predicted by small echo and nonlinear kinetics | |
WO2020062702A9 (en) | Method and device for sending text messages, computer device and storage medium | |
CN112700324A (en) | User loan default prediction method based on combination of Catboost and restricted Boltzmann machine | |
CN111709668A (en) | Power grid equipment parameter risk identification method and device based on data mining technology | |
CN111340086A (en) | Method, system, medium and terminal for processing label-free data | |
CN114266289A (en) | Complex equipment health state assessment method | |
CN115563477B (en) | Harmonic data identification method, device, computer equipment and storage medium | |
WO2023134188A1 (en) | Index determination method and apparatus, and electronic device and computer-readable medium | |
CN109784352A (en) | A kind of method and apparatus for assessing disaggregated model | |
CN112949735A (en) | Liquid hazardous chemical substance volatile concentration abnormity discovery method based on outlier data mining | |
CN114676749A (en) | Power distribution network operation data abnormity judgment method based on data mining | |
CN115617784A (en) | Data processing system and processing method for informationized power distribution | |
CN115545103A (en) | Abnormal data identification method, label identification method and abnormal data identification device | |
WO2023029065A1 (en) | Method and apparatus for evaluating data set quality, computer device, and storage medium | |
CN114118793A (en) | Local exchange risk early warning method, device and equipment | |
CN114219003A (en) | Training method and device of sample generation model and electronic equipment | |
CN113987240B (en) | Customs inspection sample tracing method and system based on knowledge graph |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20180209 |