CN110189035A - Insider trading identification method based on K-means clustering and the KNN algorithm - Google Patents
- Publication number
- CN110189035A (application CN201910470749.5A)
- Authority
- CN
- China
- Prior art keywords
- data set
- insider trading
- sample
- test target
- cluster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
Abstract
The invention discloses an insider trading identification method based on K-means clustering and the KNN algorithm. An insider trading sample data set is obtained for different event time windows; K-means clustering divides the sample data set into different sub-datasets; the sub-dataset whose cluster centre is nearest to the test target is selected, and the KNN algorithm determines whether the test target involves insider trading. Because the method builds multiple clusters with K-means and compares the test target only with the samples of the cluster whose centre is nearest to it, it effectively addresses the problem that the characteristic indices of insider trading stocks differ so widely, for a variety of reasons, that the stocks cannot be identified reliably, and it improves the accuracy of insider trading discrimination.
Description
Technical field
The invention belongs to the field of securities market regulation, and specifically relates to an insider trading identification method based on K-means clustering and the KNN algorithm.
Background art
Insider trading refers to conduct in which a corporate insider obtains inside information by improper means, discloses inside information, trades securities on the basis of inside information, or advises others to trade securities on the basis of inside information. To make a profit or avoid a loss, the person engaged in insider trading uses a special position or opportunity to obtain inside information and trade securities. Such conduct violates the "three fairness" principles of the securities market (openness, fairness and justice) and seriously infringes the investing public's equal right to information and their property interests. In recent years insider trading has become more frequent and has involved the main board, the small and medium-sized enterprise (SME) board, and the growth enterprise market (GEM). It involves a wide range of people, and its methods have become more concealed and complex. In addition, the characteristic indices of insider trading stocks differ widely for reasons such as industry and market capitalisation, which makes insider trading harder to identify.
Therefore, the present invention studies a method that first clusters the samples and then identifies insider trading.
Summary of the invention
The purpose of the invention is to address the technical problem that insider trading samples are hard to identify because their characteristic indices differ widely owing to factors such as industry and market capitalisation. The invention provides an insider trading identification method based on K-means clustering and the KNN algorithm: K-means clustering builds multiple clusters from the sample data set, the cluster whose centre is nearest to the test target is selected, and the KNN algorithm then compares the test target with the samples of that cluster and classifies it.
The technical solution of the invention is an insider trading identification method based on K-means clustering and the KNN algorithm, comprising the following steps:
Step 1: obtain the insider trading sample data set for the different event time windows, and select and add white samples, i.e. non-insider-trading samples;
Step 2: apply K-means clustering to the sample data set to divide it into different sub-datasets;
Step 3: for each sub-dataset, compute and determine its cluster centre, and use the KNN algorithm to build a KNN insider trading discrimination model for that sub-dataset;
Step 4: obtain a test target and collect the test target data set;
Step 5: select the sub-dataset whose cluster centre is nearest to the test target, and use the corresponding sub-dataset KNN insider trading discrimination model from step 3 to judge whether the test target involves insider trading;
Step 6: judge whether the insider trading identification result is correct;
Step 6.1: if the identification result is correct, go to step 8;
Step 6.2: if the identification result is incorrect, go to step 7;
Step 7: add the test target data set, together with its label indicating whether it is insider trading, to the sample data set, and update the cluster centre of the corresponding sub-dataset;
Step 8: judge whether there is a next test target;
Step 8.1: if there is a next test target, go to step 4;
Step 8.2: if there is no next test target, end.
Further, the sample data set includes the characteristic indices of each sample and a label indicating whether the sample is insider trading.
Further, the characteristic indices include the company financial indices and corporate governance indices disclosed for individual stocks on the securities market, as well as micro-level securities market indices for the individual stock calculated with the CAPM (Capital Asset Pricing Model) and GARCH (Generalized Autoregressive Conditional Heteroskedasticity) models.
Further, the KNN algorithm uses the Euclidean distance to measure the similarity of features between samples.
Further, in step 5, the sub-dataset whose cluster centre is nearest to the test target is selected by computing the distance between the test target data set and each cluster centre and choosing the sub-dataset of the sample data set with the smallest distance.
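The nearest-centre selection in step 5 can be sketched as follows. This is only an illustration under stated assumptions: the cluster centres and sub-datasets are taken as already computed, the distance is Euclidean, and the names `euclidean` and `select_sub_dataset` and the toy values are hypothetical, not part of the patent.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_sub_dataset(target, centres, sub_datasets):
    """Return (index, sub-dataset) for the cluster centre nearest to the target."""
    distances = [euclidean(target, c) for c in centres]
    j = distances.index(min(distances))
    return j, sub_datasets[j]

# Toy values (made up): two cluster centres with two features each.
centres = [(0.0, 0.0), (10.0, 10.0)]
sub_datasets = [
    [(0.1, 0.2), (-0.3, 0.1)],    # samples grouped around the first centre
    [(9.8, 10.1), (10.3, 9.9)],   # samples grouped around the second centre
]

index, chosen = select_sub_dataset((9.0, 9.5), centres, sub_datasets)
print(index, chosen)
```

Only the chosen sub-dataset is then passed to the KNN step, so the test target is compared with samples that resemble it rather than with the whole sample data set.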
Further, the KNN algorithm comprises the following steps:
Step 1: select the measure used to determine the similarity between the features of the test target data and the sample data;
Step 2: using the measure selected in step 1, compute the distance between the test target data and each data point in the sample data set;
Step 3: sort the distances computed in step 2 and select the K2 samples closest to the test target;
Step 4: judge whether the test target involves insider trading from the proportion of insider trading samples among the K2 samples selected in step 3;
Step 4.1: if the proportion of insider trading samples among the K2 samples exceeds 0.5, judge that the test target involves insider trading;
Step 4.2: if the proportion of insider trading samples among the K2 samples does not exceed 0.5, judge that the test target does not involve insider trading.
Beneficial effects of the invention:
1) Based on the K-means clustering algorithm, multiple clusters are built, and the test sample is compared with, and classified against, the sample set of the cluster whose centre is nearest to the test target. This better solves the problem that insider trading stock samples cannot be identified effectively because various factors cause their characteristic indices to differ widely;
2) The test target is classified with the KNN algorithm, which is easy to understand, easy to implement, and efficient to execute.
Brief description of the drawings
The invention is further explained below with reference to the drawings and embodiments.
Fig. 1 is a flow diagram of the insider trading identification method based on K-means clustering and the KNN algorithm.
Fig. 2 is a flow diagram of the KNN algorithm.
Detailed description of the embodiments
As shown in Fig. 1, an insider trading identification method based on K-means clustering and the KNN algorithm specifically comprises the following steps:
Step 1: obtain, from the official website of the China Securities Regulatory Commission (CSRC), the stock samples that the CSRC has announced as involving insider trading, and obtain the micro-level securities market indices and company financial indices of those insider trading samples for the corresponding event time windows as the sample data set; collect white samples that have never been involved in insider trading, in a number comparable to the insider trading samples, on the criteria that they belong to the same industry and that the inside-information-sensitive event falls in the same period, and add them to the sample data set; the sample data set includes the characteristic indices and a label indicating whether each sample is insider trading;
Step 2: apply K-means clustering to the sample data set to divide it into K1 different sub-datasets;
Step 3: for each sub-dataset, compute and determine its cluster centre, and use the KNN algorithm to build a KNN insider trading discrimination model for that sub-dataset;
Step 4: obtain a test target and collect the test target data set;
Step 5: select the sub-dataset whose cluster centre is nearest to the test target, and use the corresponding sub-dataset KNN insider trading discrimination model from step 3 to judge whether the test target involves insider trading;
Step 6: judge whether the identification result is correct;
Step 6.1: if the identification result is correct, go to step 8;
Step 6.2: if the identification result is incorrect, go to step 7;
Step 7: add the test target data set, together with its label indicating whether it is insider trading, to the sample data set, and update the cluster centre of the corresponding sub-dataset;
Step 8: judge whether there is a next test target;
Step 8.1: if there is a next test target, go to step 4;
Step 8.2: if there is no next test target, end.
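Steps 6 and 7 form a feedback loop: a misclassified test target is added to the sample data set and the centre of its sub-dataset is refreshed. The text does not say how the centre is updated, so the sketch below assumes the centre is the mean of its sub-dataset and updates it incrementally. The function names and toy values are hypothetical.

```python
def update_centre(centre, n, new_point):
    """Mean of a cluster after adding one point.

    centre is the current mean of the n points already in the cluster;
    the result equals the mean recomputed over all n + 1 points.
    """
    return tuple(c + (x - c) / (n + 1) for c, x in zip(centre, new_point))

def feedback_step(sub_samples, sub_labels, centres, cluster, target, predicted, true_label):
    """Steps 6 and 7: on a wrong prediction, add the labelled target to the
    chosen sub-dataset and refresh that cluster's centre. Returns True if
    the sample data set was changed."""
    if predicted == true_label:
        return False  # step 6.1: result correct, nothing to update
    n = len(sub_samples[cluster])
    centres[cluster] = update_centre(centres[cluster], n, target)
    sub_samples[cluster].append(target)
    sub_labels[cluster].append(true_label)
    return True

# Toy values (made up): one cluster holding two samples, centre at their mean.
centres = [(1.0, 1.0)]
sub_samples = [[(0.0, 0.0), (2.0, 2.0)]]
sub_labels = [[0, 1]]

changed = feedback_step(sub_samples, sub_labels, centres, 0, (4.0, 1.0), predicted=0, true_label=1)
print(changed, centres[0])
```

The incremental form avoids rescanning the sub-dataset on every correction; recomputing the mean from scratch would give the same centre.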
In step 1, the characteristic indices include the company financial indices and corporate governance indices disclosed for individual stocks on the securities market, as well as micro-level securities market indices for the individual stock calculated with the CAPM (Capital Asset Pricing Model) and GARCH (Generalized Autoregressive Conditional Heteroskedasticity) models.
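The text does not say which quantities are derived from the CAPM and GARCH models. As a hedged illustration only, the sketch below computes two quantities commonly taken from them: a CAPM beta (the slope of stock returns on market returns, here on raw returns with the risk-free rate omitted for brevity) and the GARCH(1,1) conditional variance recursion with parameters assumed to be already estimated. All names and numbers are hypothetical.

```python
def capm_beta(stock_returns, market_returns):
    """Slope of stock returns on market returns: cov(r_i, r_m) / var(r_m)."""
    n = len(stock_returns)
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((s - mean_s) * (m - mean_m) for s, m in zip(stock_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    return cov / var

def garch11_variance(residuals, omega, alpha, beta, initial_variance):
    """Conditional variances from the GARCH(1,1) recursion
    sigma2[t] = omega + alpha * residuals[t-1]**2 + beta * sigma2[t-1]."""
    sigma2 = [initial_variance]
    for eps in residuals[:-1]:
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2

# Made-up daily returns: the stock moves twice as much as the market.
market = [0.01, -0.02, 0.015, 0.0]
stock = [2 * r + 0.001 for r in market]
print(capm_beta(stock, market))

# Made-up residuals and parameters for the variance recursion.
print(garch11_variance([0.1, 0.2, 0.0], omega=0.01, alpha=0.1, beta=0.8, initial_variance=0.05))
```

In practice the GARCH parameters would be estimated by maximum likelihood; that step is left out here.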
In step 5, the KNN algorithm uses the Euclidean distance to measure the similarity of features between samples.
In step 2, the K-means clustering algorithm comprises the following steps:
Step 1: choose K objects in the data space as initial centres, each object representing one cluster centre;
Step 2: for each data object in the sample, compute its Euclidean distance to each cluster centre and, following the nearest-distance criterion, assign it to the class of the nearest (most similar) cluster centre;
Step 3: update the cluster centres and compute the objective function;
Step 4: judge whether the standard clustering measure function has converged; if it has, output the result; if not, return to step 2.
In step 1, let $X = \{x_1, x_2, \dots, x_n\}$ be $n$ data objects in the space $\mathbb{R}^A$; before clustering starts, $K_1$ initial cluster centres are chosen.
In step 2, starting from the initial cluster centres chosen in step 1, every other object is assigned to the most similar class according to its similarity to the cluster centres. Similarity is measured by distance, so each object $x_i$ is assigned to the class $j^{*}$ whose centre is nearest:
$$j^{*} = \arg\min_{1 \le j \le K_1} d(x_i, c_j)$$
where $d(x_i, c_j)$ is the distance measure between data object $x_i$ and cluster centre $c_j$. In the embodiment, the distance measure is the Euclidean distance, with $x_{i,k}$ and $c_{j,k}$ denoting the $k$-th attributes of $x_i$ and $c_j$:
$$d(x_i, c_j) = \sqrt{\sum_{k=1}^{A} (x_{i,k} - c_{j,k})^2}$$
In step 3, the cluster centres are updated and the objective function is computed. Suppose the samples in class $j$ are $\{x^{(j)}_1, x^{(j)}_2, \dots, x^{(j)}_{n_j}\}$, i.e. the class contains $n_j$ samples, and its cluster centre is $c_j = (c_{j,1}, c_{j,2}, \dots, c_{j,A})$, where $c_{j,k}$ is the $k$-th attribute of cluster centre $c_j$. The cluster centre is updated with:
$$c_{j,k} = \frac{1}{n_j} \sum_{i=1}^{n_j} x^{(j)}_{i,k}$$
In step 4, when judging whether the termination condition is met, the mean squared error is used as the standard clustering measure function:
$$J = \sum_{j=1}^{K_1} \sum_{i=1}^{n_j} d\left(x^{(j)}_i, c_j\right)^2$$
If the termination condition is not met, the above process is repeated until the standard clustering measure function converges.
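The four K-means steps above can be sketched in plain Python. This is a minimal illustration, not the patented implementation: it assumes the initial centres are drawn at random from the data points, uses the sum of squared distances as the objective, and stops when the objective no longer decreases. The name `k_means`, its parameters and the toy points are hypothetical.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(points, k, max_iter=100, tol=1e-9, seed=0):
    """Plain K-means. Returns (centres, assignments, objective)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # step 1: k data points as initial centres
    previous = float("inf")
    for _ in range(max_iter):
        # step 2: assign every point to its nearest centre
        assign = [min(range(k), key=lambda j: euclidean(p, centres[j])) for p in points]
        # step 3: move each centre to the mean of its members
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its old centre
                centres[j] = tuple(sum(col) / len(members) for col in zip(*members))
        # objective: sum of squared distances to the assigned centre
        objective = sum(euclidean(p, centres[a]) ** 2 for p, a in zip(points, assign))
        # step 4: stop once the objective no longer decreases
        if previous - objective <= tol:
            break
        previous = objective
    return centres, assign, objective

# Toy points (made up): two well-separated groups of three.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centres, assign, objective = k_means(points, k=2)
print(assign, objective)
```

K-means can settle in a local optimum that depends on the initial centres, so real use would typically run several seeds and keep the result with the lowest objective.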
As shown in Fig. 2, the KNN algorithm comprises the following steps:
Step 1: select the measure used to determine the similarity between the features of the test target data and the sample data;
Step 2: using the measure selected in step 1, compute the distance between the test target data and each data point in the sample data set;
Step 3: sort the distances computed in step 2 and select the K2 samples closest to the test target;
Step 4: judge whether the test target involves insider trading from the proportion of insider trading samples among the K2 samples selected in step 3;
Step 4.1: if the proportion of insider trading samples among the K2 samples exceeds 0.5, judge that the test target involves insider trading;
Step 4.2: if the proportion of insider trading samples among the K2 samples does not exceed 0.5, judge that the test target does not involve insider trading.
In step 1 of the KNN algorithm, the Euclidean distance is used to measure the distance between samples.
The Euclidean distance $d_{euc}$ between two points in a multidimensional space is calculated as:
$$d_{euc}(X_i, Y_j) = \sqrt{\sum_{k=1}^{n} (X_k - Y_k)^2}$$
where $X_i = (X_1, X_2, \dots, X_n)$ and $Y_j = (Y_1, Y_2, \dots, Y_n)$ are the vectors representing two samples, and $n$ is the number of feature attributes per sample.
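The KNN steps and the Euclidean distance can be put together in a short sketch. It assumes labels are encoded as 1 for insider trading and 0 for a white sample; the function names and toy data are hypothetical and only illustrate the 0.5 threshold rule.

```python
import math

def euclidean(x, y):
    """d_euc(X, Y) = sqrt(sum over k of (X_k - Y_k)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_insider_trading(target, samples, labels, k2=3):
    """Return (ratio, decision) for one test target.

    ratio is the share of insider trading samples among the k2 nearest
    samples; decision is True when that share exceeds 0.5.
    """
    # steps 2 and 3: compute all distances, sort, keep the k2 nearest samples
    ranked = sorted(zip((euclidean(target, s) for s in samples), labels))
    nearest_labels = [label for _, label in ranked[:k2]]
    # step 4: proportion of insider trading samples among the neighbours
    ratio = sum(nearest_labels) / len(nearest_labels)
    return ratio, ratio > 0.5

# Toy sub-dataset (made up): two features per sample.
samples = [(1.0, 1.0), (1.2, 0.9), (3.0, 3.0), (0.8, 1.1), (3.1, 2.9)]
labels = [1, 1, 0, 0, 0]

print(knn_insider_trading((1.0, 1.0), samples, labels, k2=3))
```

With an odd K2 such as the value 3 used in the embodiment, the proportion can never equal exactly 0.5, so the "does not exceed 0.5" branch never has to break a tie.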
In the embodiment, 171 stock samples involving insider trading, as announced by the CSRC between 2001 and 2017, were collected first. White samples were then chosen at a ratio of about 1 to 1 with the insider trading stock samples, on the selection principle of the same type of inside-information-sensitive event, the same industry, the same period, and no CSRC penalty; in total, 164 white samples meeting these requirements were collected. The insider trading samples and the non-insider-trading samples were combined into the sample data set. Then 16 characteristic indices were chosen, covering company financial data and micro-level securities market data, as shown in Table 1.
Table 1: Characteristic indices
To test how well the insider trading identification method of the invention works on stock samples, the collected sample data set was split into a training set and a test set at a ratio of 8:2, and K1 cluster centres were established from the training set. In this embodiment K1 is 4 and K2 is 3. The test set was then divided into 4 sub-test sets, and the method was applied to each of them to identify whether insider trading occurred.
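The 8:2 split and the two parameter values can be written down as a small sketch. The text does not say how the split was drawn, so a seeded random shuffle is assumed here; `split_train_test` and the placeholder sample list are hypothetical.

```python
import random

K1 = 4  # number of K-means clusters used in the embodiment
K2 = 3  # number of nearest neighbours used by KNN in the embodiment

def split_train_test(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and cut it into training and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Placeholder data: 100 sample identifiers instead of real feature vectors.
train, test = split_train_test(range(100))
print(len(train), len(test))
```

Cluster centres would then be fitted on `train` only, and each element of `test` routed to the sub-test set of its nearest centre.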
First, K-means clustering divided all samples into four classes; in the test set, the four sub-test sets contain 30, 14, 4 and 20 samples respectively.
The insider trading identification method of the invention was applied to the samples of sub-test set 1. The results are shown in Table 2; the identification accuracy for insider trading stock samples is 86.67%.
Table 2: Insider trading identification results for sub-test set 1
The method was applied to the samples of sub-test set 2. The results are shown in Table 3; the identification accuracy for insider trading stock samples is 75%.
Table 3: Insider trading identification results for sub-test set 2
The method was applied to the samples of sub-test set 3. The results are shown in Table 4; the identification accuracy for insider trading stock samples is 100%.
Table 4: Insider trading identification results for sub-test set 3
The method was applied to the samples of sub-test set 4. The results are shown in Table 5; the identification accuracy for insider trading stock samples is 70%.
Table 5: Insider trading identification results for sub-test set 4
Combining Tables 2 to 5, the overall identification accuracy of the proposed model is 80%. The embodiment also discriminated insider trading directly with the KNN algorithm, without K-means clustering; the discrimination accuracy was 73%, as shown in Table 6. This shows that adding K-means clustering makes the method of the invention more accurate than the plain KNN algorithm at discriminating insider trading, by about 7 percentage points (80% versus 73%).
Table 6: Comparison of insider trading discrimination accuracy
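As a rough consistency check on the reported figures, the four sub-test-set accuracies can be combined into one overall figure. The tables themselves are not reproduced here, so this sketch assumes each reported rate applies to every sample in its sub-test set and weights the rates by sub-test-set size.

```python
# Sub-test-set sizes and reported accuracies, taken from the text above.
sizes = [30, 14, 4, 20]
rates = [0.8667, 0.75, 1.00, 0.70]

# Size-weighted average (an assumption about how the overall figure was formed).
correct = sum(n * r for n, r in zip(sizes, rates))
overall = correct / sum(sizes)

baseline = 0.73  # reported accuracy of KNN without K-means clustering
print(round(overall, 3), round(overall - baseline, 3))
```

Under that assumption the weighted average comes out at roughly 0.80, in line with the 80% stated for the combined model.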
Claims (4)
1. An insider trading identification method based on K-means clustering and the KNN algorithm, characterised by comprising the following steps:
Step 1: obtain the insider trading sample data set for the different event time windows, and select and add white samples, i.e. non-insider-trading samples;
Step 2: apply K-means clustering to the sample data set to divide it into different sub-datasets;
Step 3: for each sub-dataset, compute and determine its cluster centre, and use the KNN algorithm to build a KNN insider trading discrimination model for that sub-dataset;
Step 4: obtain a test target and collect the test target data set;
Step 5: select the sub-dataset whose cluster centre is nearest to the test target, and use the corresponding sub-dataset KNN insider trading discrimination model from step 3 to judge whether the test target involves insider trading;
Step 6: judge whether the identification result is correct;
Step 6.1: if the identification result is correct, go to step 8;
Step 6.2: if the identification result is incorrect, go to step 7;
Step 7: add the test target data set, together with its label indicating whether it is insider trading, to the sample data set, and update the cluster centre of the corresponding sub-dataset;
Step 8: judge whether there is a next test target;
Step 8.1: if there is a next test target, go to step 4;
Step 8.2: if there is no next test target, end.
2. The insider trading identification method based on K-means clustering and the KNN algorithm according to claim 1, characterised in that the sample data set includes characteristic indices and a label indicating whether each sample is insider trading.
3. The insider trading identification method based on K-means clustering and the KNN algorithm according to claim 1, characterised in that the KNN algorithm uses the Euclidean distance to measure the similarity of features between samples.
4. The insider trading identification method based on K-means clustering and the KNN algorithm according to any one of claims 1 to 3, characterised in that, in step 5, the sub-dataset whose cluster centre is nearest to the test target is selected by computing the distance between the test target data set and each cluster centre and choosing the corresponding sub-dataset of the sample data set on the nearest-distance principle.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910470749.5A CN110189035A (en) | 2019-05-31 | 2019-05-31 | Insider trading identification method based on K-means clustering and the KNN algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910470749.5A CN110189035A (en) | 2019-05-31 | 2019-05-31 | Insider trading identification method based on K-means clustering and the KNN algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110189035A (en) | 2019-08-30 |
Family
ID=67719446
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910470749.5A Pending CN110189035A (en) | Insider trading identification method based on K-means clustering and the KNN algorithm | 2019-05-31 | 2019-05-31 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110189035A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111179077A (en) * | 2019-12-19 | 2020-05-19 | 成都数联铭品科技有限公司 | Method and system for identifying abnormal stock transaction |
CN111199419A (en) * | 2019-12-19 | 2020-05-26 | 成都数联铭品科技有限公司 | Method and system for identifying abnormal stock transaction |
CN111179077B (en) * | 2019-12-19 | 2023-09-12 | 成都数联铭品科技有限公司 | Stock abnormal transaction identification method and system |
CN111199419B (en) * | 2019-12-19 | 2023-09-15 | 成都数联铭品科技有限公司 | Stock abnormal transaction identification method and system |
CN111833175A (en) * | 2020-06-03 | 2020-10-27 | 百维金科(上海)信息科技有限公司 | Internet financial platform application fraud behavior detection method based on KNN algorithm |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104679777B (en) | A kind of method and system for being used to detect fraudulent trading | |
CN110189035A (en) | Insider trading identification method based on K-means clustering and the KNN algorithm | |
Camacho et al. | Are European business cycles close enough to be just one? | |
CN109165566A (en) | A kind of recognition of face convolutional neural networks training method based on novel loss function | |
CN109692877A (en) | A kind of Cold-strip Steel Surface quality control system and method | |
CN109034194A (en) | Transaction swindling behavior depth detection method based on feature differentiation | |
CN106326913A (en) | Money laundering account determination method and device | |
CN106952159A (en) | A kind of real security risk control method, system and storage medium | |
CN105095238A (en) | Decision tree generation method used for detecting fraudulent trade | |
CN103544499B (en) | The textural characteristics dimension reduction method that a kind of surface blemish based on machine vision is detected | |
CN105825078B (en) | Small sample Classification of Gene Expression Data method based on gene big data | |
CN106485528A (en) | The method and apparatus of detection data | |
CN110580510B (en) | Clustering result evaluation method and system | |
CN104850868A (en) | Customer segmentation method based on k-means and neural network cluster | |
CN112633337A (en) | Unbalanced data processing method based on clustering and boundary points | |
Callot et al. | Oracle efficient estimation and forecasting with the adaptive lasso and the adaptive group lasso in vector autoregressions | |
CN114997612A (en) | Cluster analysis method and device for abnormal information of large grain pile | |
CN107274025B (en) | System and method for realizing intelligent identification and management of power consumption mode | |
CN105894023A (en) | Support vector data description improved algorithm based on clusters | |
CN113919932A (en) | Client scoring deviation detection method based on loan application scoring model | |
CN109657122A (en) | A kind of Academic Teams' important member's recognition methods based on academic big data | |
CN107886217A (en) | A kind of labor turnover Risk Forecast Method and device based on clustering algorithm | |
CN111598116B (en) | Data classification method, device, electronic equipment and readable storage medium | |
CN112288561A (en) | Internet financial fraud behavior detection method based on DBSCAN algorithm | |
CN114742655B (en) | Anti-money laundering behavior recognition system based on machine learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190830 |