CN110189035A - Insider trading identification method based on K-means clustering and the KNN algorithm - Google Patents

Insider trading identification method based on K-means clustering and the KNN algorithm

Info

Publication number
CN110189035A
Authority
CN
China
Prior art keywords
data set
insider trading
sample
test target
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910470749.5A
Other languages
Chinese (zh)
Inventor
邓尚昆
王晨光
徐乔林
危晨阳
王明月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Three Gorges University CTGU
Original Assignee
China Three Gorges University CTGU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Three Gorges University CTGU
Priority to CN201910470749.5A
Publication of CN110189035A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Marketing (AREA)
  • Evolutionary Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Probability & Statistics with Applications (AREA)
  • Technology Law (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention discloses an insider trading identification method based on K-means clustering and the KNN algorithm. Insider trading sample data sets are obtained for different event time windows; K-means clustering divides the sample data set into different sub-datasets; the sub-dataset whose cluster center is nearest to the test target is selected, and the KNN algorithm is applied to it to determine whether insider trading occurred. Because the method builds multiple clusters with K-means and compares the test target only against the samples of the cluster whose center is nearest, it addresses the problem that insider trading stocks differ widely in their feature indicators for various reasons and therefore cannot be identified effectively, and it improves the accuracy of insider trading discrimination.

Description

Insider trading identification method based on K-means clustering and the KNN algorithm
Technical field
The invention belongs to the field of securities market regulation, and specifically relates to an insider trading identification method based on K-means clustering and the KNN algorithm.
Background
Insider trading refers to conduct in which a corporate insider obtains inside information by improper means, discloses inside information, trades securities on the basis of inside information, or advises others to trade securities on the basis of inside information. To make a profit or avoid a loss, the insider trader uses a special position or opportunity to obtain inside information and trade securities. This conduct violates the three principles of the securities market (openness, fairness, and impartiality) and seriously infringes the investing public's equal right to know and their property interests. In recent years insider trading has been occurring with high frequency and involves the Main Board, the SME Board, and the Growth Enterprise Market (GEM). It involves a wide range of people, and its methods have become more concealed and complex. In addition, the feature indicators of insider trading stocks differ widely for reasons such as industry and market capitalization, which makes insider trading harder to identify.
Therefore, the invention provides a method that first clusters the samples and then identifies insider trading.
Summary of the invention
The purpose of the invention is to address the technical problem that insider trading samples differ widely in their feature indicators because of factors such as industry and market capitalization, and are therefore hard to identify. It provides an insider trading identification method based on K-means clustering and the KNN algorithm: K-means clustering builds multiple clusters from the sample data set, the samples of the cluster whose center is nearest to the test target are selected, and the KNN algorithm then compares similarity and classifies the test target.
The technical solution of the invention is an insider trading identification method based on K-means clustering and the KNN algorithm, comprising the following steps:
Step 1: obtain the insider trading sample data set under different event time windows, and select and add white samples, i.e. non-insider-trading samples;
Step 2: apply K-means clustering to the sample data set to divide it into different sub-datasets;
Step 3: for each sub-dataset, calculate and determine the cluster center, and use the KNN algorithm to build a KNN insider trading discrimination model for that sub-dataset;
Step 4: obtain a test target and collect the test target data set;
Step 5: select the sub-dataset whose cluster center is nearest to the test target, and use the corresponding sub-dataset KNN insider trading discrimination model from step 3 to determine whether the test target involves insider trading;
Step 6: determine whether the insider trading identification result is correct;
Step 6.1: if the identification result is correct, go to step 8;
Step 6.2: if the identification result is incorrect, go to step 7;
Step 7: add the test target data set, together with its label indicating whether it is insider trading, to the sample data set, and update the cluster center of the corresponding sub-dataset;
Step 8: determine whether there is a next test target;
Step 8.1: if there is a next test target, go to step 4;
Step 8.2: if there is no next test target, end.
Further, the sample data set includes the feature indicators of each sample and a label indicating whether the sample is insider trading.
Further, the feature indicators include the company financial indicators and corporate governance indicators disclosed for individual stocks on the securities market, and also include micro-level securities market indicators of individual stocks calculated with the CAPM (Capital Asset Pricing Model) and GARCH (Generalized Autoregressive Conditional Heteroscedasticity) models.
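As one illustration of how a CAPM-derived market indicator might be obtained, the sketch below estimates a stock's alpha and beta by least squares and computes abnormal returns over a window. The source does not say which indicators are derived from CAPM or how, so the function names, the zero default for the risk-free rate, and the synthetic return series are all assumptions.

```python
def capm_alpha_beta(stock_ret, market_ret, risk_free=0.0):
    """Estimate CAPM alpha and beta by least squares on excess returns."""
    y = [r - risk_free for r in stock_ret]
    x = [r - risk_free for r in market_ret]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    beta = cov / var
    alpha = mean_y - beta * mean_x
    return alpha, beta


def abnormal_returns(stock_ret, market_ret, alpha, beta, risk_free=0.0):
    """Actual excess return minus the CAPM-predicted excess return, per period."""
    return [(s - risk_free) - (alpha + beta * (m - risk_free))
            for s, m in zip(stock_ret, market_ret)]


# Synthetic (hypothetical) data: a stock that moves exactly 1.5x the market.
market = [0.01, -0.02, 0.015, 0.003, -0.007]
stock = [1.5 * m for m in market]
alpha, beta = capm_alpha_beta(stock, market)
# Cumulative abnormal return over the window; zero here by construction.
car = sum(abnormal_returns(stock, market, alpha, beta))
```

In practice the coefficients would be estimated on a period before the event window and the abnormal returns measured inside it; that split is omitted here to keep the sketch short.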
Further, the KNN algorithm uses the Euclidean distance to measure the similarity of features between samples.
Further, in step 5, the sub-dataset whose cluster center is nearest to the test target is selected by calculating the distance between the test target data set and each cluster center and choosing the corresponding sub-dataset of the sample data set according to the nearest-distance principle.
Further, the specific steps of the KNN algorithm are as follows:
Step 1: select a measure for determining the similarity between the features of the test target data and the sample data;
Step 2: use the measure selected in step 1 to calculate the distance between the test target data and each data point in the sample data set;
Step 3: sort the distances calculated in step 2 between the test target data and the data points in the sample data set, and select the K2 samples closest to the test target;
Step 4: determine whether insider trading has occurred for the test target according to the proportion of the K2 closest samples from step 3 in which insider trading occurred;
Step 4.1: if the proportion of the K2 samples in which insider trading occurred exceeds 0.5, determine that insider trading occurred for the test target;
Step 4.2: if the proportion of the K2 samples in which insider trading occurred does not exceed 0.5, determine that insider trading did not occur for the test target.
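The four KNN steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the 0/1 label encoding, and the toy feature vectors are assumptions, and the default K2 = 3 follows the value reported in the embodiment.

```python
import math


def euclidean(a, b):
    """Step 1: Euclidean distance as the similarity measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_is_insider(target, samples, labels, k2=3):
    """Return True if more than half of the k2 nearest samples are insider trading.

    labels uses 1 for insider trading samples and 0 for white samples.
    """
    # Steps 2 and 3: compute distances, sort, keep the k2 closest samples.
    order = sorted(range(len(samples)), key=lambda i: euclidean(target, samples[i]))
    nearest = order[:k2]
    # Step 4: proportion of insider trading samples among the neighbours.
    ratio = sum(labels[i] for i in nearest) / len(nearest)
    return ratio > 0.5


# Hypothetical two-feature samples.
samples = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [1.2, 0.8]]
labels = [0, 0, 1, 1, 1]
near_insiders = knn_is_insider([1.0, 0.9], samples, labels)   # True
near_whites = knn_is_insider([0.05, 0.1], samples, labels)    # False
```

Using an odd K2 avoids an exact 0.5 split; with an even K2, a tie falls under step 4.2 and is classified as no insider trading.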
Beneficial effects of the invention:
1) Based on the K-means clustering algorithm, multiple classes are built, and the test sample is compared for similarity only against the sample set of the cluster whose center is nearest to the test target in order to determine its class. This better addresses the problem that insider trading stock samples differ widely in their feature indicators for various reasons and cannot be identified effectively;
2) The test target is classified with the KNN algorithm, which is easy to understand, easy to implement, and efficient to execute.
Detailed description of the invention
The invention is further explained below with reference to the drawings and an embodiment.
Fig. 1 is a flow diagram of the insider trading identification method based on K-means clustering and the KNN algorithm.
Fig. 2 is a flow diagram of the KNN algorithm.
Specific embodiment
As shown in Fig. 1, an insider trading identification method based on K-means clustering and the KNN algorithm specifically comprises the following steps:
Step 1: obtain, from the official website of the China Securities Regulatory Commission (CSRC), the stock samples announced by the CSRC as involving insider trading, and obtain the securities market micro indicators and corporate financial indicators of those samples under different event time windows as the sample data set; collect white samples that have never been involved in insider trading, in a number comparable to the insider trading samples, using the criteria that they belong to the same industry and that the inside-information-sensitive event falls in the same period, and add them to the sample data set; the sample data set includes feature indicators and a label indicating whether insider trading occurred;
Step 2: apply K-means clustering to the sample data set to divide it into K1 different sub-datasets;
Step 3: for each sub-dataset, calculate and determine the cluster center, and use the KNN algorithm to build a KNN insider trading discrimination model for that sub-dataset;
Step 4: obtain a test target and collect the test target data set;
Step 5: select the sub-dataset whose cluster center is nearest to the test target, and use the corresponding sub-dataset KNN insider trading discrimination model from step 3 to determine whether the test target involves insider trading;
Step 6: determine whether the identification result is correct;
Step 6.1: if the identification result is correct, go to step 8;
Step 6.2: if the identification result is incorrect, go to step 7;
Step 7: add the test target data set, together with its label indicating whether it is insider trading, to the sample data set, and update the cluster center of the corresponding sub-dataset;
Step 8: determine whether there is a next test target;
Step 8.1: if there is a next test target, go to step 4;
Step 8.2: if there is no next test target, end.
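Steps 4 to 8 can be sketched as a loop that routes each test target to the nearest cluster, classifies it with KNN inside that cluster, and feeds misclassified targets back into the sub-dataset. The sketch makes several assumptions not stated in the source: the true label is available when the result is checked in step 6, each cluster is stored as a dictionary with hypothetical keys `points`, `labels`, and `center`, and the "KNN model" of a sub-dataset is simply its stored samples, since KNN has no separate training phase.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(points):
    """Cluster center as the attribute-wise mean of the points."""
    return [sum(col) / len(points) for col in zip(*points)]


def nearest_center(x, centers):
    """Step 5 routing: index of the cluster center closest to x."""
    return min(range(len(centers)), key=lambda j: euclidean(x, centers[j]))


def knn_predict(x, points, labels, k2=3):
    order = sorted(range(len(points)), key=lambda i: euclidean(x, points[i]))[:k2]
    return 1 if sum(labels[i] for i in order) / len(order) > 0.5 else 0


def run_targets(clusters, targets, k2=3):
    """Process (features, true_label) pairs one by one (steps 4 to 8)."""
    predictions = []
    for x, true_label in targets:
        cluster = clusters[nearest_center(x, [c["center"] for c in clusters])]
        pred = knn_predict(x, cluster["points"], cluster["labels"], k2)
        predictions.append(pred)
        if pred != true_label:
            # Step 7: add the target with its label and update the center.
            cluster["points"].append(list(x))
            cluster["labels"].append(true_label)
            cluster["center"] = mean_vector(cluster["points"])
    return predictions


# Hypothetical clusters with two features each.
pts_a = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
pts_b = [[10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
clusters = [
    {"points": pts_a, "labels": [0, 0, 1], "center": mean_vector(pts_a)},
    {"points": pts_b, "labels": [1, 1, 0], "center": mean_vector(pts_b)},
]
preds = run_targets(clusters, [([0.2, 0.2], 1), ([10.5, 10.5], 1)])
```

In this toy run the first target is misclassified, so it is added to the first cluster and that cluster's center moves; the second target is classified correctly and leaves its cluster unchanged.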
In step 1, the feature indicators include the company financial indicators and corporate governance indicators disclosed for individual stocks on the securities market, and also include micro-level securities market indicators of individual stocks calculated with the CAPM (Capital Asset Pricing Model) and GARCH (Generalized Autoregressive Conditional Heteroscedasticity) models.
In step 5, the KNN algorithm uses the Euclidean distance to measure the similarity of features between samples.
In step 2, the K-means clustering algorithm comprises the following steps:
Step 1: choose K objects in the data space as initial centers, each object representing one cluster center;
Step 2: for each data object in the sample, compute its Euclidean distance to these cluster centers and, by the nearest-distance criterion, assign it to the class of the cluster center nearest (most similar) to it;
Step 3: update the cluster centers and calculate the value of the objective function;
Step 4: determine whether the standard clustering criterion function has converged; if it has, output the result; if not, return to step 2.
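The four K-means steps above can be sketched in plain Python as follows. The source only says that K objects are chosen as initial centers, so drawing them at random with a fixed seed is an assumption, as are the names `kmeans`, `max_iter`, and `tol`, the use of the sum of squared distances as the convergence criterion, and the toy data.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k1, max_iter=100, tol=1e-9, seed=0):
    """Return (centers, assignment, sse) for k1 clusters."""
    rng = random.Random(seed)
    # Step 1: pick k1 distinct data objects as the initial cluster centers.
    centers = [list(p) for p in rng.sample(points, k1)]
    prev_sse = None
    for _ in range(max_iter):
        # Step 2: assign every object to its nearest cluster center.
        assign = [min(range(k1), key=lambda j: euclidean(p, centers[j]))
                  for p in points]
        # Step 3: move each center to the mean of its members.
        for j in range(k1):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        sse = sum(euclidean(p, centers[a]) ** 2 for p, a in zip(points, assign))
        # Step 4: stop once the criterion no longer changes.
        if prev_sse is not None and abs(prev_sse - sse) <= tol:
            break
        prev_sse = sse
    return centers, assign, sse


# Two well-separated hypothetical groups of points.
points = [[0, 0], [0, 1], [1, 0], [1, 1],
          [10, 10], [10, 11], [11, 10], [11, 11]]
centers, assign, sse = kmeans(points, k1=2)
```

K-means is sensitive to the initial centers, so on real data it is common to run it from several seeds and keep the run with the lowest criterion value.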
In step 1, let the n data objects lie in the space R^A; before clustering starts, K1 initial cluster centers are chosen.
In step 2, based on the initial cluster centers chosen in step 1, every other object is assigned to the most similar sample class according to its similarity to the cluster centers. The formula for calculating similarity is as follows:
where d(xi, cj) is the distance measure between data object xi and cluster center cj. In the embodiment, the distance measure is expressed as follows:
In step 3, the cluster centers are updated and the objective function is calculated. Suppose class j contains nj samples and has cluster center cj; the k-th attribute of cluster center cj is updated using the following expression:
In step 4, when determining whether the termination condition is met, the mean square error is used as the standard clustering criterion function, expressed as:
If the termination condition is not met, the above process is repeated until the standard clustering criterion function converges.
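The formula images referred to in the K-means description are not reproduced in this text. Under the definitions given there (Euclidean distance, mean-based center update, mean-square-error criterion), the standard K-means expressions would read roughly as follows. The symbols $x_{ik}$ (k-th attribute of $x_i$), $c_{jk}$ (k-th attribute of $c_j$), $C_j$ (set of samples in class j), and $J$ are notational assumptions, and the original expressions may differ in form, for example by a normalizing factor in the criterion.

```latex
\begin{align}
d(x_i, c_j) &= \sqrt{\sum_{k=1}^{A} \left(x_{ik} - c_{jk}\right)^2} \\
c_{jk} &= \frac{1}{n_j} \sum_{x_i \in C_j} x_{ik} \\
J &= \sum_{j=1}^{K_1} \sum_{x_i \in C_j} d(x_i, c_j)^2
\end{align}
```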
As shown in Fig. 2, the specific steps of the KNN algorithm are as follows:
Step 1: select a measure for determining the similarity between the features of the test target data and the sample data;
Step 2: use the measure selected in step 1 to calculate the distance between the test target data and each data point in the sample data set;
Step 3: sort the distances calculated in step 2 between the test target data and the data points in the sample data set, and select the K2 samples closest to the test target;
Step 4: determine whether insider trading has occurred for the test target according to the proportion of the K2 closest samples from step 3 in which insider trading occurred;
Step 4.1: if the proportion of the K2 samples in which insider trading occurred exceeds 0.5, determine that insider trading occurred for the test target;
Step 4.2: if the proportion of the K2 samples in which insider trading occurred does not exceed 0.5, determine that insider trading did not occur for the test target.
In step 1 of the KNN algorithm, the Euclidean distance is used to measure the distance between samples.
The Euclidean distance d_euc between two points in a multidimensional space is calculated as follows:
where Xi = (X1, X2, …, Xn) and Yj = (Y1, Y2, …, Yn) are the vectors representing two sample data points, and n is the number of feature attributes of a sample.
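The expression itself is not reproduced in this text. With the vectors defined above, the Euclidean distance takes the standard form below; the summation index $k$ is a notational assumption.

```latex
d_{euc}(X_i, Y_j) = \sqrt{\sum_{k=1}^{n} \left(X_k - Y_k\right)^2}
```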
In the embodiment, 171 stock samples involving insider trading, as announced by the CSRC between 2001 and 2017, were collected first. White samples were then chosen at a ratio of about 1 to 1 with the insider trading stock samples, using the selection criteria of the same type of inside-information-sensitive event, the same industry, the same period, and no CSRC penalty. In the embodiment, a total of 164 white samples meeting these requirements were collected. The samples in which insider trading occurred and the samples in which it did not were combined into the sample data set. Then 16 feature indicators were chosen, covering company financial data and micro-level securities market data, as shown in Table 1.
Table 1: Feature indicators
To test the identification performance of the insider trading identification method of the invention on stock samples, the collected sample data set was divided into a training set and a test set at a ratio of training set : test set = 8:2, and K1 cluster centers were established from the training set. In this embodiment K1 is 4 and K2 is 3. The test set was then divided into 4 sub-test sets, and the insider trading identification method of the invention was applied to each of the 4 sub-test sets to identify whether insider trading occurred.
First, K-means clustering divided all samples into four classes. In the test set, the four sub-test sets contain 30, 14, 4, and 20 samples respectively.
The insider trading identification method of the invention was applied to the samples of sub-test set 1. The results are shown in Table 2; the identification accuracy for insider trading stock samples is 86.67%.
Table 2: Insider trading identification results for sub-test set 1
The insider trading identification method of the invention was applied to the samples of sub-test set 2. The results are shown in Table 3; the identification accuracy for insider trading stock samples is 75%.
Table 3: Insider trading identification results for sub-test set 2
The insider trading identification method of the invention was applied to the samples of sub-test set 3. The results are shown in Table 4; the identification accuracy for insider trading stock samples is 100%.
Table 4: Insider trading identification results for sub-test set 3
The insider trading identification method of the invention was applied to the samples of sub-test set 4. The results are shown in Table 5; the identification accuracy for insider trading stock samples is 70%.
Table 5: Insider trading identification results for sub-test set 4
Combining Tables 2 to 5, the overall identification accuracy of the proposed model is 80%. The embodiment also ran a comparison that omits K-means clustering and applies the KNN algorithm directly to discriminate insider trading; its discrimination accuracy is 73%, as shown in Table 6. This shows that, once K-means clustering is combined with it, the method of the invention is significantly more accurate at discriminating insider trading than the plain KNN algorithm.
Table 6: Comparison of insider trading discrimination accuracy
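As a quick consistency check, the reported overall figure can be compared with the sample-weighted average of the four sub-test-set accuracies. This assumes that each reported rate applies to every sample in its sub-test set, which the text does not state explicitly; the variable names are illustrative.

```python
# Sub-test-set sizes and accuracies as reported in the embodiment.
sizes = [30, 14, 4, 20]
rates = [0.8667, 0.75, 1.00, 0.70]

# Weight each accuracy by the number of samples it covers.
correct = sum(n * r for n, r in zip(sizes, rates))
overall = correct / sum(sizes)
print(f"{overall:.1%}")
```

Under that assumption the weighted average is about 80.1%, which agrees with the reported overall accuracy of 80%.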

Claims (4)

1. An insider trading identification method based on K-means clustering and the KNN algorithm, characterized by comprising the following steps:
Step 1: obtain the insider trading sample data set under different event time windows, and select and add white samples, i.e. non-insider-trading samples;
Step 2: apply K-means clustering to the sample data set to divide it into different sub-datasets;
Step 3: for each sub-dataset, calculate and determine the cluster center, and use the KNN algorithm to build a KNN insider trading discrimination model for that sub-dataset;
Step 4: obtain a test target and collect the test target data set;
Step 5: select the sub-dataset whose cluster center is nearest to the test target, and use the corresponding sub-dataset KNN insider trading discrimination model from step 3 to determine whether the test target involves insider trading;
Step 6: determine whether the identification result is correct;
Step 6.1: if the identification result is correct, go to step 8;
Step 6.2: if the identification result is incorrect, go to step 7;
Step 7: add the test target data set, together with its label indicating whether it is insider trading, to the sample data set, and update the cluster center of the corresponding sub-dataset;
Step 8: determine whether there is a next test target;
Step 8.1: if there is a next test target, go to step 4;
Step 8.2: if there is no next test target, end.
2. The insider trading identification method based on K-means clustering and the KNN algorithm according to claim 1, characterized in that the sample data set includes feature indicators and a label indicating whether insider trading occurred.
3. The insider trading identification method based on K-means clustering and the KNN algorithm according to claim 1, characterized in that the KNN algorithm uses the Euclidean distance to measure the similarity of features between samples.
4. The insider trading identification method based on K-means clustering and the KNN algorithm according to any one of claims 1 to 3, characterized in that, in step 5, the sub-dataset whose cluster center is nearest to the test target is selected by calculating the distance between the test target data set and each cluster center and choosing the corresponding sub-dataset of the sample data set according to the nearest-distance principle.
CN201910470749.5A 2019-05-31 2019-05-31 Insider trading identification method based on K-means clustering and the KNN algorithm Pending CN110189035A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910470749.5A CN110189035A (en) 2019-05-31 2019-05-31 Insider trading identification method based on K-means clustering and the KNN algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910470749.5A CN110189035A (en) 2019-05-31 2019-05-31 Insider trading identification method based on K-means clustering and the KNN algorithm

Publications (1)

Publication Number Publication Date
CN110189035A true CN110189035A (en) 2019-08-30

Family

ID=67719446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910470749.5A Pending CN110189035A (en) 2019-05-31 2019-05-31 Insider trading identification method based on K-means clustering and the KNN algorithm

Country Status (1)

Country Link
CN (1) CN110189035A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179077A (en) * 2019-12-19 2020-05-19 成都数联铭品科技有限公司 Method and system for identifying abnormal stock transaction
CN111199419A (en) * 2019-12-19 2020-05-26 成都数联铭品科技有限公司 Method and system for identifying abnormal stock transaction
CN111833175A (en) * 2020-06-03 2020-10-27 百维金科(上海)信息科技有限公司 Internet financial platform application fraud behavior detection method based on KNN algorithm

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179077A (en) * 2019-12-19 2020-05-19 成都数联铭品科技有限公司 Method and system for identifying abnormal stock transaction
CN111199419A (en) * 2019-12-19 2020-05-26 成都数联铭品科技有限公司 Method and system for identifying abnormal stock transaction
CN111179077B (en) * 2019-12-19 2023-09-12 成都数联铭品科技有限公司 Stock abnormal transaction identification method and system
CN111199419B (en) * 2019-12-19 2023-09-15 成都数联铭品科技有限公司 Stock abnormal transaction identification method and system
CN111833175A (en) * 2020-06-03 2020-10-27 百维金科(上海)信息科技有限公司 Internet financial platform application fraud behavior detection method based on KNN algorithm

Similar Documents

Publication Publication Date Title
CN104679777B (en) Method and system for detecting fraudulent transactions
CN110189035A (en) Insider trading identification method based on K-means clustering and the KNN algorithm
Camacho et al. Are European business cycles close enough to be just one?
CN109165566A (en) Face recognition convolutional neural network training method based on a novel loss function
CN109692877A (en) Cold strip steel surface quality control system and method
CN109034194A (en) Deep detection method for transaction fraud behavior based on feature differentiation
CN106326913A (en) Money laundering account determination method and device
CN106952159A (en) Real security risk control method, system and storage medium
CN105095238A (en) Decision tree generation method for detecting fraudulent transactions
CN103544499B (en) Texture feature dimension reduction method for machine-vision-based surface defect detection
CN105825078B (en) Small-sample gene expression data classification method based on gene big data
CN106485528A (en) Method and apparatus for detecting data
CN110580510B (en) Clustering result evaluation method and system
CN104850868A (en) Customer segmentation method based on k-means and neural network clustering
CN112633337A (en) Unbalanced data processing method based on clustering and boundary points
Callot et al. Oracle efficient estimation and forecasting with the adaptive lasso and the adaptive group lasso in vector autoregressions
CN114997612A (en) Cluster analysis method and device for abnormal information of large grain piles
CN107274025B (en) System and method for intelligent identification and management of power consumption patterns
CN105894023A (en) Cluster-based improved support vector data description algorithm
CN113919932A (en) Client scoring deviation detection method based on loan application scoring model
CN109657122A (en) Method for identifying important members of academic teams based on academic big data
CN107886217A (en) Employee turnover risk prediction method and device based on a clustering algorithm
CN111598116B (en) Data classification method, device, electronic equipment and readable storage medium
CN112288561A (en) Internet financial fraud behavior detection method based on DBSCAN algorithm
CN114742655B (en) Anti-money laundering behavior recognition system based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190830