CN110189035A

CN110189035A - An insider trading identification method based on K-means clustering and the KNN algorithm

Info

Publication number: CN110189035A
Application number: CN201910470749.5A
Authority: CN
Inventors: 邓尚昆; 王晨光; 徐乔林; 危晨阳; 王明月
Original assignee: China Three Gorges University CTGU
Current assignee: China Three Gorges University CTGU
Priority date: 2019-05-31
Filing date: 2019-05-31
Publication date: 2019-08-30

Abstract

The invention discloses an insider trading identification method based on K-means clustering and the KNN algorithm. The method obtains insider trading sample data sets under different event time windows; performs K-means clustering on the sample data set to divide it into different sub-datasets; and selects the sub-dataset whose cluster center is nearest to the test target, then uses the KNN algorithm to determine whether insider trading occurred. Based on K-means clustering, the invention establishes multiple clusters and compares the test target for similarity against the samples of the cluster whose center is nearest to it in order to determine its class. This addresses the problem that insider trading stocks, for various reasons, differ greatly in their characteristic indices and therefore cannot be identified effectively, and it improves the accuracy of insider trading discrimination.

Description

An insider trading identification method based on K-means clustering and the KNN algorithm

Technical field

The invention belongs to the field of securities market regulation, and in particular relates to an insider trading identification method based on K-means clustering and the KNN algorithm.

Background art

Insider trading refers to conduct in which persons with inside knowledge of a company obtain inside information by improper means, disclose inside information, trade securities on the basis of inside information, or advise others to trade securities on the basis of inside information. To make a profit or avoid a loss, insider traders use their special position or opportunity to obtain inside information and trade securities. Such conduct violates the "three fairness" principles of the securities market (openness, fairness, and impartiality) and seriously infringes on the investing public's equal right to information and their property interests. In recent years insider trading has shown a high incidence, involving the main board, the small and medium enterprise board, the growth enterprise market (GEM), and other boards; it involves a wide range of persons, and its means have become more concealed and complex. Moreover, because of differences in industry, market capitalization, and other factors, the characteristic indices of insider trading stocks differ greatly, which makes insider trading harder to identify.

Therefore, the present invention studies a method that first clusters the samples and then performs insider trading identification.

Summary of the invention

The purpose of the invention is to address the technical problem that insider trading samples are hard to identify because their characteristic indices differ greatly owing to factors such as industry and market capitalization. To that end it provides an insider trading identification method based on K-means clustering and the KNN algorithm: K-means clustering is used to establish multiple clusters from the sample data set, the samples of the cluster whose center is nearest to the test target are selected, and the KNN algorithm is then used to compare similarity and classify the test target.

The technical solution of the invention is an insider trading identification method based on K-means clustering and the KNN algorithm, comprising the following steps:

Step 1: Obtain insider trading sample data sets under different event time windows, and select and add white samples, i.e. non-insider-trading samples;

Step 2: Perform K-means clustering on the sample data set to divide it into different sub-datasets;

Step 3: Compute each sub-dataset to determine its cluster center and, for each sub-dataset, use the KNN algorithm to build a sub-dataset KNN insider trading discrimination model;

Step 4: Obtain a test target and collect the test target data set;

Step 5: Select the sub-dataset whose cluster center is nearest to the test target, and use the corresponding sub-dataset KNN insider trading discrimination model from Step 3 to determine whether the test target involves insider trading;

Step 6: Judge whether the insider trading identification result is correct;

Step 6.1: If the identification result is correct, go to Step 8;

Step 6.2: If the identification result is incorrect, go to Step 7;

Step 7: Add the test target data set and its insider trading label to the sample data set, and update the cluster center of the corresponding sub-dataset;

Step 8: Judge whether there is a next test target;

Step 8.1: If there is a next test target, go to Step 4;

Step 8.2: If there is no next test target, end.
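The loop formed by Steps 4 to 8 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the toy two-feature data, and the assumption that the true label of each test target becomes known after classification are all hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(vectors):
    """Component-wise mean, used here as the cluster center."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))

def classify(target, cluster, k2):
    """KNN vote inside one sub-dataset; cluster is a list of (features, label)."""
    nearest = sorted(cluster, key=lambda s: dist(target, s[0]))[:k2]
    return int(sum(label for _, label in nearest) / len(nearest) > 0.5)

def process_targets(clusters, targets, k2=3):
    """Steps 4-8: classify each target; on a miss, absorb it and update the center."""
    centers = [mean([f for f, _ in c]) for c in clusters]
    results = []
    for features, true_label in targets:                 # Step 4, repeated via Step 8
        j = min(range(len(centers)), key=lambda i: dist(features, centers[i]))  # Step 5
        predicted = classify(features, clusters[j], k2)
        results.append(predicted)
        if predicted != true_label:                      # Step 6.2 leads to Step 7
            clusters[j].append((features, true_label))
            centers[j] = mean([f for f, _ in clusters[j]])
    return results, centers

# Hypothetical sub-datasets: label 1 = insider trading, 0 = white sample.
clusters = [
    [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 1)],
    [((10.0, 10.0), 1), ((10.0, 11.0), 1), ((11.0, 10.0), 0)],
]
targets = [((1.0, 1.0), 1), ((10.0, 10.5), 1)]
results, centers = process_targets(clusters, targets)
```

In this toy run the first target is misclassified, so it is added to its sub-dataset and that cluster center is recomputed; the second target is classified correctly and leaves the sample data set unchanged.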

Further, the sample data set includes the characteristic indices of the samples and a label indicating whether each sample involves insider trading.

Further, the characteristic indices include the company financial indices and corporate governance indices publicly disclosed for individual stocks on the securities market, as well as market microstructure indices of individual stocks calculated with the CAPM (Capital Asset Pricing Model) and GARCH (Generalized Autoregressive Conditional Heteroskedasticity) models.

Further, the KNN algorithm uses Euclidean distance to measure the similarity of features between samples.

Further, in Step 5, the sub-dataset whose cluster center is nearest to the test target is selected by computing the distance between the test target data set and each cluster center and choosing the corresponding sub-dataset of the sample data set on the nearest-distance principle.
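The nearest-center selection in Step 5 amounts to an argmin over the distances to the cluster centers. A minimal sketch follows; the function names and the two-feature centers are hypothetical and serve only as an illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_cluster(target, centers):
    """Return the index of the cluster center closest to the test target."""
    return min(range(len(centers)), key=lambda j: euclidean(target, centers[j]))

# Hypothetical cluster centers with two features each.
centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
chosen = nearest_cluster((4.0, 6.0), centers)  # index of the sub-dataset to use
```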

Further, the specific steps of the KNN algorithm are as follows:

Step 1: Select the measure used to determine the similarity between the features of the test target data and the sample data;

Step 2: Using the measure selected in Step 1, calculate the distance between the test target data and each data point in the sample data set;

Step 3: Sort the distances calculated in Step 2 and select the K₂ samples closest to the test target;

Step 4: Judge whether insider trading occurred for the test target according to the proportion of insider trading samples among the K₂ samples closest to the test target from Step 3;

Step 4.1: If the proportion of insider trading samples among the K₂ samples exceeds 0.5, judge that insider trading occurred for the test target;

Step 4.2: If the proportion of insider trading samples among the K₂ samples does not exceed 0.5, judge that insider trading did not occur for the test target.
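The four KNN steps above can be sketched in a few lines. The function name, the default of K₂ = 3, and the toy samples are assumptions made for illustration; labels use 1 for insider trading and 0 for white samples.

```python
import math

def knn_is_insider(target, samples, labels, k2=3):
    """Return True if the majority of the k2 nearest samples are insider trading."""
    # Steps 1-2: Euclidean distance from the target to every sample.
    dists = [
        (math.sqrt(sum((t - s) ** 2 for t, s in zip(target, sample))), label)
        for sample, label in zip(samples, labels)
    ]
    # Step 3: sort by distance and keep the k2 closest samples.
    dists.sort(key=lambda pair: pair[0])
    nearest = dists[:k2]
    # Step 4: proportion of insider samples; strictly above 0.5 means insider trading.
    ratio = sum(label for _, label in nearest) / len(nearest)
    return ratio > 0.5

# Hypothetical two-feature samples.
samples = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = [1, 1, 0, 0, 0, 1]
near_group = knn_is_insider((1.5, 1.5), samples, labels)
far_group = knn_is_insider((8.5, 8.5), samples, labels)
```

With an odd K₂ and binary labels, the proportion can never equal exactly 0.5, so the two branches of Step 4 never tie.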

Beneficial effects of the invention:

1) Based on the K-means clustering algorithm, multiple classes are established, and the test sample is compared for similarity against the sample set of the cluster whose center is nearest to the test target to determine its class. This better addresses the problem that insider trading stock samples differ greatly in characteristic indices for various reasons and therefore cannot be identified effectively;

2) The test target is classified with the KNN algorithm, which is easy to understand, easy to implement, and efficient to execute.

Brief description of the drawings

The invention is further explained below with reference to the drawings and an embodiment.

Fig. 1 is a flow diagram of the insider trading identification method based on K-means clustering and the KNN algorithm.

Fig. 2 is a flow diagram of the KNN algorithm.

Specific embodiment

As shown in Fig. 1, an insider trading identification method based on K-means clustering and the KNN algorithm specifically comprises the following steps:

Step 1: Obtain the stock samples in which insider trading occurred, as announced by the China Securities Regulatory Commission (CSRC) on its official website, and obtain the securities market microstructure indices and company financial indices of those insider trading samples under the corresponding different event time windows as the sample data set; collect white samples according to the criteria that insider trading never occurred, that their number is comparable to the number of insider trading samples, that they belong to the same industry, and that the inside-information-sensitive event falls in the same period, and add them to the sample data set; the sample data set includes the characteristic indices and a label indicating whether insider trading occurred;

Step 2: Perform K-means clustering on the sample data set to divide it into K₁ different sub-datasets;

Step 4: Obtain a test target and collect the test target data set;

Step 6: Judge whether the identification result is correct;

Step 6.1: If the identification result is correct, go to Step 8;

Step 6.2: If the identification result is incorrect, go to Step 7;

Step 7: Add the test target data set and its insider trading label to the sample data set, and update the cluster center of the corresponding sub-dataset;

Step 8: Judge whether there is a next test target;

Step 8.1: If there is a next test target, go to Step 4;

Step 8.2: If there is no next test target, end.

In Step 1, the characteristic indices include the company financial indices and corporate governance indices publicly disclosed for individual stocks on the securities market, as well as securities market microstructure indices of individual stocks calculated with the CAPM (Capital Asset Pricing Model) and GARCH (Generalized Autoregressive Conditional Heteroskedasticity) models.
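The text does not say which indices are derived from CAPM or GARCH. As one plausible illustration only, the sketch below estimates a CAPM beta by ordinary least squares and computes the abnormal return, i.e. the part of a stock's excess return that the CAPM prediction does not explain. The function names and the return series are hypothetical, and GARCH-based volatility indices are not covered.

```python
def capm_beta(stock_excess, market_excess):
    """OLS slope of stock excess returns on market excess returns."""
    n = len(stock_excess)
    mean_s = sum(stock_excess) / n
    mean_m = sum(market_excess) / n
    cov = sum((s - mean_s) * (m - mean_m) for s, m in zip(stock_excess, market_excess))
    var = sum((m - mean_m) ** 2 for m in market_excess)
    return cov / var

def abnormal_returns(stock_excess, market_excess, beta):
    """Excess return not explained by the CAPM prediction beta * market excess."""
    return [s - beta * m for s, m in zip(stock_excess, market_excess)]

# Hypothetical daily excess returns (return minus the risk-free rate).
market = [0.01, -0.02, 0.03, 0.00]
stock = [0.02, -0.04, 0.06, 0.00]
beta = capm_beta(stock, market)
abnormal = abnormal_returns(stock, market, beta)
```

In an event-study setting, beta would normally be estimated on a window before the event and the abnormal returns measured inside the event time window; that split is omitted here for brevity.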

In Step 5, the KNN algorithm uses Euclidean distance to measure the similarity of features between samples.

In Step 2, the K-means clustering algorithm comprises the following steps:

Step 1: Choose K objects in the data space as initial centers, each object representing one cluster center;

Step 2: For each data object in the sample, compute its Euclidean distance to these cluster centers and, by the nearest-distance criterion, assign it to the class of the nearest (most similar) cluster center;

Step 3: Update the cluster centers and calculate the value of the objective function;

Step 4: Judge whether the standard clustering measure function has converged; if it has, output the result; if not, return to Step 2.

In Step 1, let $X = \{x_1, x_2, \ldots, x_n\}$ be $n$ data points in the space $R^A$; before clustering starts, $K_1$ is chosen as the number of initial cluster centers.

In Step 2, based on the initial cluster centers chosen in Step 1, the other objects are each assigned to the most similar class according to their similarity to the cluster centers. The formula for computing similarity is as follows:

$$j^{*} = \arg\min_{1 \le j \le K_1} d(x_i, c_j)$$

where $d(x_i, c_j)$ is the distance measure between data object $x_i$ and cluster center $c_j$, and $j^{*}$ is the class to which $x_i$ is assigned. In the embodiment, the distance measure is expressed as follows:

$$d(x_i, c_j) = \sqrt{\sum_{k=1}^{A} (x_{ik} - c_{jk})^2}$$

In Step 3, the cluster centers are updated and the value of the objective function is calculated. Suppose the samples in class $j$ are $\{x_{j1}, x_{j2}, \ldots, x_{jn_j}\}$, i.e. the class contains $n_j$ samples, and its cluster center is $c_j = (c_{j1}, c_{j2}, \ldots, c_{jA})$, where $c_{jk}$ is the $k$-th attribute of cluster center $c_j$. Writing $x_{ji,k}$ for the $k$-th attribute of sample $x_{ji}$, the cluster center is updated using the following expression:

$$c_{jk} = \frac{1}{n_j} \sum_{i=1}^{n_j} x_{ji,k}$$

In Step 4, when judging whether the termination condition is met, the squared error is used as the standard clustering measure function, expressed as:

$$J = \sum_{j=1}^{K_1} \sum_{i=1}^{n_j} d(x_{ji}, c_j)^2$$

If the termination condition is not met, the above process is repeated until the standard clustering measure function converges.
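The K-means procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the initial centers are drawn at random from the data (the text does not say how they are chosen), convergence is declared when the sum of squared distances stops changing within a tolerance, and the function names and toy points are hypothetical.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k1, max_iter=100, tol=1e-9, seed=0):
    """Cluster `points` into k1 groups; returns (centers, assignments, objective)."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k1)]      # Step 1: initial centers
    previous = math.inf
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest cluster center.
        assign = [min(range(k1), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Step 3: move each center to the mean of its members.
        for j in range(k1):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
        objective = sum(sq_dist(p, centers[a]) for p, a in zip(points, assign))
        # Step 4: stop once the objective no longer changes.
        if abs(previous - objective) <= tol:
            break
        previous = objective
    return centers, assign, objective

# Hypothetical two-feature data forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assign, objective = k_means(points, k1=2)
```

Because the result of K-means depends on the initial centers, a practical run would usually repeat the procedure with several seeds and keep the clustering with the lowest objective value.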

As shown in Fig. 2, the specific steps of the KNN algorithm are as follows:

Step 4: Judge whether insider trading occurred for the test target according to the proportion of insider trading samples among the K₂ samples closest to the test target from Step 3;

In Step 1 of the KNN algorithm, Euclidean distance is used to measure the distance between samples.

The Euclidean distance $d_{euc}$ between two points in a multidimensional space is calculated as follows:

$$d_{euc}(X_i, Y_j) = \sqrt{\sum_{k=1}^{n} (X_k - Y_k)^2}$$

where $X_i = (X_1, X_2, \ldots, X_n)$ and $Y_j = (Y_1, Y_2, \ldots, Y_n)$ are the vectors representing two samples, and $n$ is the number of feature attributes of a sample.
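As a worked example with hypothetical values, for two samples with $n = 3$ feature attributes, $X_i = (1, 2, 3)$ and $Y_j = (4, 6, 3)$, the Euclidean distance is:

$$d_{euc} = \sqrt{(1-4)^2 + (2-6)^2 + (3-3)^2} = \sqrt{9 + 16 + 0} = 5$$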

In the embodiment, 171 stock samples in which insider trading occurred between 2001 and 2017, as announced by the China Securities Regulatory Commission (CSRC), were first collected. White samples corresponding roughly one-to-one to the insider trading stock samples were then chosen on the selection principle of the same type of inside-information-sensitive event, the same industry, the same period, and no penalty from the CSRC; in the embodiment, 164 white samples meeting the requirements were collected in total. The samples in which insider trading occurred and the samples in which it did not were combined into the sample data set. Then, 16 characteristic indices were chosen, covering company financial data and securities market microstructure, as shown in Table 1.

Table 1. Characteristic indices
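The white-sample selection principle described in this embodiment (same type of sensitive event, same industry, same period, no regulatory penalty, roughly one white sample per insider trading sample) could be sketched as below. The record fields `event_type`, `industry`, `year`, `penalized`, and `code` are hypothetical; the text does not describe the actual data layout or matching procedure.

```python
def match_white_samples(insider_samples, candidates):
    """Pick at most one unused, non-penalized matching candidate per insider sample."""
    used = set()
    matched = []
    for sample in insider_samples:
        key = (sample["event_type"], sample["industry"], sample["year"])
        for idx, cand in enumerate(candidates):
            if idx in used or cand["penalized"]:
                continue
            if (cand["event_type"], cand["industry"], cand["year"]) == key:
                used.add(idx)          # each candidate is used at most once
                matched.append(cand)
                break
    return matched

# Hypothetical records, for illustration only.
insider_samples = [{"event_type": "merger", "industry": "steel", "year": 2015}]
candidates = [
    {"code": "A", "event_type": "merger", "industry": "steel", "year": 2015, "penalized": True},
    {"code": "B", "event_type": "merger", "industry": "steel", "year": 2015, "penalized": False},
    {"code": "C", "event_type": "merger", "industry": "banking", "year": 2015, "penalized": False},
]
white_samples = match_white_samples(insider_samples, candidates)
```

An insider trading sample with no eligible candidate simply gets no match, which is consistent with the white-sample count being slightly lower than the insider trading sample count.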

To test the identification performance of the insider trading identification method of the invention on stock samples, the collected sample data set was divided into a training set and a test set at a ratio of training set : test set = 8:2, and K₁ cluster centers were established from the training set; in this embodiment K₁ is 4 and K₂ is 3. The test set was then divided into 4 sub-test sets, and the insider trading identification method of the invention was applied to each of the 4 sub-test sets to identify whether insider trading occurred.
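An 8:2 division of this kind can be sketched as follows. The text does not state whether the split was random, so the shuffling, the seed, the function name, and the 100 placeholder samples are assumptions.

```python
import random

def split_train_test(samples, train_ratio=0.8, seed=42):
    """Shuffle a copy of the samples and cut it at the given training ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Placeholder sample identifiers; real samples would be (features, label) records.
samples = list(range(100))
train_set, test_set = split_train_test(samples)
```

The cluster centers would then be fitted on `train_set` only, and each element of `test_set` routed to the sub-test set of its nearest center.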

First, K-means clustering divided all samples into four classes; in the test set, the four sub-test sets contain 30, 14, 4, and 20 samples, respectively.

The insider trading identification method of the invention was applied to the samples of sub-test set 1. The results are shown in Table 2; the identification accuracy for insider trading stock samples is 86.67%.

Table 2. Insider trading identification results for sub-test set 1

The insider trading identification method of the invention was applied to the samples of sub-test set 2. The results are shown in Table 3; the identification accuracy for insider trading stock samples is 75%.

Table 3. Insider trading identification results for sub-test set 2

The insider trading identification method of the invention was applied to the samples of sub-test set 3. The results are shown in Table 4; the identification accuracy for insider trading stock samples is 100%.

Table 4. Insider trading identification results for sub-test set 3

The insider trading identification method of the invention was applied to the samples of sub-test set 4. The results are shown in Table 5; the identification accuracy for insider trading stock samples is 70%.

Table 5. Insider trading identification results for sub-test set 4

Combining Tables 2 to 5, the overall identification accuracy of the proposed model is calculated to be 80%. The embodiment also discriminated insider trading directly with the KNN algorithm, without K-means clustering, and obtained a discrimination accuracy of 73%, as shown in Table 6. This shows that, once combined with K-means clustering, the method of the invention is markedly more accurate at discriminating insider trading than the plain KNN algorithm.

Table 6. Comparison of insider trading discrimination accuracy

Claims

1. An insider trading identification method based on K-means clustering and the KNN algorithm, characterized by comprising the following steps:

Step 1: Obtain insider trading sample data sets under different event time windows, and select and add white samples, i.e. non-insider-trading samples;

Step 3: Compute each sub-dataset to determine its cluster center and, for each sub-dataset, use the KNN algorithm to build a sub-dataset KNN insider trading discrimination model;

Step 4: Obtain a test target and collect the test target data set;

Step 5: Select the sub-dataset whose cluster center is nearest to the test target, and use the corresponding sub-dataset KNN insider trading discrimination model from Step 3 to determine whether the test target involves insider trading;

Step 6: Judge whether the identification result is correct;

Step 6.1: If the identification result is correct, go to Step 8;

Step 6.2: If the identification result is incorrect, go to Step 7;

Step 7: Add the test target data set and its insider trading label to the sample data set, and update the cluster center of the corresponding sub-dataset;

Step 8: Judge whether there is a next test target;

Step 8.1: If there is a next test target, go to Step 4;

Step 8.2: If there is no next test target, end.

2. The insider trading identification method based on K-means clustering and the KNN algorithm according to claim 1, characterized in that the sample data set includes characteristic indices and a label indicating whether insider trading occurred.

3. The insider trading identification method based on K-means clustering and the KNN algorithm according to claim 1, characterized in that the KNN algorithm uses Euclidean distance to measure the similarity of features between samples.

4. The insider trading identification method based on K-means clustering and the KNN algorithm according to any one of claims 1 to 3, characterized in that, in Step 5, the sub-dataset whose cluster center is nearest to the test target is selected by computing the distance between the test target data set and each cluster center and choosing the corresponding sub-dataset of the sample data set on the nearest-distance principle.