CN111581384B

CN111581384B - Enterprise policy text clustering method

Info

Publication number: CN111581384B
Application number: CN202010367581.8A
Authority: CN
Inventors: 郭肇禄; 陈远存; 谭力江; 张文生
Original assignee: Guangdong Oking Information Industry Co ltd
Current assignee: Guangdong Oking Information Industry Co ltd
Priority date: 2020-04-30
Filing date: 2020-04-30
Publication date: 2022-06-10
Anticipated expiration: 2040-04-30
Also published as: CN111581384A

Abstract

The invention discloses a method for clustering enterprise-benefiting policy texts and relates to the technical field of text clustering. The method first collects enterprise-benefiting policy texts, then preprocesses them and extracts their feature vectors, and then optimizes the cluster centers of the enterprise-benefiting policy texts by using a guided sine cosine algorithm. In the guided sine cosine algorithm, the guided crossover rate is adaptively adjusted according to feedback information from the search, a guided search direction is generated by using the guided crossover rate, and the guided search direction is then used to improve the performance of the algorithm. By clustering enterprise-benefiting policy texts with the guided sine cosine algorithm, the method can improve the clustering precision for such texts.

Description

Enterprise policy text clustering method

Technical Field

The invention relates to the technical field of text clustering, in particular to a method for clustering enterprise-benefiting policy texts.

Background

In order to better serve small and medium-sized enterprises and accelerate economic development, the relevant departments at all levels have issued many enterprise-benefiting policies. These include tax exemption policies, tax reduction policies, interest support policies, reward policies for increasing output and efficiency, and the like. However, as enterprise-benefiting policies of various types are issued one after another, many small and medium-sized enterprises often find it difficult to locate the policies whose conditions they meet. Interpreting enterprise-benefiting policies for small and medium-sized enterprises is a very challenging task. Researchers have therefore tried to use artificial intelligence techniques to recommend, according to the characteristics of each small or medium-sized enterprise, the enterprise-benefiting policies that meet its development needs.

In order to recommend suitable enterprise-benefiting policies to small and medium-sized enterprises, the enterprise-benefiting policy texts first need to be grouped into clusters. Clustering a large number of enterprise-benefiting policy texts manually usually consumes a great deal of labor. Researchers have therefore proposed clustering the enterprise-benefiting policy texts with text clustering techniques. However, when traditional text clustering techniques are applied to enterprise-benefiting policy texts, they tend to suffer from low clustering precision.

Disclosure of Invention

The invention provides a method for clustering enterprise-benefiting policy texts. To a certain extent, it overcomes the tendency of traditional text clustering techniques to yield low clustering precision when applied to enterprise-benefiting policy texts, and it can improve the precision of enterprise-benefiting policy text clustering.

The technical scheme of the invention is as follows: a method for clustering enterprise-benefiting policy texts comprises the following steps:

step 1, collecting enterprise-benefiting policy texts;

step 2, preprocessing the enterprise-benefiting policy texts;

step 3, extracting the feature vectors of the enterprise-benefiting policy texts;

step 4, taking the obtained feature vectors of the enterprise-benefiting policy texts as the enterprise-benefiting policy text data set;

step 5, optimizing the cluster centers of the enterprise-benefiting policy text data set by using a guided sine cosine algorithm;

step 6, dividing the enterprise-benefiting policy text data set into clusters by using the obtained cluster centers, thereby obtaining the clustering result of the enterprise-benefiting policy texts;

wherein optimizing the cluster centers of the enterprise-benefiting policy text data set by using the guided sine cosine algorithm in step 5 comprises the following steps:

step 5.1, setting the number of agents PSZ and the maximum number of iterations MaxIT;

step 5.2, setting the current iteration count CIt = 0;

step 5.3, setting the number CCN of enterprise-benefiting policy text clusters;

step 5.4, randomly generating PSZ agents AC_i, wherein each agent stores CCN cluster centers and the agent subscript i = 1, 2, …, PSZ;

step 5.5, forming the generated PSZ agents into a population;

step 5.6, calculating the fitness values of the PSZ agents in the population according to formula (1):

wherein afv_i denotes the fitness value of the i-th agent; si is the sample subscript; the cluster subscript ci = 1, 2, …, CCN; CX_si denotes the si-th sample in the enterprise-benefiting policy text data set; DC_ci denotes the ci-th cluster; AC_{i,ci} denotes the ci-th cluster center stored by the i-th agent;

step 5.7, finding the agent with the minimum fitness value among the PSZ agents of the population, and storing it as the optimal agent gBA;

step 5.8, initializing the retained crossover rate KCR_i = 0.5;

step 5.9, generating PSZ guided agents DIA_i by setting DIA_i = AC_i, wherein the agent subscript i = 1, 2, …, PSZ;

step 5.10, setting the temporary agents TIA_i = DIA_i, wherein the agent subscript i = 1, 2, …, PSZ;

step 5.11, setting a counter tsi = 1;

step 5.12, randomly generating a positive integer ei in the range [1, PSZ], and then setting the ei-th temporary agent TIA_ei = gBA;

step 5.13, setting the counter tsi = tsi + 1;

step 5.14, if the counter tsi is less than PSZ × 0.1, going to step 5.12; otherwise going to step 5.15;

step 5.15, calculating the guided crossover rate DCR_i according to formula (2):

wherein rand denotes a random real number generating function, and tep is a random real number in [0, 1];

step 5.16, calculating the foreground agent NIA_i according to formula (3):

wherein rid is a random positive integer in [1, PSZ]; atp is a random real number in [0, 1]; trp is a random real number in [0, 1];

step 5.17, if the fitness value of the foreground agent NIA_i is smaller than that of the guided agent DIA_i, setting the guided agent DIA_i = NIA_i; otherwise keeping the guided agent DIA_i unchanged;

step 5.18, executing the guided sine cosine operator according to formula (4):

wherein

r2 is a random real number in [0, 2π], where π is the ratio of a circle's circumference to its diameter; r3 is a random real number in [0, 2]; r4 is a random real number in [0, 1]; sin is the sine function; cos is the cosine function; GX_i is the sampling agent;

step 5.19, if the fitness value of the sampling agent GX_i is smaller than that of AC_i, setting AC_i = GX_i; otherwise keeping AC_i unchanged;

step 5.20, if the fitness value of the sampling agent GX_i is smaller than that of AC_i, setting the retained crossover rate KCR_i = DCR_i; otherwise keeping the retained crossover rate KCR_i unchanged;

step 5.21, finding the agent with the minimum fitness value in the population and storing it as the optimal agent gBA;

step 5.22, setting the current iteration count CIt = CIt + 1; if the current iteration count CIt is less than the maximum number of iterations MaxIT, going to step 5.10; otherwise going to step 5.23;

step 5.23, extracting the cluster centers stored in the optimal agent gBA, thereby obtaining the cluster centers of the enterprise-benefiting policy text data set.
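The cluster division of step 6 and the fitness evaluation of step 5.6 can be sketched in a few lines of Python. Formula (1) is not reproduced in this text, so the `fitness` function below is only an assumption: it takes the fitness value to be the sum, over all samples, of the Euclidean distance from each sample to the nearest of the cluster centers stored by one agent, which is consistent with the symbols defined for formula (1) but may differ from it in detail (for example, squared distances). The names `nearest_center`, `assign_clusters` and `fitness`, and the toy data, are illustrative and do not come from the patent.

```python
import math

def nearest_center(sample, centers):
    """Return (index, distance) of the center closest to sample (Euclidean)."""
    best_ci, best_d = 0, math.dist(sample, centers[0])
    for ci in range(1, len(centers)):
        d = math.dist(sample, centers[ci])
        if d < best_d:
            best_ci, best_d = ci, d
    return best_ci, best_d

def assign_clusters(data, centers):
    """Step 6: label each feature vector with the index of its nearest center."""
    return [nearest_center(x, centers)[0] for x in data]

def fitness(data, centers):
    """ASSUMED form of formula (1): total distance of samples to their nearest center."""
    return sum(nearest_center(x, centers)[1] for x in data)

# Toy data set: each row stands for the feature vector of one policy text.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = [(0.0, 0.5), (10.0, 10.5)]  # the CCN = 2 centers stored by one agent

labels = assign_clusters(data, centers)
afv = fitness(data, centers)
```

Under this assumption a smaller fitness value means tighter clusters, which matches the minimization in steps 5.7 and 5.21.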

The method optimizes the cluster centers of the enterprise-benefiting policy texts by using the guided sine cosine algorithm and divides the texts into clusters by using the obtained cluster centers, thereby clustering the enterprise-benefiting policy texts. The guided sine cosine algorithm includes an adaptive adjustment mechanism for the guided crossover rate; guided information is generated by using the guided crossover rate to improve the performance of the sine cosine algorithm, and thereby the clustering precision for enterprise-benefiting policy texts.
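Two parts of the loop are specified entirely in words and can be sketched without the missing formulas: the seeding of the temporary agents with the optimal agent (steps 5.10 to 5.14) and the greedy acceptance that also feeds back into the retained crossover rate (steps 5.19 and 5.20). The following is a minimal sketch, not the patented implementation. The function names, the `rng` and `fit` parameters, and the string placeholders used as agents are hypothetical. The sketch also assumes that the comparison in steps 5.19 and 5.20 is evaluated once, before AC_i is overwritten.

```python
import random

def seed_with_best(dia, gba, rng):
    """Steps 5.10-5.14: copy the guided agents, then overwrite random slots with gBA."""
    psz = len(dia)
    tia = list(dia)                   # step 5.10: TIA_i = DIA_i
    tsi = 1                           # step 5.11
    while True:
        ei = rng.randint(1, psz)      # step 5.12: ei drawn from [1, PSZ], inclusive
        tia[ei - 1] = gba             # 1-based subscript mapped to a 0-based index
        tsi += 1                      # step 5.13
        if not tsi < psz * 0.1:       # step 5.14: repeat while tsi < PSZ x 0.1
            break
    return tia

def greedy_accept(ac_i, kcr_i, gx_i, dcr_i, fit):
    """Steps 5.19-5.20: adopt GX_i and DCR_i only if GX_i has the smaller fitness."""
    if fit(gx_i) < fit(ac_i):
        return gx_i, dcr_i
    return ac_i, kcr_i

rng = random.Random(0)
dia = [f"agent{i}" for i in range(1, 121)]  # placeholders for PSZ = 120 agents
tia = seed_with_best(dia, "best", rng)

accepted = greedy_accept(3.0, 0.5, 1.0, 0.8, abs)  # GX_i is better: adopt it
rejected = greedy_accept(1.0, 0.5, 3.0, 0.8, abs)  # GX_i is worse: keep AC_i
```

With 120 agents the seeding loop overwrites at most 11 slots, and fewer when the same index is drawn twice, so roughly a tenth of the temporary agents carry the optimal agent. A crossover rate is retained only when it produced an improvement, which is the feedback mechanism the paragraph above describes.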

Drawings

FIG. 1 is a flow chart of the guided sine cosine algorithm of the present invention.

Detailed Description

The technical scheme of the invention is further described below through a specific embodiment in combination with the accompanying drawing.

Example:

In this embodiment, with reference to the accompanying drawing, the specific implementation steps of the invention are as follows:

step 1, collecting enterprise-benefiting policy texts, which include but are not limited to tax exemption policy texts, tax reduction policy texts, interest support policy texts, and reward policy texts for increasing output and efficiency;

step 2, preprocessing the enterprise-benefiting policy texts, wherein the preprocessing includes but is not limited to deleting garbled characters, removing punctuation marks, word segmentation, and removing stop words;

step 3, extracting the feature vectors of the enterprise-benefiting policy texts, wherein the extraction methods include but are not limited to term frequency-inverse document frequency (TF-IDF), Word2Vec, and LDA;

step 4, taking the obtained feature vectors of the enterprise-benefiting policy texts as the enterprise-benefiting policy text data set, wherein each row of the data set represents the feature vector of one enterprise-benefiting policy text;

step 6, calculating in turn the Euclidean distance between the feature vector of each enterprise-benefiting policy text in the data set and each obtained cluster center, and assigning each feature vector to the cluster whose center is at the smallest Euclidean distance, thereby obtaining the clustering result of the enterprise-benefiting policy texts;

step 5.1, setting the number of agents PSZ = 120 and the maximum number of iterations MaxIT = 2000;

step 5.2, setting the current iteration count CIt = 0;

step 5.3, setting the number CCN of enterprise-benefiting policy text clusters to 4;

step 5.5, forming the generated PSZ agents into a population;

step 5.8, initializing the retained crossover rate KCR_i = 0.5;

step 5.11, setting a counter tsi = 1;

step 5.13, setting the counter tsi = tsi + 1;

step 5.18, executing the guided sine cosine operator according to formula (4):

wherein

r2 is a random real number in [0, 2π], where π is the ratio of a circle's circumference to its diameter; r3 is a random real number in [0, 2]; r4 is a random real number in [0, 1]; sin is the sine function; cos is the cosine function; GX_i is the sampling agent;
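Formula (4) itself is not reproduced in this text, so the guided operator cannot be shown here. For orientation only, the sketch below implements the position update of the standard sine cosine algorithm from the optimization literature, whose random numbers r2, r3 and r4 have the same ranges as those listed above. It is not the patent's guided operator. The function name `sca_step`, the parameter `a`, the linearly decreasing amplitude r1 and the choice of `target` are assumptions taken from the standard algorithm rather than from this document.

```python
import math
import random

def sca_step(x, target, cit, max_it, rng, a=2.0):
    """One STANDARD sine cosine update of x toward target (not formula (4))."""
    r1 = a - cit * (a / max_it)               # amplitude shrinks linearly to 0
    new_x = []
    for xj, pj in zip(x, target):
        r2 = rng.uniform(0.0, 2.0 * math.pi)  # random real number in [0, 2*pi]
        r3 = rng.uniform(0.0, 2.0)            # random real number in [0, 2]
        r4 = rng.random()                     # random real number in [0, 1)
        wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)
        new_x.append(xj + r1 * wave * abs(r3 * pj - xj))
    return new_x

rng = random.Random(1)
x = [0.0, 1.0, 2.0]        # one agent, with its cluster centers flattened
target = [1.0, 1.0, 1.0]   # e.g. the best agent found so far
gx_first = sca_step(x, target, 0, 2000, rng)    # early iteration: large steps
gx_last = sca_step(x, target, 2000, 2000, rng)  # final iteration: r1 = 0
```

In this standard form the choice between the sine and cosine branch is made per dimension by r4, and the step size decays as the iteration count approaches the maximum, so the last call leaves the agent where it was.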

The technical principle of the present invention is described above in connection with specific embodiments. The description is made for the purpose of illustrating the principles of the invention and should not be construed in any way as limiting the scope of the invention. Based on the explanations herein, those skilled in the art will be able to conceive of other embodiments of the present invention without inventive effort, which would fall within the scope of the present invention.

Claims

1. A method for clustering enterprise-benefiting policy texts, characterized by comprising the following steps:

step 1, collecting enterprise-benefiting policy texts;

step 2, preprocessing the enterprise-benefiting policy texts;

step 6, dividing the enterprise-benefiting policy text data set into clusters by using the obtained cluster centers to obtain the clustering result of the enterprise-benefiting policy texts;

step 5.2, setting the current iteration count CIt = 0;

step 5.5, forming the generated PSZ agents into a population;

step 5.7, finding the agent with the minimum fitness value among the PSZ agents of the population, and storing it as the optimal agent gBA;

step 5.8, initializing the retained crossover rate KCR_i = 0.5;

step 5.10, setting the temporary agents TIA_i = DIA_i, wherein the agent subscript i = 1, 2, …, PSZ;

step 5.11, setting a counter tsi = 1;

step 5.13, setting the counter tsi = tsi + 1;

wherein

step 5.19, if the fitness value of the sampling agent GX_i is smaller than that of AC_i, setting AC_i = GX_i; otherwise keeping AC_i unchanged;

step 5.20, if the fitness value of the sampling agent GX_i is smaller than that of AC_i, setting the retained crossover rate KCR_i = DCR_i; otherwise keeping the retained crossover rate KCR_i unchanged;