CN112686043A

CN112686043A - Word vector-based classification method for emerging industries to which enterprises belong

Info

Publication number: CN112686043A
Application number: CN202110034145.3A
Authority: CN
Inventors: 彭敏; 徐文杰; 胡刚; 贾旭
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2021-01-12
Filing date: 2021-01-12
Publication date: 2021-04-20
Anticipated expiration: 2041-01-12
Also published as: CN112686043B

Abstract

The invention provides a word vector-based classification method for emerging industries to which enterprises belong. The method acquires an input emerging industry and retrieves related information from the Internet according to its name; obtains candidate keywords from the related information using the TextRank algorithm; clusters the candidate keywords using the K-means algorithm to obtain emerging industry clustering keywords; acquires enterprise operation ranges from an official website and builds an enterprise operation word bank from them; expands the emerging industry clustering keywords using the enterprise operation word bank to obtain an emerging industry keyword word bank; obtains the inverse document frequency weights of words from the enterprise operation word bank; computes, in sequence, a basic evaluation score, a comprehensive evaluation score, and an enterprise classification score from the operation range of the enterprise to be classified and the emerging industry keyword word bank; and obtains the classification result of the emerging industry to which the enterprise belongs from the enterprise classification score. The method requires no manual labeling or training, has high accuracy, and can classify newly added emerging industries.

Description

Word vector-based classification method for emerging industries to which enterprises belong

Technical Field

The invention belongs to the technical field of natural language processing, and particularly relates to a word vector-based classification method for emerging industries to which enterprises belong.

Background

When analyzing the relationship between enterprises and industries, manually assigning enterprises to their corresponding industries is time-consuming and labor-intensive, especially when a large number of enterprises must be classified and the industries are emerging ones for which little classification experience exists. Every enterprise has an operation range, and the operation range reflects the industry to which the enterprise belongs, so it can be used to analyze the enterprise's industry classification. Both operation ranges and industries are described by words, and words within the same operation range or the same industry are similar to one another, so the distance between word vectors can serve as a measure of word similarity.

Algorithm research shows that appropriately introducing external parameters, including industry description information as supplementary knowledge and word inverse document frequency and part of speech as word weights, yields better classification results. In addition, unsupervised algorithms save the time and labor cost of labeling large numbers of samples. In summary, to accelerate enterprise classification and improve analysis efficiency, the invention provides a word vector-based classification method for emerging industries to which enterprises belong.

In the prior art, for example, the patent application with publication number CN110019769 discloses an intelligent enterprise classification method that uses a supervised classifier based on an SVM (support vector machine). That method has the following shortcomings: a large number of samples must be manually labeled in advance, and the model must be trained for some time. It cannot classify emerging industries, it needs a large amount of time for retraining whenever a new label appears, and a model trained in advance must be deployed when the method is used, so its computing power requirement is high.

Disclosure of Invention

The invention aims to provide a method for rapidly and accurately classifying enterprises into emerging industries when analyzing the relationship between enterprises and industries. In view of the fact that existing methods need a large amount of labeled data and a large amount of model training time, and cannot be extended to new emerging industries, the invention provides a word vector-based classification method for emerging industries to which enterprises belong that requires no manual labeling or training and offers high adaptability, high accuracy, and strong extensibility.

The technical scheme adopted by the invention is as follows: a word vector-based classification method for emerging industries of enterprises is characterized by comprising the following steps:

step 1: acquiring the input emerging industries, and acquiring related information on the Internet according to the name of each emerging industry; obtaining candidate keywords of the emerging industry from the related information by using the TextRank algorithm; clustering the candidate keywords by using the K-means algorithm to obtain emerging industry clustering keywords;

step 2: acquiring enterprise operation ranges from an official website, and obtaining an enterprise operation word bank according to the enterprise operation ranges; expanding the emerging industry clustering keywords according to the enterprise operation word bank to obtain an emerging industry keyword word bank;

step 3: obtaining the inverse document frequency weights of the words according to the enterprise operation word bank;

step 4: obtaining a basic evaluation score according to the operation range of the enterprise to be classified and the emerging industry keyword word bank; obtaining a comprehensive evaluation score according to the basic evaluation score; obtaining an enterprise classification score according to the comprehensive evaluation score; obtaining the classification result of the emerging industry to which the enterprise belongs according to the enterprise classification score;

preferably, the emerging industry in step 1 is:

Ind_p

p∈[1,M]

wherein $Ind_p$ represents the name of the p-th emerging industry, and M represents the number of emerging industries;

step 1, obtaining relevant information on the Internet according to the name of emerging industry:

using $Ind_p$ as the search keyword, crawler technology automatically retrieves information related to $Ind_p$ from the Internet, and the retrieved text is recorded as the related information of the p-th emerging industry;

step 1, obtaining candidate keywords by using a Textrank algorithm according to related information of the emerging industry, and obtaining candidate keywords of the emerging industry:

using the TextRank algorithm, keywords are extracted from the related information of the p-th emerging industry to obtain the candidate keywords of $Ind_p$, recorded as:

$$key_p = [w_{p,1}, w_{p,2}, \ldots, w_{p,D}]$$

wherein $key_p$ represents the candidate keywords of the p-th emerging industry, $w_{p,d}$ represents the d-th candidate keyword of the p-th emerging industry, $d \in [1, D]$, and D represents the number of candidate keywords;
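The keyword extraction in step 1 can be illustrated with a minimal TextRank sketch. It assumes the retrieved text has already been segmented into a token list; the function name `textrank_keywords` and the parameters `top_d`, `window`, `damping`, and `iterations` are hypothetical choices for illustration and are not specified by the text above.

```python
from collections import defaultdict
from itertools import combinations


def textrank_keywords(tokens, top_d=5, window=3, damping=0.85, iterations=50):
    """Rank tokens with a PageRank-style walk over a co-occurrence graph."""
    # Undirected graph: words that co-occur within `window` tokens are linked.
    neighbors = defaultdict(set)
    for start in range(len(tokens)):
        span = set(tokens[start:start + window])
        for a, b in combinations(span, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Unweighted TextRank update, starting from a uniform rank of 1.0.
    rank = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        rank = {
            w: (1 - damping)
            + damping * sum(rank[u] / len(neighbors[u]) for u in neighbors[w])
            for w in neighbors
        }

    # Highest-ranked words first; ties are broken alphabetically.
    ordered = sorted(rank, key=lambda w: (-rank[w], w))
    return ordered[:top_d]
```

The returned list plays the role of key_p for one industry; a production system would more likely rely on an existing TextRank implementation with part-of-speech filtering.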

step 1, clustering by using a K-means algorithm according to the candidate keywords to obtain emerging industry clustering keywords:

using word2vec technology, all words in $key_p$ are mapped into a multi-dimensional word vector space:

$$w2v(key_p) = [w2v(w_{p,1}), w2v(w_{p,2}), \ldots, w2v(w_{p,D})]$$

where w2v(·) represents a function that converts words into word vectors, $key_p$ represents the candidate keywords of the p-th emerging industry, and $w_{p,d}$ represents the d-th candidate keyword of the p-th emerging industry;

K-means is applied to $w2v(key_p)$ to obtain the emerging industry clustering keywords, the clustering quantity $K_p$ being computed from $Len(key_p)$ and rounded down,

wherein $K_p$ represents the number of clusters for the p-th emerging industry, $\lfloor \cdot \rfloor$ denotes the round-down symbol, and $Len(key_p)$ represents the total number of candidate keywords of the p-th emerging industry;

the emerging industry clustering keywords are as follows:

$$D_{p,q}[k]$$

$$p \in [1, M],\quad q \in [1, K_p],\quad k \in [1, L_{p,q}]$$

wherein $D_{p,q}[k]$ represents the k-th keyword in the keyword array formed by the q-th clustering result of the p-th emerging industry, M represents the number of emerging industries, $K_p$ represents the total number of clusters for the p-th emerging industry, and $L_{p,q}$ represents the total number of keywords in the q-th clustering result of the p-th emerging industry.
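The clustering step can be sketched as follows. This is a plain Lloyd's k-means over toy vectors, not the patent's implementation: the vectors stand in for the w2v(·) outputs, the cluster count `k` is passed in directly because the expression for K_p is not reproduced in this text, and seeding the centroids with the first `k` vectors is a simplification chosen only to keep the sketch deterministic.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means; returns one cluster label per input vector."""
    # Simplified, deterministic initialisation: the first k vectors seed the centroids.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_keywords(words, vectors, k):
    """Build the keyword arrays D_{p,q} (one list per cluster) for one industry."""
    labels = kmeans(vectors, k)
    clusters = [[] for _ in range(k)]
    for word, lab in zip(words, labels):
        clusters[lab].append(word)
    return clusters
```

Each inner list returned by `group_keywords` corresponds to one keyword array D_{p,q}, and its length to L_{p,q}.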

Preferably, in step 2, the enterprise operation range is acquired from the official website, and the enterprise operation word bank is obtained according to the enterprise operation range:

the enterprise operation range in step 2 is recorded as

$$S_g$$

$$1 \le g \le N$$

wherein $S_g$ represents the operation range information of the g-th enterprise, and N represents the total number of enterprises;

the enterprise operation word bank in step 2 is obtained by segmenting the enterprise operation ranges into words and removing stop words, and is recorded as:

$$F = [Split(S_1), Split(S_2), \ldots, Split(S_N)]$$

$$Split(S_g) = [x_{g,1}, x_{g,2}, \ldots, x_{g,h}]$$

wherein F represents the enterprise operation word bank, Split(·) represents the function that performs word segmentation and stop word removal, and $x_{g,h}$ represents the h-th word obtained after the g-th enterprise operation range is segmented and its stop words are removed, namely the h-th word in the g-th enterprise operation word bank;
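A minimal sketch of Split(·) and of building F is given below. The whitespace tokenizer is only a placeholder: for Chinese operation ranges a dedicated word segmenter would be passed in as `tokenize`. The function names and the stop word set are illustrative assumptions.

```python
def split_scope(scope, stop_words, tokenize=str.split):
    """Split(S_g): segment one operation range and drop stop words."""
    return [tok for tok in tokenize(scope) if tok not in stop_words]


def build_word_bank(scopes, stop_words, tokenize=str.split):
    """F = [Split(S_1), ..., Split(S_N)], one token list per enterprise."""
    return [split_scope(s, stop_words, tokenize) for s in scopes]
```

Keeping F as a list of per-enterprise token lists preserves the document boundaries that the inverse document frequency computation in step 3 needs.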

step 2, expanding the emerging industry clustering keywords according to the enterprise operation word bank:

for each emerging industry clustering keyword, cosine similarity is used to search the enterprise operation word bank for the 3 words with the highest similarity:

$$cossim(w2v(x_{g,h}), w2v(D_{p,q}[k]))$$

where cossim(·,·) represents the function for calculating cosine similarity, w2v(·) represents the function for converting words into word vectors, $x_{g,h}$ represents the h-th word in the g-th enterprise operation word bank, and $D_{p,q}[k]$ represents the k-th keyword in the keyword array formed by the q-th clustering result of the p-th emerging industry;

the L words in F with the highest similarity to $D_{p,q}[k]$ are added to the emerging industry clustering keywords to obtain the emerging industry keyword word bank, recorded as:

$$A_{p,q} = [D_{p,q}[1], D_{p,q}[1]_1, D_{p,q}[1]_2, \ldots, D_{p,q}[1]_L, \ldots, D_{p,q}[k], D_{p,q}[k]_1, D_{p,q}[k]_2, \ldots, D_{p,q}[k]_L]$$

wherein $A_{p,q}$ represents the q-th auxiliary keyword array of the p-th emerging industry, $D_{p,q}[k]$ represents the k-th keyword in the keyword array formed by the q-th clustering result of the p-th emerging industry, $D_{p,q}[k]_l$ denotes the word in F with the l-th highest similarity to $D_{p,q}[k]$, and L represents the number of words taken in descending order of similarity.
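The expansion of one cluster into an auxiliary keyword array can be sketched as below. The dictionary `w2v` stands in for a trained word vector model, and skipping the keyword itself and out-of-vocabulary words when ranking candidates is an assumption the text does not state.

```python
import math


def cossim(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_cluster(cluster, bank_words, w2v, top_l=3):
    """Build A_{p,q}: each keyword followed by its top_l most similar bank words."""
    expanded = []
    for kw in cluster:
        expanded.append(kw)
        # Assumption: the keyword itself and words without a vector are skipped.
        candidates = [w for w in bank_words if w != kw and w in w2v]
        candidates.sort(key=lambda w: cossim(w2v[kw], w2v[w]), reverse=True)
        expanded.extend(candidates[:top_l])
    return expanded
```

With many keywords and a large word bank, a vectorised nearest-neighbour search would be preferable to this quadratic loop.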

Preferably, in step 3, the inverse document frequency weights of the words are obtained according to the enterprise operation word bank:

the inverse document frequency of every word is calculated from the distribution of the words in the enterprise operation word bank and recorded as $idf_{won}(x_{g,h})$, with

$$1 \le g \le G,\quad 1 \le h \le G_g$$

wherein $idf_{won}(x_{g,h})$ is the inverse document frequency of the h-th word in the g-th enterprise operation word bank, R is the total number of operation ranges, $Num(x_{g,h})$ represents the number of operation ranges containing the h-th word of the g-th enterprise operation word bank, G is the total number of enterprise operation word banks, and $G_g$ is the total number of words in the g-th enterprise operation word bank;

the normalized inverse document frequency is obtained from the inverse document frequency using a normalization algorithm and recorded as $idf_{norm}(x_{g,h})$,

wherein $idf_{norm}(x_{g,h})$ is the normalized inverse document frequency of the h-th word in the g-th enterprise operation word bank, $idf_{won}(x_{g,h})$ is the inverse document frequency of that word, $idf_{won}min$ is the minimum inverse document frequency over all enterprise operation word banks, and $idf_{won}max$ is the maximum inverse document frequency over all enterprise operation word banks;

the inverse document frequency weight is obtained from the normalized inverse document frequency and recorded as $idf(word)$,

wherein idf(·) is the function for calculating the inverse document frequency weight, word is any word, and F is the enterprise operation word bank;
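The formulas for this step are not reproduced in the text, so the sketch below rests on explicit assumptions: the raw value is taken as the textbook form log(R / Num(x)), the normalization is taken as min-max scaling (suggested by the minimum and maximum named above), and the `default` weight returned for words outside F is purely hypothetical.

```python
import math


def idf_weights(word_bank):
    """Min-max-normalised IDF for every word in F (a list of token lists)."""
    total = len(word_bank)                    # R: number of operation ranges
    doc_count = {}
    for scope in word_bank:
        for word in set(scope):               # Num(x): ranges containing the word
            doc_count[word] = doc_count.get(word, 0) + 1

    # Assumed raw form of idf_won: log(R / Num(x)).
    raw = {w: math.log(total / n) for w, n in doc_count.items()}
    lo, hi = min(raw.values()), max(raw.values())
    span = hi - lo
    # Assumed min-max normalisation; 1.0 is used when all raw values coincide.
    return {w: (v - lo) / span if span else 1.0 for w, v in raw.items()}


def idf(word, weights, default=1.0):
    """Weight lookup; `default` for words outside F is a hypothetical choice."""
    return weights.get(word, default)
```

Under these assumptions a word that appears in every operation range gets weight 0, and the rarest words get weight 1.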

preferably, step 4, obtaining a basic evaluation score according to the business operation range of the enterprise to be classified and the keyword lexicon of the emerging industry:

the enterprise to be classified is marked as:

C_e

wherein, C_eRepresenting the e-th enterprise to be classified;

the operation range of the enterprise to be classified is recorded as:

Scope_e

wherein $Scope_e$ represents the operation range of the e-th enterprise to be classified;

$Scope_e$ is segmented into words and its stop words are removed to obtain the operation range word segmentation of the enterprise to be classified, recorded as:

$$query_e = [y_{e,1}, y_{e,2}, \ldots, y_{e,r}]$$

wherein $query_e$ represents the operation range word segmentation of the e-th enterprise to be classified, and $query_e[r] = y_{e,r}$ represents the r-th word of the operation range word segmentation of the e-th enterprise to be classified;

the cosine similarity between the operation range word segmentation of the enterprise to be classified and the emerging industry keyword word bank is obtained and recorded as:

$$cossim(w2v(query_e[r]), w2v(A_{p,q}[t]))$$

wherein cossim(·,·) represents the function for calculating cosine similarity, w2v(·) represents the function for converting words into word vectors, $query_e[r]$ represents the r-th word of the operation range word segmentation of the e-th enterprise to be classified, and $A_{p,q}[t]$ represents the t-th word in the q-th auxiliary keyword array of the p-th emerging industry;

the word similarity is calculated from the cosine similarity and recorded as:

$$sim(query_e[r], A_{p,q}[t]) = cossim(w2v(query_e[r]), w2v(A_{p,q}[t]))$$

wherein sim(·,·) represents the function for calculating word similarity, cossim(·,·) represents the function for calculating cosine similarity, $query_e[r]$ represents the r-th word of the operation range word segmentation of the e-th enterprise to be classified, and $A_{p,q}[t]$ represents the t-th word in the q-th auxiliary keyword array of the p-th emerging industry;

the basic evaluation score is calculated from the word similarity and recorded as $base(query_e, A_{p,q})$,

wherein $base(query_e, A_{p,q})$ represents the basic evaluation score between the operation range word segmentation of the e-th enterprise to be classified and the q-th auxiliary keyword array of the p-th emerging industry, $query_e$ represents the operation range word segmentation of the e-th enterprise, $query_e[i]$ represents the i-th word of the operation range word segmentation of the e-th enterprise to be classified, $A_{p,q}$ represents the q-th auxiliary keyword array of the p-th emerging industry, $A_{p,q}[j]$ represents the j-th word in the q-th auxiliary keyword array of the p-th emerging industry, idf(·) is the function for calculating the idf weight, n represents the total number of words in the operation range word segmentation of the e-th enterprise, and m represents the total number of words in the q-th auxiliary keyword array of the p-th emerging industry;
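The exact expression for the basic evaluation score is not reproduced in this text. Purely as an illustration of how the named ingredients (word similarity, idf weight, and the counts n and m) could combine, the sketch below averages idf-weighted similarities over all n × m word pairs. This aggregation is an assumption, not the patent's formula, and `base_score`, `w2v`, and the `idf` callable are hypothetical names.

```python
import math


def cossim(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def base_score(query, keywords, w2v, idf):
    """Assumed form: mean over all (i, j) pairs of idf(query[i]) * sim(query[i], A[j])."""
    n, m = len(query), len(keywords)
    if n == 0 or m == 0:
        return 0.0
    total = 0.0
    for q in query:
        for a in keywords:
            total += idf(q) * cossim(w2v[q], w2v[a])
    return total / (n * m)
```

Other aggregations, for example taking the best-matching keyword for each query word, fit the same interface and may be closer to the original formula.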

step 4, obtaining a comprehensive evaluation score according to the basic evaluation score:

word part-of-speech weights are introduced on the basis of the basic evaluation score to calculate the comprehensive evaluation score, recorded as $score(query_e, A_{p,q})$,

wherein $score(query_e, A_{p,q})$ represents the comprehensive evaluation score between the operation range word segmentation of the e-th enterprise to be classified and the q-th auxiliary keyword array of the p-th emerging industry, $base(query_e, A_{p,q})$ represents the corresponding basic evaluation score, $query_e$ represents the operation range word segmentation of the e-th enterprise, $A_{p,q}$ represents the q-th auxiliary keyword array of the p-th emerging industry, $query\_n_e$ is the array composed of the nouns in $query_e$, $n\_n$ is the length of $query\_n_e$, $query\_v_e$ is the array composed of the verbs in $query_e$, $n\_v$ is the length of $query\_v_e$, c is a weight parameter, $m\_n$ is the length of the array composed of the nouns in $A_{p,q}$, and $m\_v$ is the length of the array composed of the verbs in $A_{p,q}$;

step 4, obtaining the enterprise classification score according to the comprehensive evaluation score:

the enterprise classification score is obtained from the comprehensive evaluation scores and recorded as $classify(C_e, Ind_p)$,

wherein $classify(C_e, Ind_p)$ is the classification score of the e-th enterprise to be classified with respect to the p-th emerging industry, $score(query_e, A_{p,i})$ represents the comprehensive evaluation score between the operation range word segmentation of the e-th enterprise to be classified and the i-th auxiliary keyword array of the p-th emerging industry, and $Q_p$ is the total number of auxiliary keyword arrays of the p-th emerging industry;

step 4, obtaining the classification result of the emerging industry to which the enterprise belongs according to the enterprise classification score:

the classification result of the emerging industry to which the enterprise belongs is obtained from the enterprise classification scores and recorded as:

$$Ind_T = \arg\max_{Ind_p,\ p \in [1, M]} classify(C_e, Ind_p)$$

wherein $Ind_T$ is the emerging industry that maximizes the enterprise classification score of the e-th enterprise to be classified, and $classify(C_e, Ind_p)$ is the enterprise classification score of the e-th enterprise to be classified and the p-th emerging industry.
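The final selection can be sketched as an argmax over per-industry scores. Because the expression that combines the Q_p comprehensive scores into classify(C_e, Ind_p) is not reproduced in this text, the combination rule is left as a parameter (`aggregate`, defaulting to `max` as an assumption), and `overlap_score` is only a toy stand-in for score(query_e, A_{p,i}).

```python
def overlap_score(query, keyword_array):
    """Toy stand-in for score(query_e, A_{p,i}): fraction of keywords present in the query."""
    return len(set(query) & set(keyword_array)) / len(keyword_array)


def classify_enterprise(query, industry_arrays, score, aggregate=max):
    """Return (Ind_T, scores), where scores maps each industry to classify(C_e, Ind_p)."""
    scores = {
        name: aggregate(score(query, arr) for arr in arrays)
        for name, arrays in industry_arrays.items()
    }
    # argmax over industries; ties resolve to the first industry in insertion order.
    best = max(scores, key=scores.get)
    return best, scores
```

Because adding an industry only means adding one more entry to `industry_arrays`, no retraining is involved, which matches the extensibility the method claims.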

The method has the advantages of no need of manual marking and training, strong adaptability and high accuracy, and can classify newly-added emerging industries.

Drawings

FIG. 1 is a flow chart of an embodiment of the present invention.

FIG. 2 is a diagram comparing the effect of the method of the embodiment of the invention with other methods.

FIG. 3 shows example classification results of the embodiment of the invention.

Detailed Description

To help those of ordinary skill in the art understand and implement the present invention, the invention is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the embodiments described herein merely illustrate and explain the present invention and do not limit it.

The emerging industries were compiled by the inventors from the emerging industries mentioned in government reports and from industry developments in recent years, and all the emerging industries are denoted Ind; the enterprises in the enterprise operation word bank are obtained from the industrial and commercial registration database, and the whole enterprise operation word bank is denoted F; the official website of each enterprise to be classified is queried by enterprise name, and all the enterprises to be classified are denoted C.

Referring to FIG. 1, the present invention provides a word vector-based classification method for emerging industries to which enterprises belong, comprising the following steps:

step 1 the emerging industry is:

Ind_p

p∈[1,M]

wherein $Ind_p$ represents the name of the p-th emerging industry, and M = 212 represents the number of emerging industries;

using $Ind_p$ as the search keyword, information related to the p-th emerging industry is retrieved from the Internet;

using the TextRank algorithm, keywords are extracted from this related information to obtain the candidate keywords of $Ind_p$, recorded as:

$$key_p = [w_{p,1}, w_{p,2}, \ldots, w_{p,D}]$$

wherein $key_p$ represents the candidate keywords of the p-th emerging industry, $w_{p,d}$ represents the d-th candidate keyword of the p-th emerging industry, $d \in [1, D]$, and D = 18625 represents the number of candidate keywords;

using word2vec technology, all words in $key_p$ are mapped into a multi-dimensional word vector space:

$$w2v(key_p) = [w2v(w_{p,1}), w2v(w_{p,2}), \ldots, w2v(w_{p,D})]$$

wherein $K_p$ represents the number of clusters for the p-th emerging industry, $\lfloor \cdot \rfloor$ denotes the round-down symbol, and $Len(key_p)$ represents the total number of candidate keywords of the p-th emerging industry;

the emerging industry clustering keywords are as follows:

$$D_{p,q}[k]$$

$$p \in [1, M],\quad q \in [1, K_p],\quad k \in [1, L_{p,q}]$$

wherein $D_{p,q}[k]$ represents the k-th keyword in the keyword array formed by the q-th clustering result of the p-th emerging industry, M represents the number of emerging industries, $K_p$ represents the total number of clusters for the p-th emerging industry, and $L_{p,q}$ represents the total number of keywords in the q-th clustering result of the p-th emerging industry.

step 2, the enterprise operation range is acquired from the official website, and the enterprise operation word bank is obtained according to the enterprise operation range:

the enterprise operation range in step 2 is recorded as

$$S_g$$

$$1 \le g \le N$$

wherein $S_g$ represents the operation range information of the g-th enterprise, and N = 100000 represents the total number of enterprises;

F＝[Split(S₁)，Split(S₂)，…，Split(S_N)]

Split(S_g)＝[x_g，1，x_g，2，…，x_g，h]

$$A_{p,q} = [D_{p,q}[1], D_{p,q}[1]_1, D_{p,q}[1]_2, \ldots, D_{p,q}[1]_L, \ldots, D_{p,q}[k], D_{p,q}[k]_1, D_{p,q}[k]_2, \ldots, D_{p,q}[k]_L]$$

wherein $A_{p,q}$ represents the q-th auxiliary keyword array of the p-th emerging industry, $D_{p,q}[k]$ represents the k-th keyword in the keyword array formed by the q-th clustering result of the p-th emerging industry, $D_{p,q}[k]_l$ denotes the word in F with the l-th highest similarity to $D_{p,q}[k]$, and L represents the number of words taken in descending order of similarity.

step 3, obtaining the inverse document frequency weights of the words according to the enterprise operation word bank:

$$1 \le g \le G,\quad 1 \le h \le G_g$$

wherein $idf_{won}(x_{g,h})$ is the inverse document frequency of the h-th word in the g-th enterprise operation word bank, R is the total number of operation ranges, $Num(x_{g,h})$ represents the number of operation ranges containing the h-th word of the g-th enterprise operation word bank, G is the total number of enterprise operation word banks, and $G_g$ is the total number of words in the g-th enterprise operation word bank;

step 4, obtaining a basic evaluation score according to the business operation range of the enterprise to be classified and the emerging industry keyword word bank:

the enterprise to be classified is marked as:

C_e

wherein, C_eRepresenting the e-th enterprise to be classified;

the operation range of the enterprise to be classified is recorded as:

Scope_e

wherein, Scope_eRepresenting the operation range of the e-th enterprise to be classified;

query_e＝[y_e，1，y_e，2，…，y_e，r]

sim(query_e[r]，A_p，q[t])＝cossim(w2v(query_e[r])，w2v(A_p，q[t]))

wherein $base(query_e, A_{p,q})$ represents the basic evaluation score between the operation range word segmentation of the e-th enterprise to be classified and the q-th auxiliary keyword array of the p-th emerging industry, $query_e$ represents the operation range word segmentation of the e-th enterprise, $query_e[i]$ represents the i-th word of the operation range word segmentation of the e-th enterprise to be classified, $A_{p,q}$ represents the q-th auxiliary keyword array of the p-th emerging industry, $A_{p,q}[j]$ represents the j-th word in the q-th auxiliary keyword array of the p-th emerging industry, idf(·) is the function for calculating the idf weight, n represents the total number of words in the operation range word segmentation of the e-th enterprise, and m represents the total number of words in the q-th auxiliary keyword array of the p-th emerging industry;

wherein $score(query_e, A_{p,q})$ represents the comprehensive evaluation score between the operation range word segmentation of the e-th enterprise to be classified and the q-th auxiliary keyword array of the p-th emerging industry, $base(query_e, A_{p,q})$ represents the corresponding basic evaluation score, $query\_n_e$ is the array composed of the nouns in $query_e$, $n\_n$ is the length of $query\_n_e$, $query\_v_e$ is the array composed of the verbs in $query_e$, $n\_v$ is the length of $query\_v_e$, c is a weight parameter, $m\_n$ is the length of the array composed of the nouns in $A_{p,q}$, and $m\_v$ is the length of the array composed of the verbs in $A_{p,q}$;

$$Ind_T = \arg\max_{Ind_p,\ p \in [1, M]} classify(C_e, Ind_p)$$

wherein $Ind_T$ is the emerging industry that maximizes the enterprise classification score of the e-th enterprise to be classified, and $classify(C_e, Ind_p)$ is the enterprise classification score of the e-th enterprise to be classified and the p-th emerging industry;

step 5: for industries that already exist in the emerging industry keyword word bank, the existing word bank is expanded by supplementing keywords, and the method is also effective for classifying the traditional industries to which enterprises belong; for emerging industries that do not exist in the emerging industry keyword word bank, a new industry class is created and an Internet crawler is used to search for and build the corresponding emerging industry keyword word bank, thereby achieving dynamic expansion of the emerging industries.

Finally, in order to illustrate the experimental effect of the invention, the invention is compared with other methods, and the experimental result is shown in the attached figure 2, which proves the feasibility and the accuracy of the invention. An example of the classification result is shown in fig. 3.

It should be understood that parts of the specification not set forth in detail are well within the prior art.

It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A word vector-based classification method for emerging industries of enterprises is characterized by comprising the following steps:

step 4: obtaining a basic evaluation score according to the operation range of the enterprise to be classified and the emerging industry keyword word bank; obtaining a comprehensive evaluation score according to the basic evaluation score; obtaining an enterprise classification score according to the comprehensive evaluation score; and obtaining the classification result of the emerging industry to which the enterprise belongs according to the enterprise classification score.

2. The method of claim 1, wherein the method comprises the following steps:

step 1 the emerging industry is:

Ind_p

p∈[1，M]

using the TextRank algorithm, keywords are extracted from the information related to the p-th emerging industry to obtain the candidate keywords of $Ind_p$, recorded as:

$$key_p = [w_{p,1}, w_{p,2}, \ldots, w_{p,D}]$$

wherein $K_p$ represents the number of clusters for the p-th emerging industry;

the emerging industry clustering keywords are as follows:

$$D_{p,q}[k]$$

$$p \in [1, M],\quad q \in [1, K_p],\quad k \in [1, L_{p,q}]$$

wherein $D_{p,q}[k]$ represents the k-th keyword in the keyword array formed by the q-th clustering result of the p-th emerging industry, M represents the number of emerging industries, $K_p$ represents the total number of clusters for the p-th emerging industry, and $L_{p,q}$ represents the total number of keywords in the q-th clustering result of the p-th emerging industry.

3. The method of claim 1, wherein the method comprises the following steps:

step 2, the enterprise operation range is recorded as $S_g$

1≤g≤N

F＝[Split(S₁)，Split(S₂)，…，Split(S_N)]

Split(S_g)＝[x_g，1，x_g，2，…，x_g，h]

$$A_{p,q} = [D_{p,q}[1], D_{p,q}[1]_1, D_{p,q}[1]_2, \ldots, D_{p,q}[1]_L, \ldots, D_{p,q}[k], D_{p,q}[k]_1, D_{p,q}[k]_2, \ldots, D_{p,q}[k]_L]$$

4. The method of claim 1, wherein the method comprises the following steps:

wherein idf (·) is a function for calculating the frequency weight of the inverse document, word is any term, and F is an enterprise operation word bank.

5. The method of claim 1, wherein the method comprises the following steps:

the enterprise to be classified is marked as:

C_e

wherein, C_eRepresenting the e-th enterprise to be classified;

the operation range of the enterprise to be classified is recorded as:

Scope_e

query_e＝[y_e，1，y_e，2，…，y_e，r]

sim(query_e[r]，A_p，q[t])＝cossim(w2v(query_e[r])，w2v(A_p，q[t]))

wherein $base(query_e, A_{p,q})$ represents the basic evaluation score between the operation range word segmentation of the e-th enterprise to be classified and the q-th auxiliary keyword array of the p-th emerging industry, $query_e$ represents the operation range word segmentation of the e-th enterprise, $query_e[i]$ represents the i-th word of the operation range word segmentation of the e-th enterprise to be classified, $A_{p,q}$ represents the q-th auxiliary keyword array of the p-th emerging industry, $A_{p,q}[j]$ represents the j-th word in the q-th auxiliary keyword array of the p-th emerging industry, idf(·) is the function for calculating the idf weight, n represents the total number of words in the operation range word segmentation of the e-th enterprise, and m represents the total number of words in the q-th auxiliary keyword array of the p-th emerging industry;

$m\_n$ is the length of the array composed of the nouns in $A_{p,q}$;

$m\_v$ is the length of the array composed of the verbs in $A_{p,q}$;

$$Ind_T = \arg\max_{Ind_p,\ p \in [1, M]} classify(C_e, Ind_p)$$

wherein $Ind_T$ is the emerging industry that maximizes the enterprise classification score of the e-th enterprise to be classified, and $classify(C_e, Ind_p)$ is the enterprise classification score of the e-th enterprise to be classified and the p-th emerging industry.