CN107357851A

CN107357851A - Information processing method and system

Info

Publication number: CN107357851A
Application number: CN201710506158.XA
Authority: CN
Inventors: 夏耘海; 张斌德; 王江
Original assignee: Guoxin Youe Data Co Ltd
Current assignee: Guoxin Youe Data Co Ltd
Priority date: 2017-06-28
Filing date: 2017-06-28
Publication date: 2017-11-17
Anticipated expiration: 2037-06-28
Also published as: CN107357851B

Abstract

The invention discloses an information processing method, comprising: determining, from preset enterprises, first enterprises whose industry classification codes match preset industry classification codes, wherein the industries represented by the preset industry classification codes are the industries to which three-new enterprises belong; generating a three-new enterprise keyword corpus based on the industry description documents corresponding to the industries represented by the preset industry classification codes; performing keyword matching between the business scope introduction documents of the first enterprises and the corpus to screen out second enterprises; crawling the business-related documents of the second enterprises and calculating the similarity between the crawled business-related documents and the corpus; and determining the second enterprises whose business-related documents reach a preset similarity to be three-new enterprises. The invention also discloses an information processing system. The invention can accurately screen out, from the preset enterprises, the three-new enterprises that meet the requirements.

Description

Information processing method and system

Technical field

The present invention relates to an information processing method and system, and in particular to a method and system for identifying three-new enterprises.

Background art

With the rapid development of China's economy, new enterprises and new economic activities continue to emerge. As the most important active agents in the social economy, enterprises play an important role in the economy. Organizing and analyzing enterprise information helps decision makers understand the operating status of enterprises and identify potential business risks.

For example, the recently emerged three-new enterprises (enterprises engaged in new industries, new business formats, and new business models) have drawn the attention of the Party Central Committee and the State Council. The personnel concerned need to statistically monitor the scale, structure, and quality of the economic activity of such enterprises, so as to follow their development in real time and provide a reference for future decisions. The key to such statistical monitoring is knowing exactly which of the many enterprises under investigation are three-new enterprises. This requires accurate screening to identify the enterprises that qualify. However, there is currently no scheme for accurately screening three-new enterprises.

Summary of the invention

The technical problem to be solved by the embodiments of the present invention is to provide a scheme that can screen three-new enterprises accurately while saving time and effort.

One aspect of the present invention provides an information processing method for accurately and effectively screening three-new enterprises. The method comprises: determining, from preset enterprises, first enterprises whose industry classification codes match preset industry classification codes, wherein the industries represented by the preset industry classification codes are the industries to which three-new enterprises belong; generating a three-new enterprise keyword corpus based on the industry description documents corresponding to the industries represented by the preset industry classification codes; performing keyword matching between the business scope introduction documents of the first enterprises and the corpus to screen out second enterprises; crawling the business-related documents of the second enterprises and calculating the similarity between the crawled business-related documents and the corpus; and determining the second enterprises whose business-related documents reach a preset similarity to be three-new enterprises.

Optionally, the business-related documents include the full text or fragments of one or more of the following documents: related product introductions, related product operating instructions, software works, trademarks, and patents.

Optionally, generating the three-new enterprise keyword corpus based on the industry description documents corresponding to the industries represented by the preset industry classification codes specifically comprises: for the industry description document corresponding to each class of industry code in the preset industry classification codes, splitting the industry description document into individual words; for each word obtained by the splitting, determining the word frequency of the word; and extracting keywords based on the determined word frequencies using a preset algorithm to generate the three-new enterprise keyword corpus.

Optionally, calculating the similarity between the crawled business-related documents and the corpus specifically comprises: for each crawled business-related document, splitting the business-related document into individual words; for each word obtained by the splitting, determining the word frequency of the word; and calculating the similarity between the words and corresponding word frequencies obtained from the business-related document and the words and corresponding word frequencies obtained by splitting the industry description document corresponding to each class of industry code in the preset industry classification codes.

Optionally, determining the second enterprises whose business-related documents reach the preset similarity to be three-new enterprises specifically comprises: if there is at least one class of industry code for which the similarity between a business-related document and the industry description document corresponding to that class of industry code reaches the preset similarity, determining the second enterprise to which the business-related document belongs to be a three-new enterprise.

Another embodiment of the present invention provides an information processing system, comprising: a first processing unit, configured to determine, from preset enterprises, first enterprises whose industry classification codes match preset industry classification codes, wherein the industries represented by the preset industry classification codes are the industries to which three-new enterprises belong; a corpus generation unit, configured to generate a three-new enterprise keyword corpus based on the industry description documents corresponding to the industries represented by the preset industry classification codes; a second processing unit, configured to perform keyword matching between the business scope introduction documents of the first enterprises and the corpus to screen out second enterprises; a similarity calculation unit, configured to crawl the business-related documents of the second enterprises and calculate the similarity between the crawled business-related documents and the corpus; and a third processing unit, configured to determine the second enterprises whose business-related documents reach a preset similarity to be three-new enterprises.

Optionally, the corpus generation unit generating the three-new enterprise keyword corpus based on the industry description documents corresponding to the industries represented by the preset industry classification codes specifically comprises: for the industry description document corresponding to each class of industry code in the preset industry classification codes, splitting the industry description document into individual words; for each word obtained by the splitting, determining the word frequency of the word; and extracting keywords based on the determined word frequencies using a preset algorithm to generate the three-new enterprise keyword corpus.

Optionally, the similarity calculation unit calculating the similarity between the crawled business-related documents and the corpus specifically comprises: for each crawled business-related document, splitting the business-related document into individual words; for each word obtained by the splitting, determining the word frequency of the word; and calculating the similarity between the words and corresponding word frequencies obtained from the business-related document and the words and corresponding word frequencies obtained by splitting the industry description document corresponding to each class of industry code in the preset industry classification codes.

Optionally, the third processing unit determining the second enterprises whose business-related documents reach the preset similarity to be three-new enterprises specifically comprises: if there is at least one class of industry code for which the similarity between a business-related document and the industry description document corresponding to that class of industry code reaches the preset similarity, determining the second enterprise to which the business-related document belongs to be a three-new enterprise.

With the information processing method provided by the present invention, three-new enterprises are screened as follows. First, first enterprises whose industry classification codes match the preset industry classification codes, which represent the industries to which three-new enterprises belong, are determined from the preset enterprises. Next, a three-new enterprise keyword corpus is generated based on the industry description documents corresponding to the industries represented by the preset industry classification codes. Then, keyword matching is performed between the business scope introduction documents of the first enterprises and the corpus to screen out second enterprises. After that, the business-related documents of the second enterprises are crawled and their similarity to the corpus is calculated. Finally, the second enterprises whose business-related documents reach the preset similarity are determined to be three-new enterprises. Through these three rounds of progressive screening, the screened enterprises are increasingly likely to be three-new enterprises, so that three-new enterprises can be screened out accurately and a reference is provided for their screening.

Brief description of the drawings

Fig. 1 is a schematic flow chart of the information processing method according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of the information processing system according to an embodiment of the present invention.

Detailed description of the embodiments

To make the technical problem to be solved, the technical solutions, and the advantages of the present invention clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.

Fig. 1 is a schematic flow chart of the information processing method according to an embodiment of the present invention. As shown in Fig. 1, the information processing method provided by the embodiment of the present invention comprises:

S101: determining, from preset enterprises, first enterprises whose industry classification codes match preset industry classification codes, wherein the industries represented by the preset industry classification codes are the industries to which three-new enterprises belong.

S102: generating a three-new enterprise keyword corpus based on the industry description documents corresponding to the industries represented by the preset industry classification codes.

S103: performing keyword matching between the business scope introduction documents of the first enterprises and the corpus to screen out second enterprises.

S104: crawling the business-related documents of the second enterprises, and calculating the similarity between the crawled business-related documents and the corpus.

S105: determining the second enterprises whose business-related documents reach a preset similarity to be three-new enterprises.

In step S101, the preset industry classification codes, which represent the industries to which three-new enterprises belong, may be extracted from the national economic industry classification based on the descriptions of three-new enterprises in relevant official documents. For example, the classification range of three-new enterprises may be derived from the statements about "three-new" activities in documents such as the "Outline of the 13th Five-Year Plan for National Economic and Social Development of the People's Republic of China", the "Notice of the State Council on Issuing 'Made in China 2025'", the "Guiding Opinions of the State Council on Actively Promoting the 'Internet Plus' Action", and the "Opinions of the State Council on Several Policies and Measures for Vigorously Advancing Mass Entrepreneurship and Innovation". The relevant industry classification codes are then chosen from the national economic industry classification according to the classification range obtained. In one example, the classification range of three-new enterprises derived from these documents may include modern agriculture, forestry, animal husbandry and fishery; advanced manufacturing; new energy activities; energy conservation and environmental protection activities; internet and modern information technology services; new technology and mass entrepreneurship and innovation service activities; modern producer service activities; new lifestyle service activities; and modern comprehensive management activities. The preset industry classification codes obtained from these categories may comprise 278 groups.

Then, based on the determined preset industry classification codes, the first enterprises whose industry classification codes match the preset industry classification codes are selected from the preset enterprises. The preset enterprises may be obtained through a specified interface from the requester who requests the screening of three-new enterprises, or crawled by a web crawler according to specified keywords.
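The selection in step S101 amounts to a set-membership filter on each enterprise's industry classification code. The following is a minimal sketch, assuming the preset enterprises are already available as in-memory records; the `Enterprise` record, the function name `select_first_enterprises`, and the sample codes are hypothetical placeholders and are not taken from the original text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enterprise:
    name: str
    industry_code: str   # industry classification code registered for the enterprise
    business_scope: str  # text of the business scope introduction document


def select_first_enterprises(enterprises, preset_codes):
    """Step S101: keep enterprises whose industry code is among the preset codes."""
    preset = set(preset_codes)
    return [e for e in enterprises if e.industry_code in preset]


# Hypothetical sample data; the codes are placeholders, not real preset codes.
candidates = [
    Enterprise("Alpha", "6510", "software development and cloud services"),
    Enterprise("Beta", "5211", "retail of daily necessities"),
    Enterprise("Gamma", "3823", "manufacture of photovoltaic equipment"),
]
first_enterprises = select_first_enterprises(candidates, {"6510", "3823"})
print([e.name for e in first_enterprises])
```

Holding the preset codes in a set keeps the check constant-time per enterprise, which may matter when the candidate list is large.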

In step S102, generating the three-new enterprise keyword corpus based on the industry description documents corresponding to the industries represented by the preset industry classification codes may specifically comprise:

Step 1: for the industry description document corresponding to each class of industry code in the preset industry classification codes, split the industry description document into individual words.

A preset word segmentation tool may be used to split the industry description document corresponding to each class of industry code into individual words. For example, the jieba library in Python may be used, which can split each industry description document into individual words according to custom rules.

Step 2: for each word obtained by the splitting, determine the word frequency of the word.

For each word obtained in step 1, a word frequency statistics tool may be used to count the number of times the word appears in each industry description document, thereby obtaining the word frequency of each word. In addition, to reduce noise, words obtained in step 1 that contribute little to screening three-new enterprises or carry no meaning may be deleted, for example function words such as interjections, prepositions, and conjunctions. This improves the efficiency of keyword extraction in the subsequent step.

Step 3: extract keywords based on the determined word frequencies using a preset algorithm to generate the three-new enterprise keyword corpus.

In one example of the present invention, the TF-IDF method may be used to extract keywords based on the determined word frequencies and generate the three-new enterprise keyword corpus. However, the invention is not limited to this; other methods may also be used to extract keywords based on the determined word frequencies, for example mutual information, expected cross entropy, information gain, principal component analysis, or genetic algorithms.

In the present invention, the TF-IDF method is used to obtain the $t_i\text{-idf}$ value of each word in each document, and the words whose $t_i\text{-idf}$ value exceeds a specific threshold are chosen as keywords. The $t_i\text{-idf}$ value of each word in each industry description document can be obtained by the following formula (1):

$$t_i\text{-idf} = f_i \cdot \log\left(\frac{N}{df_i}\right) \tag{1}$$

Here, $f_i$ is the term frequency, that is, the number of times the i-th word appears in the industry description document; $df_i$ is the document frequency, that is, the number of industry description documents in which the i-th word appears; and $N$ is the total number of industry description documents. The specific threshold for the $t_i\text{-idf}$ value may be determined according to actual conditions, provided that the resulting keywords screen out as many qualifying three-new enterprises as possible while keeping processing complexity as low as possible.

Using the word frequency of each word in each industry description document obtained in step 2, the $t_i\text{-idf}$ value of each word can be obtained with formula (1). The words whose $t_i\text{-idf}$ value exceeds the specific threshold are then chosen as keywords, thereby generating the three-new enterprise keyword corpus.
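Steps 1 to 3 can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not the patented implementation: the whitespace tokenizer stands in for a real segmenter such as jieba, the stop-word list is a tiny illustrative sample, the logarithm in formula (1) is taken as the natural logarithm (the text does not state the base), and the function names, sample documents, codes, and the threshold of 1.5 are all hypothetical.

```python
import math
from collections import Counter

# Illustrative function-word list; a real list would be language specific.
STOP_WORDS = {"the", "of", "and", "for", "in", "a"}


def tokenize(text):
    """Placeholder tokenizer: lowercase whitespace split with stop words removed.
    For Chinese text a segmenter such as jieba would take its place."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tfidf_keywords(industry_docs, threshold):
    """Return {industry_code: {word: score}} for words scoring above threshold.

    Scores follow formula (1): f_i * log(N / df_i), with the natural logarithm
    assumed.
    """
    counts = {code: Counter(tokenize(text)) for code, text in industry_docs.items()}
    n_docs = len(counts)
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())  # each document counts once per word
    corpus = {}
    for code, word_counts in counts.items():
        scores = {w: f * math.log(n_docs / doc_freq[w]) for w, f in word_counts.items()}
        corpus[code] = {w: s for w, s in scores.items() if s > threshold}
    return corpus


# Hypothetical industry description documents keyed by placeholder codes.
industry_docs = {
    "6510": "development of software and cloud software platforms",
    "3823": "manufacture of photovoltaic equipment and solar equipment",
    "0111": "cultivation of grain and management of farmland",
}
corpus = tfidf_keywords(industry_docs, threshold=1.5)
print(corpus)
```

A word that appears in every industry description document scores zero under formula (1), so it can never pass a positive threshold, which is the intended effect of the document-frequency term.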

In step S103, keyword matching is performed between the business scope introduction documents of the first enterprises obtained in step S101 and the keyword corpus generated in step S102, and the second enterprises associated with the keywords are screened out. The business scope introduction documents of the first enterprises may be obtained through a specified interface from the relevant personnel, for example the requester who requests the screening of three-new enterprises, or crawled by a web crawler. In one example of the present invention, the match function in the R language may be used to match the business scope introduction documents of the first enterprises against the keyword corpus generated in step S102, automatically screening out the second enterprises associated with the keywords in the corpus. The industry classification code of an enterprise may not represent the business the enterprise actually operates; that is, the code may deviate from the actual business. The first enterprises determined by industry classification code may therefore include many enterprises that are not three-new enterprises. Screening the second enterprises out of the first enterprises by keyword matching in step S103 thus considerably raises the proportion of three-new enterprises. In one exemplary embodiment of the present invention, about 85% of the second enterprises screened out in step S103 may be three-new enterprises.
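The original text performs this matching with the R match function; the sketch below shows the same idea in Python so that it lines up with the other Python tooling the document mentions. It is an illustration only: the function name, the whitespace tokenization, and the sample enterprises and keywords are assumptions, and Chinese business scope text would need a word segmenter or substring matching instead of `split()`.

```python
def select_second_enterprises(first_enterprises, keywords):
    """Step S103: keep enterprises whose business scope text mentions any keyword.

    first_enterprises is a list of (name, business_scope_text) pairs.
    """
    selected = []
    for name, scope_text in first_enterprises:
        scope_words = set(scope_text.lower().split())
        if scope_words & keywords:  # at least one corpus keyword appears
            selected.append(name)
    return selected


keywords = {"software", "photovoltaic"}  # hypothetical corpus keywords
first_enterprises = [
    ("Alpha", "software development and cloud services"),
    ("Gamma", "wholesale of building materials"),
]
second_enterprises = select_second_enterprises(first_enterprises, keywords)
print(second_enterprises)
```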

In step S104, the business-related documents of the second enterprises may be crawled from the web in real time by crawlers built with Python packages such as selenium and bs4. The business-related documents may include the full text or fragments of one or more of the following documents: related product introductions, related product operating instructions, software works, trademarks, and patents. Calculating the similarity between the crawled business-related documents and the keyword corpus obtained in step S102 may specifically comprise:

First step: for each crawled business-related document, split the business-related document into individual words.

In this step, each business-related document may be split into individual words in the same way as the industry description document corresponding to each class of industry code is split in step S102.

Second step: for each word obtained by the splitting, determine the word frequency of the word.

In this step, the word frequency of each word may be determined in the same way as the word frequency of each word in the industry description documents is determined in step S102. Likewise, to reduce noise, words obtained in the first step that contribute little to screening three-new enterprises or carry no meaning may be deleted, for example function words such as interjections, prepositions, and conjunctions. This improves the efficiency of the subsequent processing.

Third step: calculate the similarity between the words and corresponding word frequencies obtained from the business-related document and the words and corresponding word frequencies obtained by splitting the industry description document corresponding to each class of industry code in the preset industry classification codes.

In the present invention, the similarity between each business-related document and the industry description document corresponding to each class of industry code may be calculated using the cosine of the angle between vectors (cosine similarity).

Specifically, the word frequencies of the words are first assembled into a word frequency vector in a certain order, for example the order in which the words appear in the document. For example, for the i-th business-related document, the vector $A_i = [x_1, x_2, \ldots, x_n]$ can be built from its split words and their word frequencies, where $x_1, x_2, \ldots, x_n$ are the word frequencies of the n keywords of that business-related document. Similarly, for the industry description document corresponding to the i-th class of industry code in the preset industry classification codes, the vector $B_i = [y_1, y_2, \ldots, y_n]$ can be built from its split words and their word frequencies, where $y_1, y_2, \ldots, y_n$ are the word frequencies of the n keywords of the industry description document corresponding to that class of industry code.

Then, based on the vectors built above, the similarity $\cos\theta$ between each business-related document and the industry description document corresponding to each class of industry code is calculated using the following formula (2), the cosine of the angle between the two word frequency vectors:

$$\cos\theta = \frac{\sum_{k=1}^{n} x_k y_k}{\sqrt{\sum_{k=1}^{n} x_k^2}\,\sqrt{\sum_{k=1}^{n} y_k^2}} \tag{2}$$

In this way, formula (2) yields the similarity between each business-related document and the industry description document corresponding to each class of industry code.
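The cosine calculation can be sketched as follows. This is a minimal illustration that assumes both documents have already been reduced to word-frequency mappings; the function name and the sample frequencies are hypothetical. One assumption differs from the text in form only: the two vectors are aligned on the sorted union of the words of both documents rather than on order of appearance, which gives the same cosine as long as both vectors share one ordering.

```python
import math
from collections import Counter


def cosine_similarity(freq_a, freq_b):
    """Cosine of the angle between two word-frequency vectors.

    Both vectors are laid out over the same vocabulary (the sorted union of
    the words of both documents); a word missing from one document counts as 0.
    """
    vocab = sorted(set(freq_a) | set(freq_b))
    a = [freq_a.get(w, 0) for w in vocab]
    b = [freq_b.get(w, 0) for w in vocab]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


business_doc = Counter({"software": 2, "cloud": 1})  # hypothetical word frequencies
industry_doc = Counter({"software": 1, "cloud": 2})
similarity = cosine_similarity(business_doc, industry_doc)
print(round(similarity, 3))
```

For the sample frequencies the dot product is 4 and both norms are the square root of 5, so the similarity is 4/5 = 0.8.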

In step S105, determining the second enterprises whose business-related documents reach the preset similarity to be three-new enterprises specifically comprises: if there is at least one class of industry code for which the similarity between a business-related document and the industry description document corresponding to that class of industry code reaches the preset similarity, determining the second enterprise to which the business-related document belongs to be a three-new enterprise. Specifically, if any of the similarities calculated in step S104 between a business-related document and the industry description documents corresponding to the classes of industry code reaches the preset similarity, for example 0.7, the second enterprise to which that business-related document belongs is determined to be a three-new enterprise. Because the business-related documents of the second enterprises screened out by keyword matching are compared with the description documents of the preset industry codes, which represent the industries to which three-new enterprises belong, and only the enterprises whose similarity reaches the preset similarity are chosen, the three-new enterprises determined by this similarity calculation are identified with higher accuracy.
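The decision rule of step S105 is an "at least one" test over all document and industry-code pairs. The sketch below assumes the similarities from step S104 are stored per enterprise as a nested mapping; the function name, the data layout, and the sample scores are hypothetical, while the default threshold of 0.7 is the example value given in the text.

```python
def is_three_new_enterprise(similarities, preset_similarity=0.7):
    """Step S105: True if any business-related document reaches the preset
    similarity with the description document of at least one industry code.

    similarities maps document id -> {industry_code: similarity}.
    """
    return any(
        score >= preset_similarity
        for per_code in similarities.values()
        for score in per_code.values()
    )


# Hypothetical similarity results for two second enterprises.
alpha = {"product_intro": {"6510": 0.82, "3823": 0.10}}
beta = {
    "product_intro": {"6510": 0.35, "3823": 0.41},
    "patent": {"6510": 0.69, "3823": 0.20},
}
print(is_three_new_enterprise(alpha), is_three_new_enterprise(beta))
```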

In summary, when screening three-new enterprises, the information processing method provided by the present invention first performs a first round of screening based on the preset industry codes that represent the industries to which three-new enterprises belong. It then performs a second round of screening by matching the business scope introduction documents of the enterprises that passed the first round against the keyword corpus generated from the industry description documents corresponding to the industries represented by the preset industry codes. Finally, it calculates the similarity between the business-related documents of the enterprises that passed the second round and the keyword corpus, and chooses the enterprises whose similarity reaches the preset similarity as three-new enterprises. These three rounds of progressive screening greatly improve the accuracy with which the screened enterprises are three-new enterprises.

Based on the same inventive concept, an embodiment of the present invention also provides an information processing system. Because the system solves the problem on a principle similar to that of the foregoing information processing method, the implementation of the system may refer to the implementation of the method, and repeated details are not described again.

An information processing system provided by an embodiment of the present invention, as shown in Fig. 2, comprises:

a first processing unit 201, configured to determine, from preset enterprises, first enterprises whose industry classification codes match preset industry classification codes, wherein the industries represented by the preset industry classification codes are the industries to which three-new enterprises belong;

a corpus generation unit 202, configured to generate a three-new enterprise keyword corpus based on the industry description documents corresponding to the industries represented by the preset industry classification codes;

a second processing unit 203, configured to perform keyword matching between the business scope introduction documents of the first enterprises and the corpus to screen out second enterprises;

a similarity calculation unit 204, configured to crawl the business-related documents of the second enterprises and calculate the similarity between the crawled business-related documents and the corpus;

a third processing unit 205, configured to determine the second enterprises whose business-related documents reach a preset similarity to be three-new enterprises.
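One way to picture how units 201 to 205 fit together is as five pluggable callables chained in order. The sketch below is purely illustrative: the class, its field names, the record layout, and the toy lambdas are hypothetical, and the toy similarity (the share of corpus keywords found in the crawled text) is a stand-in for the cosine calculation and crawling described in the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class InformationProcessingSystem:
    first_processing: Callable[[List[dict]], List[dict]]               # unit 201
    corpus_generation: Callable[[], Set[str]]                          # unit 202
    second_processing: Callable[[List[dict], Set[str]], List[dict]]    # unit 203
    similarity_calculation: Callable[[dict, Set[str]], float]          # unit 204
    third_processing: Callable[[float], bool]                          # unit 205

    def run(self, enterprises):
        """Chain the five units and return the enterprises judged three-new."""
        first = self.first_processing(enterprises)
        corpus = self.corpus_generation()
        second = self.second_processing(first, corpus)
        return [
            e for e in second
            if self.third_processing(self.similarity_calculation(e, corpus))
        ]


# Toy stand-ins for each unit; all values are hypothetical.
system = InformationProcessingSystem(
    first_processing=lambda es: [e for e in es if e["code"] in {"6510"}],
    corpus_generation=lambda: {"software", "cloud"},
    second_processing=lambda es, kw: [e for e in es if kw & set(e["scope"].split())],
    similarity_calculation=lambda e, kw: len(kw & set(e["docs"].split())) / len(kw),
    third_processing=lambda s: s >= 0.7,
)
enterprises = [
    {"name": "Alpha", "code": "6510", "scope": "software development", "docs": "cloud software manual"},
    {"name": "Beta", "code": "6510", "scope": "software sales", "docs": "retail catalogue"},
    {"name": "Gamma", "code": "5211", "scope": "software retail", "docs": "cloud software"},
]
result = [e["name"] for e in system.run(enterprises)]
print(result)
```

Keeping each unit behind a callable makes it possible to swap one stage, for example the keyword extraction algorithm, without touching the others.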

In one exemplary embodiment of the present invention, the business-related documents include the full text or fragments of one or more of the following documents: related product introductions, related product operating instructions, software works, trademarks, and patents. These business-related documents may be crawled from the web in real time by crawlers built with Python packages such as selenium and bs4.

In one exemplary embodiment of the present invention, the corpus generation unit 202 generating the three-new enterprise keyword corpus based on the industry description documents corresponding to the industries represented by the preset industry classification codes specifically comprises: for the industry description document corresponding to each class of industry code in the preset industry classification codes, splitting the industry description document into individual words; for each word obtained by the splitting, determining the word frequency of the word; and extracting keywords based on the determined word frequencies using a preset algorithm to generate the three-new enterprise keyword corpus.

In one exemplary embodiment of the present invention, the similarity calculation unit 204 calculating the similarity between the crawled business-related documents and the corpus specifically comprises: for each crawled business-related document, splitting the business-related document into individual words; for each word obtained by the splitting, determining the word frequency of the word; and calculating the similarity between the words and corresponding word frequencies obtained from the business-related document and the words and corresponding word frequencies obtained by splitting the industry description document corresponding to each class of industry code in the preset industry classification codes.

In one exemplary embodiment of the present invention, the third processing unit 205 determining the second enterprises whose business-related documents reach the preset similarity to be three-new enterprises specifically comprises: if there is at least one class of industry code for which the similarity between a business-related document and the industry description document corresponding to that class of industry code reaches the preset similarity, determining the second enterprise to which the business-related document belongs to be a three-new enterprise.

Finally, it should be noted that the embodiments described above are merely specific embodiments of the present invention, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone skilled in the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features. Such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

A kind of 1. information processing method, it is characterised in that including：

Determine that trade classification code meets the first enterprise of default trade classification code from default enterprise；Wherein, it is described default It is the affiliated industry of three new spectras that trade classification code, which characterizes industry,；

Industry corresponding to industry is characterized based on the default trade classification code and illustrates document, generates three new spectra key wordses Expect storehouse；

Business scope corresponding to first enterprise is introduced into document and carries out keyword match with the corpus, filters out second Enterprise；

Business relevant documentation corresponding to second enterprise is crawled, and the business relevant documentation crawled is entered with the corpus Row Similarity Measure；

It is up to the second enterprise belonging to the business relevant documentation of default similarity and is defined as three new spectras.
2. according to the method for claim 1, it is characterised in that the business relevant documentation includes following one or more texts The full text or fragment of shelves：Related product introduction, Related product operation instruction, software works, trade mark, patent.
3. The method according to claim 1 or 2, characterized in that generating the three-new enterprise keyword corpus based on the industry description documents corresponding to the industries represented by the preset industry classification codes specifically comprises:

for the industry description document corresponding to each class of industry code among the preset industry classification codes, splitting the industry description document into individual words;

for each word obtained by the splitting, determining the word frequency of the word;

extracting keywords using a preset algorithm based on the determined word frequencies, to generate the three-new enterprise keyword corpus.
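By way of illustration only, the three steps of claim 3 could be sketched as below. The claim leaves the "preset algorithm" open; this sketch assumes the simplest choice, taking the `top_k` most frequent words after removing stopwords. The regular-expression tokenizer is also an assumption that only suits space-delimited text (Chinese text would need a word segmenter), and every function name, parameter, and industry code shown is hypothetical.

```python
import re
from collections import Counter


def split_words(text):
    """Split a document into individual lowercase words (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_frequencies(text):
    """Count how often each word occurs in the document."""
    return Counter(split_words(text))


def extract_keywords(text, top_k=3, stopwords=frozenset()):
    """Assumed 'preset algorithm': the top_k most frequent non-stopwords,
    with ties broken alphabetically so the result is deterministic."""
    freqs = word_frequencies(text)
    candidates = [(w, c) for w, c in freqs.items() if w not in stopwords]
    candidates.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in candidates[:top_k]]


def build_corpus(descriptions_by_code, top_k=3, stopwords=frozenset()):
    """Build a keyword list for each class of industry code."""
    return {
        code: extract_keywords(doc, top_k, stopwords)
        for code, doc in descriptions_by_code.items()
    }


# Made-up industry description keyed by a hypothetical code.
descriptions = {"A01": "cloud services and cloud storage and data services"}
corpus = build_corpus(descriptions, top_k=2, stopwords={"and"})
```

A weighting scheme that discounts words common to every industry description could replace the raw count without changing the surrounding structure.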
4. The method according to claim 1 or 2, characterized in that performing the similarity calculation between the crawled business-related documents and the corpus specifically comprises:

for each crawled business-related document, splitting the business-related document into individual words;

for each word obtained by the splitting, determining the word frequency of the word;

performing a similarity calculation between the words and corresponding word frequencies obtained by splitting the business-related document and, respectively, the words and corresponding word frequencies obtained by splitting the industry description document corresponding to each class of industry code among the preset industry classification codes.
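By way of illustration only, the comparison of claim 4 might be sketched as follows. The claim does not name a similarity measure; cosine similarity between raw word-frequency vectors is assumed here as one common choice. The tokenizer, function names, and industry codes are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Split a document into words and count each word's frequency."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two word-frequency vectors.
    Returns 0.0 when either document contains no words."""
    dot = sum(count * tf_b[word] for word, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarities_by_code(business_doc, descriptions_by_code):
    """Compare one business-related document against the industry
    description document of each class of industry code."""
    doc_tf = term_frequencies(business_doc)
    return {
        code: cosine_similarity(doc_tf, term_frequencies(description))
        for code, description in descriptions_by_code.items()
    }
```

For example, `similarities_by_code("cloud data", {"A01": "cloud data", "B02": "steel"})` yields a score close to 1 for `A01` and 0 for `B02`, since the second description shares no words with the document.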
5. The method according to claim 4, characterized in that determining, as a three-new enterprise, each second enterprise to which a business-related document reaching the preset similarity belongs specifically comprises:

if there is at least one class of industry code such that the similarity between the business-related document and the industry description document corresponding to that class of industry code reaches the preset similarity, determining the second enterprise to which the business-related document belongs as a three-new enterprise.
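By way of illustration only, the decision rule of claim 5 reduces to an existence test over the per-code similarity scores. In this sketch, "reaches" is read as greater than or equal to; the function name, the input layout (one score dictionary per crawled document), and the threshold value used in the example are assumptions.

```python
def is_three_new_enterprise(scores_per_document, threshold):
    """Return True if any crawled document of the enterprise reaches the
    preset similarity for at least one class of industry code.

    scores_per_document: list of dicts mapping industry code -> similarity,
    one dict per business-related document (hypothetical layout).
    """
    return any(
        score >= threshold
        for scores in scores_per_document
        for score in scores.values()
    )


# Made-up scores for one enterprise with two crawled documents.
example_scores = [{"A01": 0.20, "B02": 0.35}, {"A01": 0.85, "B02": 0.10}]
decision = is_three_new_enterprise(example_scores, threshold=0.8)
```

Here the second document reaches the assumed threshold for code `A01`, so the enterprise would be determined to be a three-new enterprise.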
6. An information processing system, characterized by comprising:

a first processing unit, configured to determine, from preset enterprises, first enterprises whose industry classification code matches a preset industry classification code, wherein the industry represented by the preset industry classification code is an industry to which three-new enterprises belong;

a corpus generation unit, configured to generate a three-new enterprise keyword corpus based on the industry description documents corresponding to the industries represented by the preset industry classification codes;

a second processing unit, configured to perform keyword matching between the business-scope introduction document corresponding to each first enterprise and the corpus, to screen out second enterprises;

a similarity calculation unit, configured to crawl business-related documents corresponding to the second enterprises, and to perform a similarity calculation between the crawled business-related documents and the corpus;

a third processing unit, configured to determine, as a three-new enterprise, each second enterprise to which a business-related document reaching a preset similarity belongs.
7. The system according to claim 6, characterized in that the business-related documents comprise the full text or fragments of one or more of the following documents: related product introductions, related product operating instructions, software works, trademarks, and patents.
8. The system according to claim 6 or 7, characterized in that the corpus generation unit generating the three-new enterprise keyword corpus based on the industry description documents corresponding to the industries represented by the preset industry classification codes specifically comprises:

for the industry description document corresponding to each class of industry code among the preset industry classification codes, splitting the industry description document into individual words;

for each word obtained by the splitting, determining the word frequency of the word;

extracting keywords using a preset algorithm based on the determined word frequencies, to generate the three-new enterprise keyword corpus.
9. The system according to claim 6 or 7, characterized in that the similarity calculation unit performing the similarity calculation between the crawled business-related documents and the corpus specifically comprises:

for each crawled business-related document, splitting the business-related document into individual words;

for each word obtained by the splitting, determining the word frequency of the word;

performing a similarity calculation between the words and corresponding word frequencies obtained by splitting the business-related document and, respectively, the words and corresponding word frequencies obtained by splitting the industry description document corresponding to each class of industry code among the preset industry classification codes.
10. The system according to claim 9, characterized in that the third processing unit determining, as a three-new enterprise, each second enterprise to which a business-related document reaching the preset similarity belongs specifically comprises:

if there is at least one class of industry code such that the similarity between the business-related document and the industry description document corresponding to that class of industry code reaches the preset similarity, determining the second enterprise to which the business-related document belongs as a three-new enterprise.