CN107103094A

CN107103094A - Method and system for capturing data association relationships between enterprises based on massive data

Info

Publication number: CN107103094A
Application number: CN201710353297.3A
Authority: CN
Inventors: 李小强
Original assignee: Qianhai Sycamore (shenzhen) Data Co Ltd
Current assignee: Qianhai Sycamore (shenzhen) Data Co Ltd
Priority date: 2017-05-18
Filing date: 2017-05-18
Publication date: 2017-08-29

Abstract

The present invention relates to a method and system for capturing data association relationships between enterprises based on massive data. The method includes: obtaining massive enterprise-related data; accumulating the massive enterprise-related data to form basic data; processing the obtained massive enterprise-related data to form processed data; obtaining a training set database according to the processed data and the basic data; and processing new data using the training set database to obtain the data association relationships between enterprises. The invention processes data with big data technology, which ensures secure storage and efficient distributed processing of massive data. Driven by big data technology and based on a distributed parallel computing framework, it addresses the storage and computation of massive data. Using the theory of machine learning and natural language processing, it lets machines intelligently process enterprise-related information, which improves capture accuracy, and it automatically associates and classifies the data relevant to an enterprise from the massive data with high recognition efficiency.

Description

Method and system for capturing data association relationships between enterprises based on massive data

Technical field

The present invention relates to data processing, and more specifically to a method and system for capturing data association relationships between enterprises based on massive data.

Background technology

With the development of science and technology, more and more enterprises publish data on the internet to promote themselves or to find investees. As a result, the amount of enterprise data on the internet keeps growing, and online enterprise databases are becoming ever larger.

When promoting an enterprise or looking for an investee, the data association relationships between enterprises must be found in the massive data on the internet and used as positioning conditions to accurately locate the target enterprise. At present, however, these relationships can only be found through manual screening and analysis. This makes comprehensive analysis and holistic profiling of an enterprise difficult, and manual screening and analysis is both inefficient and inaccurate.

Chinese patent 201510810811.2 provides an algorithm, based on a relational database, for retrieving identical master-slave relationship data under big data. It is an algorithm for comparing data within massive data: following a "turn large into small, first the surface and then the points" approach, it progressively narrows the comparison scope by means of grouped traversal, intermediate-table storage, and similar techniques, and efficiently retrieves identical records. That invention targets the massive master-slave structured data in business data. Its method for quickly retrieving identical records suits the various situations in enterprise management where identical master-slave structured data must be retrieved, strengthens an enterprise's management and control capability, builds a better market environment for the enterprise, and improves its competitiveness.

The above patent uses a method for quickly retrieving identical records. This approach can only find similar records, and its accuracy is not high.

It is therefore necessary to design a method for capturing data association relationships between enterprises based on massive data that improves capture accuracy and that automatically and efficiently associates and classifies, from the massive data, the data relevant to an enterprise.

Summary of the invention

The object of the present invention is to overcome the defects of the prior art by providing a method and system for capturing data association relationships between enterprises based on massive data.

To achieve this object, the present invention adopts the following technical solution: a method for capturing data association relationships between enterprises based on massive data, the method including:

obtaining massive enterprise-related data;

accumulating the massive enterprise-related data to form basic data;

processing the obtained massive enterprise-related data to form processed data;

obtaining a training set database according to the processed data and the basic data;

processing new data using the training set database to obtain the data association relationships between enterprises.

In a further technical solution, the step of accumulating the massive enterprise-related data to form basic data includes the following specific steps:

regularly updating the massive enterprise-related data;

mining and classifying the massive enterprise-related data to build a basic database;

storing the massive enterprise-related data in the basic database;

obtaining the basic data in the basic database.

In a further technical solution, the step of processing the obtained massive enterprise-related data to form processed data includes the following specific steps:

cleaning and categorizing the obtained massive enterprise-related data, extracting summaries, and extracting keywords;

building an index over the summaries and keywords;

classifying the information, summaries, and keywords to obtain classification results;

performing real-time matching and statistics on the classification results to form the processed data.

In a further technical solution, the step of obtaining a training set database according to the processed data and the basic data includes the following specific steps:

making a training set according to the processed data and the basic data;

sampling, inspecting, and adjusting the processed data;

storing the adjusted processed data in the training set;

training on the training set;

improving the training by means of weights to form the training set database.

In a further technical solution, the step of processing new data using the training set database to obtain the data association relationships between enterprises includes the following specific steps:

training the training set database with the training set data to obtain a working model;

classifying and predicting the new data with the working model to obtain the data association relationships between enterprises.

The present invention also provides a system for capturing data association relationships between enterprises based on massive data, including an acquisition unit, a basic data formation unit, a processed data formation unit, a database acquisition unit, and a relationship acquisition unit;

the acquisition unit is configured to obtain massive enterprise-related data;

the basic data formation unit is configured to accumulate the massive enterprise-related data to form basic data;

the processed data formation unit is configured to process the obtained massive enterprise-related data to form processed data;

the database acquisition unit is configured to obtain a training set database according to the processed data and the basic data;

the relationship acquisition unit is configured to process new data using the training set database to obtain the data association relationships between enterprises.

In a further technical solution, the basic data formation unit includes an update module, a database building module, a storage module, and a basic data acquisition module;

the update module is configured to regularly update the massive enterprise-related data;

the database building module is configured to mine and classify the massive enterprise-related data to build a basic database;

the storage module is configured to store the massive enterprise-related data in the basic database;

the basic data acquisition module is configured to obtain the basic data in the basic database.

In a further technical solution, the processed data formation unit includes a processing module, an index building module, a classification module, and a matching and statistics module;

the processing module is configured to clean and categorize the obtained massive enterprise-related data, extract summaries, and extract keywords;

the index building module is configured to build an index over the summaries and keywords;

the classification module is configured to classify the information, summaries, and keywords to obtain classification results;

the matching and statistics module is configured to perform real-time matching and statistics on the classification results to form the processed data.

In a further technical solution, the database acquisition unit includes a training set formation module, an adjustment module, a processed data storage module, a training module, and an improvement module;

the training set formation module is configured to make a training set according to the processed data and the basic data;

the adjustment module is configured to sample, inspect, and adjust the processed data;

the processed data storage module is configured to store the adjusted processed data in the training set;

the training module is configured to train on the training set;

the improvement module is configured to improve the training by means of weights to form the training set database.

In a further technical solution, the relationship acquisition unit includes a model acquisition module and a classification and prediction module;

the model acquisition module is configured to train the training set database with the training set data to obtain a working model;

the classification and prediction module is configured to classify and predict new data with the working model to obtain the data association relationships between enterprises.

Compared with the prior art, the present invention has the following beneficial effects. The method of the present invention for capturing data association relationships between enterprises based on massive data collects massive enterprise-related data at low acquisition cost and processes it with big data technology, which ensures secure storage and efficient distributed processing of massive data, with accuracy improving continuously as data accumulates. Driven by big data technology and based on a distributed parallel computing framework, it addresses the storage and computation of massive data. Using the theory of machine learning and natural language processing, it lets machines intelligently process enterprise-related information by summarizing, categorizing, and extracting it. This improves capture accuracy, and the data relevant to an enterprise is automatically associated and classified from the massive data with high recognition efficiency.

The invention is further described below with reference to the accompanying drawings and specific embodiments.

Brief description of the drawings

Fig. 1 is a flow chart of the method for capturing data association relationships between enterprises based on massive data provided by a specific embodiment of the present invention;

Fig. 2 is a detailed flow chart of forming the basic data provided by a specific embodiment of the present invention;

Fig. 3 is a detailed flow chart of forming the processed data provided by a specific embodiment of the present invention;

Fig. 4 is a detailed flow chart of obtaining the training set database provided by a specific embodiment of the present invention;

Fig. 5 is a detailed flow chart of obtaining the data association relationships between enterprises provided by a specific embodiment of the present invention;

Fig. 6 is a structural block diagram of the system for capturing data association relationships between enterprises based on massive data provided by a specific embodiment of the present invention;

Fig. 7 is a structural block diagram of the basic data formation unit provided by a specific embodiment of the present invention;

Fig. 8 is a structural block diagram of the processed data formation unit provided by a specific embodiment of the present invention;

Fig. 9 is a structural block diagram of the database acquisition unit provided by a specific embodiment of the present invention;

Fig. 10 is a structural block diagram of the relationship acquisition unit provided by a specific embodiment of the present invention.

Embodiment

To make the technical content of the present invention easier to understand, the technical solution is further introduced and explained below with reference to specific embodiments, but the invention is not limited to them.

In the specific embodiment shown in Figs. 1 to 10, the method for capturing data association relationships between enterprises based on massive data provided by this embodiment can be used when promoting an enterprise or looking for an investee. It improves capture accuracy and automatically and efficiently associates and classifies, from the massive data, the data relevant to an enterprise.

As shown in Fig. 1, the method for capturing data association relationships between enterprises based on massive data provided by this embodiment includes:

S1, obtaining massive enterprise-related data;

S2, accumulating the massive enterprise-related data to form basic data;

S3, processing the obtained massive enterprise-related data to form processed data;

S4, obtaining a training set database according to the processed data and the basic data;

S5, processing new data using the training set database to obtain the data association relationships between enterprises.

In step S1, the massive enterprise-related data is obtained by using data crawling technology to collect and crawl enterprise-related data from the internet every day.

Further, step S2 above, accumulating the massive enterprise-related data to form basic data, includes the following specific steps:

S21, regularly updating the massive enterprise-related data;

S22, mining and classifying the massive enterprise-related data to build a basic database;

S23, storing the massive enterprise-related data in the basic database;

S24, obtaining the basic data in the basic database.

In step S21, regularly updating the massive enterprise-related data serves to accumulate data.

In step S22, machine learning techniques are used to mine and classify the massive enterprise-related data on the internet, and the basic database is built from the result.

In step S23, the massive enterprise-related data is stored in a distributed manner using the big data technology HDFS.

In step S24, the basic data in the basic database is the data obtained after the massive enterprise-related data has been accumulated and processed.
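The accumulation described in steps S21 to S24 can be illustrated with a minimal sketch. It assumes that each crawled record is a dictionary with a hypothetical `enterprise_id` key, and it uses an in-memory dictionary in place of the HDFS-backed basic database; neither the record layout nor the function name comes from the original disclosure.

```python
from datetime import date

def accumulate(base_db, daily_records):
    """Merge one day's crawled records into the basic database.

    base_db maps a hypothetical enterprise_id to the list of records
    seen so far; an in-memory dict stands in for HDFS storage here.
    """
    for record in daily_records:
        history = base_db.setdefault(record["enterprise_id"], [])
        # Skip exact duplicates so repeated crawls do not inflate the data.
        if record not in history:
            history.append(record)
    return base_db

base_db = {}
day1 = [
    {"enterprise_id": "A", "fetched_at": date(2017, 5, 17), "text": "A invests in B"},
]
day2 = [
    {"enterprise_id": "A", "fetched_at": date(2017, 5, 17), "text": "A invests in B"},
    {"enterprise_id": "B", "fetched_at": date(2017, 5, 18), "text": "B opens a new plant"},
]
accumulate(base_db, day1)
accumulate(base_db, day2)
```

Running the update once per crawl keeps the basic database growing without storing the same record twice, which is one plausible reading of "regular updating" in step S21.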

Further, step S3 above, processing the obtained massive enterprise-related data to form processed data, includes the following specific steps:

S31, cleaning and categorizing the obtained massive enterprise-related data, extracting summaries, and extracting keywords;

S32, building an index over the summaries and keywords;

S33, classifying the information, summaries, and keywords to obtain classification results;

S34, performing real-time matching and statistics on the classification results to form the processed data.

In step S31, the collected massive enterprise-related data is cleaned and categorized, and summaries and keywords are extracted, based on the theory and techniques of natural language processing.

In step S32, an index is built over the summaries and keywords produced by the natural language processing.

In step S33, the k-nearest neighbor (k-Nearest Neighbor, KNN) classification method is used to classify the above information, summaries, and keywords to obtain classification results.

In step S34, the big data technology Spark is used to perform real-time matching and statistics on the classification results, thereby forming the processed data.
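As a rough illustration of steps S31 and S32, the sketch below cleans raw text, extracts keywords, and builds an inverted index. It is a simplification under stated assumptions: keywords are chosen by raw term frequency, the tokenizer only handles Latin-script text (Chinese text would need a word segmenter), and the stopword list and function names are hypothetical rather than taken from the original.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "with"}  # illustrative only

def clean(text):
    """Strip HTML-like tags and collapse whitespace (cleaning in step S31)."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def extract_keywords(text, top_n=3):
    """Pick the most frequent non-stopword tokens as keywords (step S31)."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]
    return [word for word, _ in Counter(tokens).most_common(top_n)]

def build_index(docs):
    """Build an inverted index from keyword to document ids (step S32)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for keyword in extract_keywords(clean(text)):
            index[keyword].add(doc_id)
    return index

docs = {
    1: "<p>Acme acquires Beta. Acme and Beta expand.</p>",
    2: "Beta signs supply deal with Gamma; Beta grows.",
}
index = build_index(docs)
```

Looking up `index["beta"]` returns both document ids, which is the kind of shared-keyword signal that later classification and matching steps could build on.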

Steps S1 to S3 above all process the massive enterprise-related data obtained from the internet using mature big data technology, which ensures secure storage and efficient distributed processing of massive data, with accuracy improving continuously as data accumulates. Driven by big data technology and based on a distributed parallel computing framework, they address the storage and computation of massive data. Using the theory of machine learning and natural language processing, they let machines intelligently process enterprise-related information by summarizing, categorizing, and extracting it. Because collection and processing rely on public internet information, no sensitive information is involved and the data acquisition cost is low.

Further, step S4 above, obtaining a training set database according to the processed data and the basic data, includes the following specific steps:

S41, making a training set according to the processed data and the basic data;

S42, sampling, inspecting, and adjusting the processed data;

S43, storing the adjusted processed data in the training set;

S44, training on the training set;

S45, improving the training by means of weights to form the training set database.

In step S41, the basic data in the basic database is integrated and matched with the processed data obtained after classification, matching, and statistics, and the result is used as the training set. This makes the association relationship between the basic data and the processed data explicit and makes it easier to capture the association relationships of new data.

Step S42 mainly serves to improve the accuracy of the association relationship between the basic data and the processed data. The processed data therefore needs to be manually sampled, inspected, and adjusted to ensure its accuracy and, in turn, the accuracy of the association relationship.

Step S43 mainly serves to correct the processed data in the training set: the adjusted processed data is taken as authoritative and integrated with the basic data to form more accurate data association relationships.
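One way to picture steps S41 to S43 is as a join between basic data and processed data in which manual corrections take precedence. The sketch below assumes, purely for illustration, that basic data maps a record id to a feature vector, that processed data maps the same id to an automatically assigned class label, and that the sampling inspection yields a small dictionary of corrected labels; the original does not specify these structures.

```python
def build_training_set(base_data, processed_data, corrections):
    """Join basic and processed records by id, applying manual corrections.

    corrections maps a record id to the label a human reviewer assigned
    during sampling inspection (step S42); it overrides the automatic label.
    """
    training_set = []
    for record_id, features in base_data.items():
        if record_id not in processed_data:
            continue  # no processed counterpart, so no label to learn from
        label = corrections.get(record_id, processed_data[record_id])
        training_set.append((features, label))
    return training_set

base_data = {"r1": [1.0, 2.0, 3.0], "r2": [3.4, 6.7, 8.9], "r3": [2.5, 3.3, 10.0]}
processed_data = {"r1": 1, "r2": 1, "r3": 3}  # automatic labels
corrections = {"r2": 2}                       # reviewer fixes the label of r2
training_set = build_training_set(base_data, processed_data, corrections)
```

Treating the reviewed label as authoritative mirrors the statement that the adjusted processed data is the standard against which the training set is corrected.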

In step S45, as data accumulates, training is improved by means of weights. The weighting scheme mainly gives larger weights to neighbors that are at a smaller distance from the sample. Specifically, a weight setting that is too small reduces classification accuracy; if the setting is too large and the test sample belongs to a class with few samples in the training set, noise increases and classification performance degrades. The weights must therefore be set appropriately to improve the accuracy with which data association relationships between enterprises are captured. The K value is generally set by cross-validation (taking K = 1 as the baseline), and as a rule of thumb K is generally below the square root of the number of training samples.
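The weighting and K selection described for step S45 can be sketched as follows. This is an assumption-laden illustration rather than the patented procedure: weights are taken as inverse distance, K is chosen by leave-one-out cross-validation starting from K = 1, and the search is capped at the integer square root of the number of training samples. The eight samples are those of the training set table given with this embodiment, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train, query, k):
    """Classify query by distance-weighted voting among its k nearest samples."""
    neighbors = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = defaultdict(float)
    for features, label in neighbors:
        # Closer neighbors get larger weights; epsilon avoids division by zero.
        votes[label] += 1.0 / (math.dist(features, query) + 1e-9)
    return max(votes, key=votes.get)

def choose_k(train):
    """Pick K by leave-one-out cross-validation, starting from K = 1."""
    max_k = max(1, math.isqrt(len(train)))  # rule of thumb: K around sqrt(n) at most
    best_k, best_acc = 1, -1.0
    for k in range(1, max_k + 1):
        hits = sum(
            weighted_knn_predict(train[:i] + train[i + 1:], x, k) == y
            for i, (x, y) in enumerate(train)
        )
        acc = hits / len(train)
        if acc > best_acc:  # strict ">" keeps the smallest K on ties
            best_k, best_acc = k, acc
    return best_k, best_acc

train = [
    ([1.0, 2.0, 3.0], 1), ([1.0, 2.1, 3.1], 1), ([0.9, 2.2, 2.9], 1),
    ([3.4, 6.7, 8.9], 2), ([3.0, 7.0, 8.7], 2), ([3.3, 6.9, 8.8], 2),
    ([2.5, 3.3, 10.0], 3), ([2.4, 2.9, 8.0], 3),
]
best_k, best_acc = choose_k(train)
```

On such a small, well-separated sample every candidate K scores the same, so the tie-breaking rule returns the baseline K = 1; on real data the cross-validated accuracy would differ between candidates.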

Further, step S5 above, processing new data using the training set database to obtain the data association relationships between enterprises, includes the following specific steps:

S51, training the training set database with the training set data to obtain a working model;

S52, classifying and predicting the new data with the working model to obtain the data association relationships between enterprises.

In step S51, training the training set database helps improve its validity and thereby improves the accuracy with which data association relationships between enterprises are captured.

In step S52, the trained training set database serves as the working model, which is used to classify and predict new data and obtain the data association relationships between enterprises. Classification is thus automated, and accuracy keeps rising as the data volume accumulates.

Steps S51 and S52 may refer to the following example:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Feed the training set into the KNN model
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

# Measure model accuracy on the test set
clf.score(X_test, y_test)

# Set up the new data
new_data = np.array([[5000, 40000]])

# Classify and predict the new data
clf.predict(new_data)

For the example described above, the data in the training set database is shown in the following table:

No.	Data	Data	Data	Class
1	1.0	2.0	3.0	1
2	1.0	2.1	3.1	1
3	0.9	2.2	2.9	1
4	3.4	6.7	8.9	2
5	3.0	7.0	8.7	2
6	3.3	6.9	8.8	2
7	2.5	3.3	10.0	3
8	2.4	2.9	8.0	3

The new data is shown in the following table:

No.	Data	Data	Data	Class
1	2.1	5.5	7.2	0
2	1.1	2.5	4.2	0
3	4.1	3.5	9.2	0

The new data after classification is shown in the following table:

No.	Data	Data	Data	Class
1	1.1	2.5	4.2	1
2	2.1	5.5	7.2	2
3	4.1	3.5	9.2	3
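The classes in the table above can be reproduced from the training set table with a small, dependency-free sketch. It assumes Euclidean distance and an unweighted majority vote among the three nearest rows, which corresponds to the default behavior of `KNeighborsClassifier(n_neighbors=3)`; the helper name `knn_predict` is hypothetical.

```python
import math
from collections import Counter

# Training set table: three data columns and a class label per row.
X_train = [
    [1.0, 2.0, 3.0], [1.0, 2.1, 3.1], [0.9, 2.2, 2.9],
    [3.4, 6.7, 8.9], [3.0, 7.0, 8.7], [3.3, 6.9, 8.8],
    [2.5, 3.3, 10.0], [2.4, 2.9, 8.0],
]
y_train = [1, 1, 1, 2, 2, 2, 3, 3]

# New data table, in its original row order (class 0 means "not yet classified").
new_data = [[2.1, 5.5, 7.2], [1.1, 2.5, 4.2], [4.1, 3.5, 9.2]]

def knn_predict(X, y, query, k=3):
    """Majority vote among the k training rows closest to query."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

predicted = [knn_predict(X_train, y_train, row) for row in new_data]
print(predicted)  # [2, 1, 3]
```

The three predictions agree with the classified table: the row (2.1, 5.5, 7.2) falls in class 2, (1.1, 2.5, 4.2) in class 1, and (4.1, 3.5, 9.2) in class 3.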

The above method for capturing data association relationships between enterprises based on massive data collects massive enterprise-related data at low acquisition cost and processes it with big data technology, which ensures secure storage and efficient distributed processing of massive data, with accuracy improving continuously as data accumulates. Driven by big data technology and based on a distributed parallel computing framework, it addresses the storage and computation of massive data. Using the theory of machine learning and natural language processing, it lets machines intelligently process enterprise-related information by summarizing, categorizing, and extracting it. This improves capture accuracy, and the data relevant to an enterprise is automatically associated and classified from the massive data with high recognition efficiency.

As shown in Fig. 6, this embodiment also provides a system for capturing data association relationships between enterprises based on massive data, which includes an acquisition unit 1, a basic data formation unit 2, a processed data formation unit 3, a database acquisition unit 4, and a relationship acquisition unit 5.

The acquisition unit 1 is configured to obtain massive enterprise-related data.

The basic data formation unit 2 is configured to accumulate the massive enterprise-related data to form basic data.

The processed data formation unit 3 is configured to process the obtained massive enterprise-related data to form processed data.

The database acquisition unit 4 is configured to obtain a training set database according to the processed data and the basic data.

The relationship acquisition unit 5 is configured to process new data using the training set database to obtain the data association relationships between enterprises.
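The way the five units hand data to one another can be sketched as a simple pipeline. This is only an illustrative arrangement: the class name `CaptureSystem`, its field names, and the stand-in lambdas are hypothetical, and each unit is reduced to a plain callable so that the data flow between units 1 to 5 is visible.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class CaptureSystem:
    acquire: Callable[[], list]                    # acquisition unit 1
    form_basic: Callable[[list], Any]              # basic data formation unit 2
    form_processed: Callable[[list], Any]          # processed data formation unit 3
    build_training_db: Callable[[Any, Any], Any]   # database acquisition unit 4
    capture_relations: Callable[[Any, list], Any]  # relationship acquisition unit 5

    def run(self, new_data):
        raw = self.acquire()
        basic = self.form_basic(raw)
        processed = self.form_processed(raw)
        training_db = self.build_training_db(processed, basic)
        return self.capture_relations(training_db, new_data)

# Trivial stand-ins, used only to show how the units are wired together.
system = CaptureSystem(
    acquire=lambda: ["raw record"],
    form_basic=lambda raw: {"basic": raw},
    form_processed=lambda raw: {"processed": raw},
    build_training_db=lambda processed, basic: (processed, basic),
    capture_relations=lambda db, new: [(item, db is not None) for item in new],
)
result = system.run(["new record"])
```

Units 2 and 3 both consume the output of unit 1, and unit 4 combines their results before unit 5 applies them to new data, matching the order of steps S1 to S5.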

Specifically, the acquisition unit 1 uses data crawling technology to collect and crawl enterprise-related data from the internet every day.

Further, the basic data formation unit 2 includes an update module 21, a database building module 22, a storage module 23, and a basic data acquisition module 24.

The update module 21 is configured to regularly update the massive enterprise-related data.

The database building module 22 is configured to mine and classify the massive enterprise-related data to build a basic database.

The storage module 23 is configured to store the massive enterprise-related data in the basic database.

The basic data acquisition module 24 is configured to obtain the basic data in the basic database.

The update module 21 regularly updates the massive enterprise-related data, which serves to accumulate data.

Specifically, the database building module 22 uses machine learning techniques to mine and classify the massive enterprise-related data on the internet, and the basic database is built from the result.

Specifically, the storage module 23 stores the massive enterprise-related data in a distributed manner using the big data technology HDFS.

The basic data in the above basic database is the data obtained after the massive enterprise-related data has been accumulated and processed.

Further, the processed data formation unit 3 includes a processing module 31, an index building module 32, a classification module 33, and a matching and statistics module 34.

The processing module 31 is configured to clean and categorize the obtained massive enterprise-related data, extract summaries, and extract keywords.

The index building module 32 is configured to build an index over the summaries and keywords.

The classification module 33 is configured to classify the information, summaries, and keywords to obtain classification results.

The matching and statistics module 34 is configured to perform real-time matching and statistics on the classification results to form the processed data.

Specifically, the processing module 31 cleans and categorizes the collected massive enterprise-related data and extracts summaries and keywords, based on the theory and techniques of natural language processing.

Specifically, the index building module 32 builds an index over the summaries and keywords produced by the natural language processing.

Specifically, the classification module 33 uses the k-nearest neighbor (k-Nearest Neighbor, KNN) classification method to classify the above information, summaries, and keywords to obtain classification results.

Specifically, the matching and statistics module 34 uses the big data technology Spark to perform real-time matching and statistics on the classification results, thereby forming the processed data.

The acquisition unit 1, the basic data formation unit 2, and the processed data formation unit 3 above all process the massive enterprise-related data obtained from the internet using mature big data technology, which ensures secure storage and efficient distributed processing of massive data, with accuracy improving continuously as data accumulates. Driven by big data technology and based on a distributed parallel computing framework, they address the storage and computation of massive data. Using the theory of machine learning and natural language processing, they let machines intelligently process enterprise-related information by summarizing, categorizing, and extracting it. Because collection and processing rely on public internet information, no sensitive information is involved and the data acquisition cost is low.

In addition, the database acquisition unit 4 includes a training set formation module 41, an adjustment module 42, a processed data storage module 43, a training module 44, and an improvement module 45.

The training set formation module 41 is configured to make a training set according to the processed data and the basic data.

The adjustment module 42 is configured to sample, inspect, and adjust the processed data.

The processed data storage module 43 is configured to store the adjusted processed data in the training set.

The training module 44 is configured to train on the training set.

The improvement module 45 is configured to improve the training by means of weights to form the training set database.

The training set formation module 41 integrates and matches the basic data in the basic database with the processed data obtained after classification, matching, and statistics, and uses the result as the training set. This makes the association relationship between the basic data and the processed data explicit and makes it easier to capture the association relationships of new data.

The adjustment module 42 mainly serves to improve the accuracy of the association relationship between the basic data and the processed data. The processed data therefore needs to be manually sampled, inspected, and adjusted to ensure its accuracy and, in turn, the accuracy of the association relationship.

The processed data storage module 43 mainly serves to correct the processed data in the training set: the adjusted processed data is taken as authoritative and integrated with the basic data to form more accurate data association relationships.

As data accumulates, the improvement module 45 improves the training by means of weights. The weighting scheme mainly gives larger weights to neighbors that are at a smaller distance from the sample. Specifically, a weight setting that is too small reduces classification accuracy; if the setting is too large and the test sample belongs to a class with few samples in the training set, noise increases and classification performance degrades. The weights must therefore be set appropriately to improve the accuracy with which data association relationships between enterprises are captured. The K value is generally set by cross-validation (taking K = 1 as the baseline), and as a rule of thumb K is generally below the square root of the number of training samples.

Further, the relationship acquisition unit 5 includes a model acquisition module 51 and a classification and prediction module 52.

The model acquisition module 51 is configured to train the training set database with the training set data to obtain a working model.

The classification and prediction module 52 is configured to classify and predict new data with the working model to obtain the data association relationships between enterprises.

The model acquisition module 51 trains the training set database, which helps improve its validity and thereby improves the accuracy with which data association relationships between enterprises are captured.

The classification and prediction module 52 takes the trained training set database as the working model and uses it to classify and predict new data, obtaining the data association relationships between enterprises. Classification is thus automated, and accuracy keeps rising as the data volume accumulates.

The working process of the model acquisition module 51 and the classification and prediction module 52 may refer to the following example:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Feed the training set into the KNN model
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

# Measure model accuracy on the test set
clf.score(X_test, y_test)

# Set up the new data
new_data = np.array([[5000, 40000]])

# Classify and predict the new data
clf.predict(new_data)

The above system for capturing data association relationships between enterprises based on massive data collects massive enterprise-related data at low acquisition cost and processes it with big data technology, which ensures secure storage and efficient distributed processing of massive data, with accuracy improving continuously as data accumulates. Driven by big data technology and based on a distributed parallel computing framework, it addresses the storage and computation of massive data. Using the theory of machine learning and natural language processing, it lets machines intelligently process enterprise-related information by summarizing, categorizing, and extracting it. This improves capture accuracy, and the data relevant to an enterprise is automatically associated and classified from the massive data with high recognition efficiency.

The above merely further illustrates the technical content of the present invention by way of embodiments so that readers can understand it more easily; it does not mean that the embodiments of the present invention are limited to these. Any technical extension or re-creation made in accordance with the present invention is protected by the present invention. The scope of protection of the present invention is defined by the claims.

Claims

1. A method for capturing inter-enterprise data association relationships based on mass data, characterized in that the method comprises:

obtaining mass enterprise-related data;

accumulating the mass enterprise-related data to form basic data;

2. The method for capturing inter-enterprise data association relationships based on mass data according to claim 1, characterized in that the step of accumulating the mass enterprise-related data to form basic data comprises the following specific steps:

regularly updating the mass enterprise-related data;

storing the mass enterprise-related data in the basic database;

obtaining the basic data from the basic database.

3. The method for capturing inter-enterprise data association relationships based on mass data according to claim 1 or 2, characterized in that the step of processing the obtained mass enterprise-related data to form processed data comprises the following specific steps:

building an index on the summary and the keywords;

4. The method for capturing inter-enterprise data association relationships based on mass data according to claim 3, characterized in that the step of obtaining the training set database according to the processed data and the basic data comprises the following specific steps:

producing a training set according to the processed data and the basic data;

performing sampling inspection on the processed data and adjusting it;

storing the adjusted processed data in the training set;

training the training set;

improving the training using weights to form the training set database.

5. The method for capturing inter-enterprise data association relationships based on mass data according to claim 4, characterized in that the step of processing new data using the training set database to obtain the inter-enterprise data association relationships comprises the following specific steps:

6. A system for capturing inter-enterprise data association relationships based on mass data, characterized by comprising an acquisition unit, a basic data forming unit, a processed data forming unit, a database acquisition unit, and a relationship acquisition unit;

the acquisition unit is configured to obtain mass enterprise-related data;

the processed data forming unit is configured to process the obtained mass enterprise-related data to form processed data;

the relationship acquisition unit is configured to process new data using the training set database to obtain the inter-enterprise data association relationships.

7. The system for capturing inter-enterprise data association relationships based on mass data according to claim 6, characterized in that the basic data forming unit comprises an update module, a database establishment module, a storage module, and a basic data acquisition module;

the database establishment module is configured to mine and classify the mass enterprise-related data and establish the basic database;

8. The system for capturing inter-enterprise data association relationships based on mass data according to claim 7, characterized in that the processed data forming unit comprises a processing module, an index building module, a classification module, and a matching statistics module;

the processing module is configured to clean and categorize the obtained mass enterprise-related data, extract summaries, and extract keywords;

9. The system for capturing inter-enterprise data association relationships based on mass data according to claim 8, characterized in that the database acquisition unit comprises a training set forming module, an adjustment module, a processed data storage module, a training module, and an improvement module;

the training module is configured to train the training set;

10. The system for capturing inter-enterprise data association relationships based on mass data according to claim 9, characterized in that the relationship acquisition unit comprises a model acquisition module and a classification prediction module;

the model acquisition module is configured to train the training set database with the training set data to obtain the usage model;

the classification prediction module is configured to classify and predict new data with the usage model to obtain the inter-enterprise data association relationships.