CN107657057A - Method for fusing and graphically presenting enterprise credit information - Google Patents

Method for fusing and graphically presenting enterprise credit information

Info

Publication number
CN107657057A
Authority
CN
China
Prior art keywords
enterprise
data
credit investigation
credit information
credit data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710977078.2A
Other languages
Chinese (zh)
Inventor
王云丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute Of Applied Mathematics Hebei Academy Of Sciences
Original Assignee
Institute Of Applied Mathematics Hebei Academy Of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute Of Applied Mathematics Hebei Academy Of Sciences
Priority to CN201710977078.2A
Publication of CN107657057A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G06F16/26 Visual data mining; Browsing structured data
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention provides a method for fusing and graphically presenting enterprise credit information. The method builds three databases (a formal database, an update database, and a history database) and comprises the following steps. S1, enterprise credit information collection: starting from the enterprise name, the credit information of the target enterprise is collected from the internet. S2, enterprise credit data processing: the credit information data is cleaned, converted, formatted, and stored. S3, enterprise credit data analysis: association relationships in the enterprise credit data are analyzed. S4, enterprise credit data retrieval and graphical display of the association relationships. The method breaks down the information silos of existing enterprise credit systems, fuses publicly available internet data, supports comprehensive querying and statistics of enterprise credit information, and graphically displays the associated information of a selected enterprise, which reduces query cost and saves time.

Description

Method for fusing and graphically presenting enterprise credit information
Technical field
The invention relates to the field of internet big data, and in particular to a method for fusing and graphically presenting enterprise credit information.
Background art
Enterprise credit investigation refers to collecting information related to an enterprise from data published on the internet, for example the legal entity name, the list of shareholders, capital contribution ratios, the names of the legal representative and senior executives, and the enterprise's administrative penalty information, judicial information, and intellectual property information; and to organizing, storing, and processing this credit information in order to mine the value of the associated data in public information.
At present, enterprise credit investigation is mostly confined to individual sectors, such as the banking system, the industry and commerce system, and the judicial system. The enterprise credit data of each sector emphasizes that sector's vertical data and is held by the sector's administrative department. This forms one information silo after another, so the relationships among the data are not directly visible.
Ordinary users see sector-specific enterprise credit data only in fragmented form and cannot intuitively discover association relationships. To obtain more complete information on the operating status of an enterprise, a user must spend considerable time collecting data and then analyze and organize it manually in order to find the relationships among enterprises.
Summary of the invention
To solve the problem that existing enterprise credit data cannot be queried conveniently and comprehensively, the invention provides a method for fusing and graphically presenting enterprise credit information. The method breaks down information silos, fuses publicly disclosed data, and graphically displays the association relationships of a selected enterprise so that the enterprise's "true picture" can be seen, thereby reducing labor cost.
The specific technical solution of the invention is as follows.
A method for fusing and graphically presenting enterprise credit information builds three databases (a formal database, an update database, and a history database) and comprises the following steps:
S1: Enterprise credit information collection: starting from the enterprise name, collect the credit information of the target enterprise from the internet;
S2: Enterprise credit data processing: clean, convert, format, and store the credit information data;
S3: Enterprise credit data analysis: analyze the association relationships in the enterprise credit data;
S4: Enterprise credit data retrieval and graphical display of the association relationships.
Further, the enterprise credit information collection of step S1 comprises the steps of:
S1.1: Obtain the enterprise directory;
S1.2: Collect credit data based on the enterprise names;
S1.3: Store the enterprise credit data in the formal database.
Further, obtaining the enterprise directory in step S1.1 comprises the steps of:
S1.1.1: Analyze the coding rules that the administration for industry and commerce applies to enterprise registration;
S1.1.2: Determine the data source for enterprise names, namely the websites of the administration for industry and commerce;
S1.1.3: Write the corresponding data crawling program;
S1.1.4: Obtain the enterprise names.
Alternatively, the enterprise names can be obtained by downloading an enterprise directory (yellow pages), but the data in such directories is usually not updated promptly.
Further, collecting credit data based on the enterprise names in step S1.2 comprises the steps of:
S1.2.1: Determine the dimensions of enterprise credit data to be collected;
S1.2.2: Analyze the URLs, data formats, and other characteristics of the target websites;
S1.2.3: Write and run a crawler program in Python;
S1.2.4: Obtain the web pages containing the enterprise credit data.
Further, in step S1.2.3 the crawler program uses a proxy IP pool, and the crawled web pages are pushed quickly into Hadoop through Flume. Crawling rules are set and a corresponding crawling template is written for each target website. The crawler program can automatically adjust the access frequency of each IP address to adapt to the target site's anti-crawling strategy. The program also has an exception handling module so that exceptions during crawling (a redesigned web page, a server outage, the need to change proxy IPs) can be resolved manually.
Further, the enterprise credit data processing of step S2 comprises the steps of:
S2.1: Enterprise credit data extraction;
S2.2: Enterprise credit data cleaning;
S2.3: Enterprise credit data conversion;
S2.4: Enterprise credit data formatting.
Further, the enterprise credit data cleaning of step S2.2 includes: removing duplicate and invalid data; identifying incomplete data and rescheduling it for crawling; checking and verifying data consistency; and linking unstructured database tables.
Further, in step S2.4, after the enterprise credit data has been cleaned and converted, its representation is normalized and standardized so that the information of every dimension carries the enterprise's unique identification code. In addition, a timestamp is recorded when the data of each dimension is loaded into the formal database, which facilitates data updates. The data format complies with the relevant national credit standards.
Further, the enterprise credit data analysis of step S3 comprises the steps of:
S3.1: Enterprise credit data integration;
S3.2: Analysis of the enterprise's investment relationships and historical investment relationships.
Further, in step S3.1 the cleaned, converted, and formatted enterprise credit data is integrated: the credit data of an enterprise across multiple dimensions is associated through the enterprise's unique identification code and stored in Hive on Hadoop. The credit data of individually owned businesses (self-employed traders), which account for nearly 50% of the total and have almost no value for data analysis, is marked and excluded from the analysis.
Further, in step S3.2 the investment relationship analysis uses the "shareholder and capital contribution information" recorded in the enterprise registration of the administration for industry and commerce to analyze, by association, the equity information of the enterprise's investments and of the enterprises it invests in. Historical investment relationships are discovered either by extracting the shareholder information before and after each change from the "change records" of the administration for industry and commerce, or by extracting it from the data in the history database.
Further, in step S4 the input keywords include the enterprise name, the name of the legal representative, the names of senior executives, and the name or title of a shareholder. Full-text search queries are run through Solr to retrieve the multi-dimensional credit data of the target enterprise: industry and commerce registration information, judicial information, administrative penalty information, intellectual property information, recruitment information, news and public opinion, portal websites, and so on. Using d3.js, an association graph is generated and displayed from the integrated data and the results of the association analysis (Fig. 5).
According to the invention, step S1 also includes a data update step (defined here as S1-b). Data must be updated to keep it current. Updates are divided into periodic batch updates and real-time updates of individual records. The data update step S1-b comprises the steps of:
S1-b.1: Download the required pages with the data crawler;
S1-b.2: Extract the required data items and store them in the update database;
S1-b.3: Compare the content of the update database with that of the formal database;
S1-b.4: Determine whether the content needs to be updated. If not, only the timestamp in the formal database is updated; if so, proceed to S1-b.5;
S1-b.5: Determine whether the content to be updated concerns shareholders, senior executives, or the enterprise name. If so, the content is written to the formal database and also stored incrementally in the history database; when writing to the formal database, the original content is deleted, the new content is inserted, and the timestamp is updated. If the content to be updated does not concern shareholders, senior executives, or the enterprise name, it is written to the formal database only: the original content is deleted, the new content is inserted, and the timestamp is updated.
In summary, the data collection and update architecture of the invention is built on three databases: the formal database, the update database, and the history database. The formal database mainly stores the enterprise credit information from the initial collection and/or from updates. The update database serves as a staging area: it stores enterprise credit information collected in periodic or irregular updates so that it can be compared with the formal database. Writes to both the update database and the formal database delete first and then insert. When newly collected information is written to the update database, however, not everything is extracted; only the key information of a given dimension, such as date information, is extracted, because the update database serves only as a staging area for comparison. The history database stores an enterprise's most important change information, such as shareholder, senior executive, and enterprise name information at the time of an update; this information is stored incrementally, by insertion only, and is never deleted or overwritten. In other words, the formal database holds the initially collected and/or updated enterprise credit information (or other formal information), the update database serves as a staging area for enterprise credit information collected in periodic or irregular updates, and the history database holds an enterprise's most important change information.
In addition, the association data assembled through the enterprise's unique code is presented as a graph. Because rendering the graph consumes considerable device resources, the invention preferably generates the graph in the background and refreshes it automatically in the background whenever the data is updated, periodically or irregularly.
The beneficial effects of the above technical solution are: the public data of an enterprise on the internet is displayed, enterprise credit information is comprehensively queried and aggregated, and the associated information of a selected enterprise is displayed graphically, which reduces query cost and saves time.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings show only certain embodiments of the invention and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 is a flow chart of the method for graphically displaying big-data association relationships in enterprise credit information disclosed in an embodiment of the invention;
Fig. 2 is a flow chart of the enterprise credit data collection in step S1 disclosed in the embodiment of the invention;
Fig. 3 is a flow chart of the enterprise credit data processing in step S2 disclosed in the embodiment of the invention;
Fig. 4 is a flow chart of the enterprise credit data analysis in step S3 disclosed in the embodiment of the invention;
Fig. 5 shows the graph representation of enterprise credit information;
Fig. 6 is a flow chart of obtaining enterprise names in step S1.1;
Fig. 7 is a flow chart of obtaining multi-dimensional enterprise credit data in step S1.2;
Fig. 8 is a flow chart of the analysis of enterprise investment relationships and historical investment relationships in step S3.2.
Fig. 9 is a flow chart of the data update step S1-b.
Detailed description of the embodiments
Example embodiments are now described more fully with reference to the drawings. The example embodiments can, however, be implemented in many forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided so that the invention will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar structures, so their detailed description is omitted.
In a specific embodiment of the invention, an enterprise credit system implemented with the method comprises three databases: a formal database, an update database, and a history database. That is, the data collection and update architecture of the invention is built on these three databases. The formal database mainly stores the enterprise credit information from the initial collection and/or from updates (it can be regarded as the release database, that is, the database that holds the latest enterprise credit data). The update database serves as a staging area: it stores enterprise credit information collected in periodic or irregular updates so that it can be compared with the formal database. Writes to both the update database and the formal database delete first and then insert. When newly collected information is written to the update database, however, not everything is extracted; only the key information of a given dimension, such as date information, is extracted, because the update database serves only as a staging area for comparison. The history database stores an enterprise's most important change information, such as shareholder, senior executive, and enterprise name information at the time of an update; this information is stored incrementally, by insertion only, and is never deleted or overwritten.
With reference to Fig. 1 and Fig. 2, the method for fusing and graphically presenting enterprise credit information comprises the following steps:
S1: Enterprise credit information collection: starting from the enterprise name, collect the credit information of the target enterprise from the internet;
The enterprise credit information collection of step S1 comprises the steps of:
S1.1: Obtain the enterprise directory;
S1.2: Collect credit data based on the enterprise names;
S1.3: Store the enterprise credit data in the formal database.
As shown in Fig. 6, obtaining the enterprise directory in step S1.1 comprises the steps of:
S1.1.1: Analyze the coding rules that the administration for industry and commerce applies to enterprise registration;
S1.1.2: Determine the data source for enterprise names, namely the websites of the administration for industry and commerce;
S1.1.3: Write the corresponding data crawling program;
S1.1.4: Obtain the enterprise names.
Alternatively, the enterprise names can be obtained by downloading an enterprise directory (yellow pages), but the data in such directories is usually not updated promptly.
As shown in Fig. 7, collecting credit data based on the enterprise names in step S1.2 comprises the steps of:
S1.2.1: Determine the dimensions of enterprise credit data to be collected;
S1.2.2: Analyze the URLs, data formats, and other characteristics of the target websites;
S1.2.3: Write and run a crawler program in Python;
S1.2.4: Obtain the web pages containing the enterprise credit data.
In step S1.2.3 the crawler program uses a proxy IP pool, and the crawled web pages are pushed quickly into Hadoop through Flume. Crawling rules are set and a corresponding crawling template is written for each target website. The crawler program can automatically adjust the access frequency of each IP address to adapt to the target site's anti-crawling strategy. The program also has an exception handling module so that exceptions during crawling (a redesigned web page, a server outage, the need to change proxy IPs) can be resolved manually.
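As a non-limiting illustration of step S1.2.3, the following Python sketch shows one possible way to rotate through a proxy pool and to adjust the request interval automatically. The class names ProxyPool and AdaptiveThrottle, the proxy labels, and the choice of HTTP status codes 403 and 429 as signs of blocking are assumptions of this sketch and are not taken from the disclosure; the actual network request and the hand-off to Flume are omitted.

```python
import itertools


class ProxyPool:
    """Round-robin pool of proxy addresses (the addresses are hypothetical)."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("proxy pool must not be empty")
        self._cycle = itertools.cycle(list(proxies))

    def next_proxy(self):
        return next(self._cycle)


class AdaptiveThrottle:
    """Doubles the delay after a blocked request and halves it after a success."""

    def __init__(self, base_delay=1.0, max_delay=60.0):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.delay = base_delay

    def record(self, status_code):
        # Assumption: 403 and 429 indicate that the target site is throttling us.
        if status_code in (403, 429):
            self.delay = min(self.delay * 2, self.max_delay)
        else:
            self.delay = max(self.delay / 2, self.base_delay)
        return self.delay


def plan_requests(urls, pool):
    """Pair each URL with the proxy that would be used to fetch it."""
    return [(url, pool.next_proxy()) for url in urls]
```

A caller would sleep for the current delay before each request and pass the response status to record(); which responses count as blocking depends on the target website and would be part of its crawling template.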
S2: Enterprise credit data processing: clean, convert, format, and store the credit information data;
With reference to Fig. 3, the enterprise credit data processing of step S2 comprises the steps of:
S2.1: Enterprise credit data extraction;
S2.2: Enterprise credit data cleaning;
S2.3: Enterprise credit data conversion;
S2.4: Enterprise credit data formatting.
Further, the enterprise credit data cleaning of step S2.2 includes: removing duplicate and invalid data; identifying incomplete data and rescheduling it for crawling; checking and verifying data consistency; and linking unstructured database tables.
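The cleaning operations of step S2.2 can be sketched as follows. This is a minimal example under stated assumptions: the record layout (the keys enterprise_name, dimension, and content) and the rule that a record without an enterprise name is invalid are hypothetical and serve only to make the three outcomes (keep, discard, re-crawl) concrete.

```python
REQUIRED_FIELDS = ("enterprise_name", "dimension", "content")  # hypothetical schema


def clean_records(records):
    """Split raw records into cleaned rows and enterprise names to re-crawl."""
    seen = set()
    cleaned = []
    recrawl = []
    for rec in records:
        name = (rec.get("enterprise_name") or "").strip()
        if not name:
            continue  # invalid: cannot be attributed to any enterprise
        if any(not rec.get(field) for field in REQUIRED_FIELDS):
            if name not in recrawl:
                recrawl.append(name)  # incomplete: schedule another crawl
            continue
        key = (name, rec["dimension"], rec["content"])
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        cleaned.append({**rec, "enterprise_name": name})
    return cleaned, recrawl
```

The returned re-crawl list corresponds to "rescheduling incomplete data for crawling"; the consistency checks and table linking named in the step are not covered by this sketch.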
Further, in step S2.4, after the enterprise credit data has been cleaned and converted, its representation is normalized and standardized so that the information of every dimension carries the enterprise's unique identification code. In addition, a timestamp is recorded when the data of each dimension is loaded into the formal database, which facilitates data updates. The data format complies with the relevant national credit standards.
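By way of example only, the formatting of step S2.4 might tag each dimension record with the enterprise's unique identification code and a load timestamp as shown below. The field names, the lower-casing of the dimension label, and the ISO 8601 timestamp format are assumptions of this sketch; the disclosure does not specify the form of the identification code or the national standard fields.

```python
from datetime import datetime, timezone


def format_record(record, enterprise_id, now=None):
    """Return a normalized copy of one dimension record with ID and load time."""
    loaded_at = now or datetime.now(timezone.utc)
    return {
        "enterprise_id": enterprise_id,  # unique code shared by all dimensions
        "dimension": record["dimension"].strip().lower(),
        "content": record["content"].strip(),
        "loaded_at": loaded_at.isoformat(),  # timestamp used later for updates
    }
```

Passing now explicitly makes the function deterministic, which is convenient when the same timestamp should apply to a whole batch load.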
S3: Enterprise credit data analysis: analyze the association relationships in the enterprise credit data;
Referring to Fig. 4, the enterprise credit data analysis of step S3 comprises the steps of:
S3.1: Enterprise credit data integration;
S3.2: Analysis of the enterprise's investment relationships and historical investment relationships.
In step S3.1 the cleaned, converted, and formatted enterprise credit data is integrated: the credit data of an enterprise across multiple dimensions is associated through the enterprise's unique identification code and stored in Hive on Hadoop. The credit data of individually owned businesses (self-employed traders), which account for nearly 50% of the total and have almost no value for data analysis, is marked and excluded from the analysis.
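The integration of step S3.1 can be illustrated with an in-memory sketch. The embodiment stores the associated data in Hive; the dictionary used here merely stands in for that storage, and the entity_type field and its value individual_business are hypothetical names chosen for this example.

```python
from collections import defaultdict


def integrate(rows):
    """Group per-dimension rows by enterprise ID and flag individual businesses."""
    profiles = defaultdict(lambda: {"dimensions": {}, "exclude_from_analysis": False})
    for row in rows:
        profile = profiles[row["enterprise_id"]]
        profile["dimensions"].setdefault(row["dimension"], []).append(row["content"])
        # Hypothetical marker for individually owned businesses.
        if row.get("entity_type") == "individual_business":
            profile["exclude_from_analysis"] = True
    return dict(profiles)


def analyzable(profiles):
    """Return the IDs of enterprises that take part in the association analysis."""
    return sorted(eid for eid, p in profiles.items() if not p["exclude_from_analysis"])
```

Marked records remain available for retrieval; they are only left out of the association analysis.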
In step S3.2 the investment relationship analysis uses the "shareholder and capital contribution information" recorded in the enterprise registration of the administration for industry and commerce to analyze, by association, the equity information of the enterprise's investments and of the enterprises it invests in. Historical investment relationships are discovered either by extracting the shareholder information before and after each change from the "change records" of the administration for industry and commerce, or by extracting it from the data in the history database.
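One possible reading of step S3.2 is sketched below. Current investment relationships are modeled as directed edges from investor to enterprise; historical relationships are derived by treating any shareholder present before a change but absent after it as a former shareholder. That derivation rule, the tuple layout, and the before/after keys are assumptions of this sketch rather than details given in the disclosure.

```python
def investment_edges(shareholder_rows):
    """Turn (investor, enterprise, ratio) rows into directed investment edges."""
    return [
        {"source": investor, "target": enterprise, "ratio": ratio, "kind": "current"}
        for investor, enterprise, ratio in shareholder_rows
    ]


def historical_edges(enterprise, change_records):
    """Derive former shareholders from before/after lists in change records."""
    edges = []
    for change in change_records:
        former = set(change["before"]) - set(change["after"])
        for investor in sorted(former):
            # The contribution ratio of a former shareholder is unknown here.
            edges.append({"source": investor, "target": enterprise,
                          "ratio": None, "kind": "historical"})
    return edges
```

Because an investor can itself be an enterprise, chaining these edges yields multi-level investment paths.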
S4: Enterprise credit data retrieval and graphical display of the association relationships.
In step S4 the input keywords include the enterprise name, the name of the legal representative, the names of senior executives, and the name or title of a shareholder. Full-text search queries are run through Solr to retrieve the multi-dimensional credit data of the target enterprise: industry and commerce registration information, judicial information, administrative penalty information, intellectual property information, recruitment information, news and public opinion, portal websites, and so on. Using d3.js, an association graph is generated and displayed from the integrated data and the results of the association analysis (see Fig. 5).
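As a hedged illustration of how the analysis results might be handed to d3.js, the sketch below converts relationship edges into a node-link structure that can be serialized to JSON. The disclosure does not state the data format passed to d3.js; the nodes/links layout and the key names id, source, target, and kind are assumptions of this sketch.

```python
import json


def to_node_link(edges):
    """Build a {"nodes": [...], "links": [...]} structure from edge dicts."""
    names = []
    for edge in edges:
        for name in (edge["source"], edge["target"]):
            if name not in names:
                names.append(name)  # keep first-seen order for stable output
    nodes = [{"id": name} for name in names]
    links = [
        {"source": e["source"], "target": e["target"], "kind": e.get("kind", "current")}
        for e in edges
    ]
    return {"nodes": nodes, "links": links}


def to_json(edges):
    """Serialize the node-link structure for a front-end graph renderer."""
    return json.dumps(to_node_link(edges), ensure_ascii=False)
```

Generating this JSON ahead of time on the server is one way to realize background generation of the graph, leaving only rendering to the browser.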
Because rendering the graph consumes considerable device resources, this embodiment generates the graph in the background and refreshes it automatically in the background whenever the data is updated, periodically or irregularly.
In addition, data must be updated to keep it current, so step S1 also includes at least one update step.
Referring to Fig. 9:
The data update step in step S1 (defined here as S1-b) is as follows.
Updates are divided into periodic batch updates and real-time updates of individual records. The data update step S1-b comprises the steps of:
S1-b.1: Download the required pages with the data crawler;
S1-b.2: Extract the required data items and store them in the update database;
S1-b.3: Compare the content of the update database with that of the formal database;
S1-b.4: Determine whether the content needs to be updated. If not, only the timestamp in the formal database is updated; if so, proceed to S1-b.5;
S1-b.5: Determine whether the content to be updated concerns shareholders, senior executives, or the enterprise name. If so, the content is written to the formal database and also stored incrementally in the history database; when writing to the formal database, the original content is deleted, the new content is inserted, and the timestamp is updated. If the content to be updated does not concern shareholders, senior executives, or the enterprise name, it is written to the formal database only: the original content is deleted, the new content is inserted, and the timestamp is updated.
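The decision logic of steps S1-b.3 to S1-b.5 can be sketched as follows, with plain Python containers standing in for the formal and history databases. The function name apply_update, the field names in HISTORY_FIELDS, and the record layout are hypothetical; the staged value passed in represents what was extracted into the update database in step S1-b.2.

```python
from datetime import datetime, timezone

# Hypothetical names for the fields whose changes are kept in the history database.
HISTORY_FIELDS = {"shareholders", "executives", "enterprise_name"}


def apply_update(formal, history, enterprise_id, field, new_value, now=None):
    """Compare a staged value with the formal database and apply S1-b.4 / S1-b.5."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    record = formal.setdefault(enterprise_id, {})
    current = record.get(field)
    if current is not None and current["value"] == new_value:
        current["updated_at"] = stamp  # S1-b.4: nothing changed, refresh timestamp only
        return "unchanged"
    if field in HISTORY_FIELDS:
        # S1-b.5: append-only history entry; nothing is deleted or overwritten.
        history.append({"enterprise_id": enterprise_id, "field": field,
                        "value": new_value, "recorded_at": stamp})
    # Formal database: the old content is replaced and the timestamp updated.
    record[field] = {"value": new_value, "updated_at": stamp}
    return "updated"
```

In this sketch the first load of a history field also creates a history entry; whether initial values belong in the history database is a design choice the disclosure leaves open.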
Although the invention has been disclosed with reference to certain embodiments, many variations and modifications can be made to the described embodiments without departing from the scope of the invention. It should therefore be understood that the invention is not limited to the illustrated embodiments; its scope of protection is defined by the appended claims and their equivalent structures and solutions.

Claims (13)

1. A method for fusing and graphically presenting enterprise credit information, the method building three databases, namely a formal database, an update database, and a history database, characterized in that the method comprises the following steps:
S1: Enterprise credit information collection: starting from the enterprise name, collect the credit information of the target enterprise from the internet;
S2: Enterprise credit data processing: clean, convert, format, and store the credit information data;
S3: Enterprise credit data analysis: analyze the association relationships in the enterprise credit data;
S4: Enterprise credit data retrieval and graphical display of the association relationships.
2. The method for fusing and graphically presenting enterprise credit information according to claim 1, characterized in that
the enterprise credit information collection of step S1 further comprises the steps of:
S1.1: Obtain the enterprise directory;
S1.2: Collect credit data based on the enterprise names;
S1.3: Store the enterprise credit data in the formal database.
3. The method for fusing and graphically presenting enterprise credit information according to claim 2, characterized in that obtaining the enterprise directory in step S1.1 further comprises the steps of:
S1.1.1: Analyze the coding rules that the administration for industry and commerce applies to enterprise registration;
S1.1.2: Determine the data source for enterprise names: the websites of the administration for industry and commerce;
S1.1.3: Write the corresponding data crawling program;
S1.1.4: Obtain the enterprise names.
4. The method for fusing and graphically presenting enterprise credit information according to claim 1, characterized in that collecting credit data based on the enterprise names in step S1.2 further comprises the steps of:
S1.2.1: Determine the dimensions of enterprise credit data to be collected;
S1.2.2: Analyze the URLs and data formats of the target websites;
S1.2.3: Write and run a crawler program in Python;
S1.2.4: Obtain the web pages containing the enterprise credit data.
5. The method for fusing and graphically presenting enterprise credit information according to claim 3, characterized in that in step S1.2.3 a crawler program is written and run that uses a proxy IP pool, and the crawled web pages are pushed quickly into Hadoop through Flume; wherein corresponding crawling rules are set and a corresponding crawling template is written for each target website; the crawler program can automatically adjust the access frequency of each IP address to adapt to the target site's anti-crawling strategy; and the crawler program further comprises an exception handling module with which exceptions during crawling can be resolved manually.
6. The method for fusing and graphically presenting enterprise credit information according to claim 1, characterized in that the enterprise credit data processing of step S2 further comprises the steps of:
S2.1: Enterprise credit data extraction;
S2.2: Enterprise credit data cleaning;
S2.3: Enterprise credit data conversion;
S2.4: Enterprise credit data formatting.
7. The method for fusing and graphically presenting enterprise credit information according to claim 6, characterized in that the enterprise credit data cleaning of step S2.2 includes: removing duplicate and invalid data; identifying incomplete data and rescheduling it for crawling; checking and verifying data consistency; and linking unstructured database tables.
8. The method for fusing and graphically presenting enterprise credit information according to claim 6, characterized in that in step S2.4, after the enterprise credit data has been cleaned and converted, its representation is normalized and standardized so that the information of every dimension carries the enterprise's unique identification code; a timestamp is recorded when the data of each dimension is loaded into the formal database, which facilitates data updates; and the data format complies with the relevant national credit standards.
9. The enterprise reference information fusion graphic method according to claim 1, characterized in that the enterprise credit data analysis of step S3 further comprises the steps:
S3.1: Enterprise credit data integration;
S3.2: Analysis of the enterprise's investment relations and of its historical investment relations.
10. The enterprise reference information fusion graphic method according to claim 9, characterized in that, in step S3.1, the enterprise credit data is integrated: the multi-dimensional enterprise credit data that has been cleaned, converted and formatted is associated through the enterprise's unique identification code and stored in Hadoop Hive; and the credit data of self-employed (individually owned) businesses is marked and excluded from data analysis.
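In Hive this association would normally be a join on the identification code. The in-memory sketch below only illustrates the logic; the record layout and the `enterprise_type` field used to recognize self-employed businesses are hypothetical.

```python
from collections import defaultdict


def integrate(records):
    """Group per-dimension records into one profile per enterprise ID."""
    profiles = defaultdict(lambda: {"dimensions": {}, "self_employed": False})
    for rec in records:
        profile = profiles[rec["enterprise_id"]]
        profile["dimensions"][rec["dimension"]] = rec["fields"]
        # Assumed marker field; the patent does not name one.
        if rec["fields"].get("enterprise_type") == "self-employed":
            profile["self_employed"] = True
    return dict(profiles)


def analyzable(profiles):
    """Return only the profiles that take part in data analysis."""
    return {eid: p for eid, p in profiles.items() if not p["self_employed"]}
```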
11. The enterprise reference information fusion graphic method according to claim 9, characterized in that, in step S3.2, the investment relation analysis of an enterprise uses the "shareholder and investment information" recorded in the enterprise registration of the administration for industry and commerce to analyze, by association, the enterprise's investments and the equity information of the enterprises investing in it; historical investment relations are discovered from the entries in the administration's "change records", by extracting historical shareholder information from the state before and after each change, or by extracting it from the data of the history database.
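One way to read this claim is that current investment relations come straight from shareholder records, while historical shareholders are those present before a change but absent after it. The sketch below follows that reading; the field names (`shareholder`, `enterprise`, `ratio`, `item`, `before`, `after`) are hypothetical.

```python
def investment_edges(shareholder_records):
    """Turn shareholder records into (investor, enterprise, ratio) triples."""
    return [
        (r["shareholder"], r["enterprise"], r["ratio"])
        for r in shareholder_records
    ]


def historical_shareholders(change_records):
    """Collect shareholders that a change record shows as removed."""
    history = set()
    for change in change_records:
        if change["item"] != "shareholder":
            continue  # only shareholder changes matter here
        # Present before the change but not after it: a former shareholder.
        history |= set(change["before"]) - set(change["after"])
    return history
```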
12. The enterprise reference information fusion graphic method according to claim 1, characterized in that, in step S4, keywords are entered, including enterprise name, legal representative's name, senior executive's name, and shareholder's name or title; a full-text search query is run through Solr to retrieve the multi-dimensional credit data of the target enterprise; and d3.js is used to combine the data with the result data of the association analysis and to generate and display an association graph.
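The claim does not describe the data handed to d3.js. A common shape for d3 force-directed graphs is an object with a `nodes` array and a `links` array whose `source` and `target` refer to node ids, so the sketch below assumes that shape. The function name `to_graph` and the `ratio` attribute are illustrative only.

```python
import json


def to_graph(edges):
    """Convert (investor, enterprise, ratio) triples to nodes and links."""
    nodes, links = {}, []
    for investor, enterprise, ratio in edges:
        # Each party appears once as a node, in first-seen order.
        nodes.setdefault(investor, {"id": investor})
        nodes.setdefault(enterprise, {"id": enterprise})
        links.append({"source": investor, "target": enterprise, "ratio": ratio})
    return {"nodes": list(nodes.values()), "links": links}


if __name__ == "__main__":
    # Serialized form that a front end could load into a d3.js layout.
    print(json.dumps(to_graph([("A", "B", 0.6), ("B", "C", 1.0)])))
```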
13. The enterprise reference information fusion graphic method according to claim 1, characterized in that step S1 further includes a data update step S1-b, and the data update step S1-b further comprises the steps:
S1-b.1: Download the required pages with the data crawler;
S1-b.2: Extract the required data items and store them in the update database;
S1-b.3: Compare the content of the update database with that of the formal database;
S1-b.4: Determine whether the content needs to be updated; if not, only the time mark in the formal database is updated; if so, proceed to step S1-b.5;
S1-b.5: Determine whether the content to be updated concerns a shareholder, senior executive or enterprise name; if so, the content is also stored incrementally in the history database when it enters the formal database, and on entering the formal database the original content is deleted, the new content is inserted and the time mark is updated; if not, the content enters only the formal database, where the original content is deleted, the new content is inserted and the time mark is updated.
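Steps S1-b.3 to S1-b.5 can be sketched as a single function over in-memory stores. The dictionaries standing in for the formal and history databases, the `TRACKED_FIELDS` names, and the return values are hypothetical. The sketch appends the new content to the history store, which is one possible reading of "stored incrementally in the history database".

```python
# Assumed names for the fields whose changes are also kept in history.
TRACKED_FIELDS = {"shareholder", "senior_executive", "enterprise_name"}


def apply_update(formal, history, key, field, new_content, now):
    """Apply one item from the update store to the formal store.

    formal  -- dict mapping key -> {"content", "updated_at"}
    history -- append-only list standing in for the history database
    """
    current = formal.get(key)
    if current is not None and current["content"] == new_content:
        # S1-b.4: nothing changed, refresh the time mark only.
        current["updated_at"] = now
        return "timestamp_only"
    if field in TRACKED_FIELDS:
        # S1-b.5: shareholder, executive and name changes also go to history.
        history.append({"key": key, "content": new_content, "stored_at": now})
    # Replace the original content and update the time mark.
    formal[key] = {"content": new_content, "updated_at": now}
    return "replaced"
```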
CN201710977078.2A 2017-10-19 2017-10-19 A kind of enterprise's reference information fusion graphic method Pending CN107657057A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710977078.2A CN107657057A (en) 2017-10-19 2017-10-19 A kind of enterprise's reference information fusion graphic method

Publications (1)

Publication Number Publication Date
CN107657057A true CN107657057A (en) 2018-02-02

Family

ID=61118959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710977078.2A Pending CN107657057A (en) 2017-10-19 2017-10-19 A kind of enterprise's reference information fusion graphic method

Country Status (1)

Country Link
CN (1) CN107657057A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108399240A (en) * 2018-02-28 2018-08-14 北京金堤科技有限公司 Enterprise's modification information data digging method and system
CN108717426A (en) * 2018-05-04 2018-10-30 苏州朗动网络科技有限公司 Update method, device, computer equipment and the storage medium of business data
CN108846739A (en) * 2018-06-07 2018-11-20 赵德坤 A kind of credit and debt application method and system
CN109165337A (en) * 2018-10-17 2019-01-08 珠海市智图数研信息技术有限公司 A kind of method and system of knowledge based map construction bidding field association analysis
CN109377375A (en) * 2018-09-03 2019-02-22 平安科技(深圳)有限公司 Fund relation map generation method, system, computer equipment and storage medium
CN109670944A (en) * 2018-12-19 2019-04-23 信雅达系统工程股份有限公司 A kind of rating business credit method and system based on map relational network
CN109829034A (en) * 2018-08-24 2019-05-31 长威信息科技发展股份有限公司 A kind of enterprise's tree spectrogram methods of exhibiting based on main market players's credit data
WO2019205382A1 (en) * 2018-04-28 2019-10-31 平安科技(深圳)有限公司 Electronic device, credit investigation data acquisition method, and storage medium
CN110705297A (en) * 2019-09-23 2020-01-17 北京海致星图科技有限公司 Enterprise name-identifying method, system, medium and equipment
CN111310012A (en) * 2020-01-21 2020-06-19 国网安徽省电力有限公司滁州供电公司 Automatic monitoring and early warning method for enterprise information loss behavior
CN111382181A (en) * 2020-03-16 2020-07-07 中科天玑数据科技股份有限公司 Designated enterprise family affiliation analysis method and system based on stock right penetration
CN111930899A (en) * 2020-09-25 2020-11-13 成都数联铭品科技有限公司 Keyword processing method and system and keyword searching method
CN112529401A (en) * 2020-12-09 2021-03-19 国网天津市电力公司 Enterprise honest risk audit model construction method
CN112579898A (en) * 2020-12-17 2021-03-30 北京金山云网络技术有限公司 Enterprise information management method and device and server
CN115190026A (en) * 2022-05-09 2022-10-14 广州中南网络技术有限公司 Internet digital circulation method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103455636A (en) * 2013-09-27 2013-12-18 浪潮齐鲁软件产业有限公司 Automatic capturing and intelligent analyzing method based on Internet tax data
US9235642B1 (en) * 2011-09-15 2016-01-12 Isaac S. Daniel System and method for conducting searches and displaying search results
CN105740335A (en) * 2016-01-22 2016-07-06 山东合天智汇信息技术有限公司 Titan-based enterprise information analysis platform and construction method thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHARU C. AGGARWAL: "《社会网络数据分析》", 31 December 2016, 武汉大学出版社 *
ESRI中国(北京)有限公司: "《第六届ArcGIS暨ERDAS中国用户大会论文集(2004)上》", 30 September 2004, 地震出版社 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108399240A (en) * 2018-02-28 2018-08-14 北京金堤科技有限公司 Enterprise's modification information data digging method and system
WO2019205382A1 (en) * 2018-04-28 2019-10-31 平安科技(深圳)有限公司 Electronic device, credit investigation data acquisition method, and storage medium
CN108717426A (en) * 2018-05-04 2018-10-30 苏州朗动网络科技有限公司 Update method, device, computer equipment and the storage medium of business data
CN108717426B (en) * 2018-05-04 2021-01-05 苏州朗动网络科技有限公司 Enterprise data updating method and device, computer equipment and storage medium
CN108846739A (en) * 2018-06-07 2018-11-20 赵德坤 A kind of credit and debt application method and system
CN109829034A (en) * 2018-08-24 2019-05-31 长威信息科技发展股份有限公司 A kind of enterprise's tree spectrogram methods of exhibiting based on main market players's credit data
CN109377375A (en) * 2018-09-03 2019-02-22 平安科技(深圳)有限公司 Fund relation map generation method, system, computer equipment and storage medium
CN109165337A (en) * 2018-10-17 2019-01-08 珠海市智图数研信息技术有限公司 A kind of method and system of knowledge based map construction bidding field association analysis
CN109165337B (en) * 2018-10-17 2021-10-15 珠海市智图数研信息技术有限公司 Method and system for establishing bid and ask field association analysis based on knowledge graph
CN109670944A (en) * 2018-12-19 2019-04-23 信雅达系统工程股份有限公司 A kind of rating business credit method and system based on map relational network
CN110705297A (en) * 2019-09-23 2020-01-17 北京海致星图科技有限公司 Enterprise name-identifying method, system, medium and equipment
CN111310012A (en) * 2020-01-21 2020-06-19 国网安徽省电力有限公司滁州供电公司 Automatic monitoring and early warning method for enterprise information loss behavior
CN111382181A (en) * 2020-03-16 2020-07-07 中科天玑数据科技股份有限公司 Designated enterprise family affiliation analysis method and system based on stock right penetration
CN111930899A (en) * 2020-09-25 2020-11-13 成都数联铭品科技有限公司 Keyword processing method and system and keyword searching method
CN111930899B (en) * 2020-09-25 2021-04-09 成都数联铭品科技有限公司 Keyword processing method and system and keyword searching method
CN112529401A (en) * 2020-12-09 2021-03-19 国网天津市电力公司 Enterprise honest risk audit model construction method
CN112579898A (en) * 2020-12-17 2021-03-30 北京金山云网络技术有限公司 Enterprise information management method and device and server
CN115190026A (en) * 2022-05-09 2022-10-14 广州中南网络技术有限公司 Internet digital circulation method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Wang Yunli

Inventor after: Cheng Bin

Inventor after: Wang Cheng

Inventor after: Shao Yunxia

Inventor after: Yang Wenhuan

Inventor after: Han Zhenzhen

Inventor before: Wang Yunli

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination