CN107657057A - Graphical method for fusing enterprise credit information - Google Patents
Graphical method for fusing enterprise credit information
- Publication number
- CN107657057A CN201710977078.2A
- Authority
- CN
- China
- Prior art keywords
- enterprise
- data
- collage
- reference information
- credit data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/26—Visual data mining; Browsing structured data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Human Resources & Organizations (AREA)
- Strategic Management (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- General Business, Economics & Management (AREA)
- Tourism & Hospitality (AREA)
- Marketing (AREA)
- Economics (AREA)
- Computational Linguistics (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The invention provides a graphical method for fusing enterprise credit information. The method constructs three databases (a formal database, an update database and a history database) and comprises the following steps: S1, enterprise credit information collection: starting from enterprise names, collect the credit information of the target enterprises from the internet; S2, enterprise credit data processing: clean, convert, format and store the credit information; S3, enterprise credit data analysis: perform association analysis on the enterprise credit data; S4, enterprise credit data retrieval and graphical display of the association relations. The method breaks down the information silos of existing enterprise credit reporting systems by fusing publicly available internet data, supports comprehensive search and statistics over enterprise credit information, and graphically displays the associations of a selected enterprise, which reduces query cost and saves time.
Description
Technical field
The present invention relates to the field of internet big data, and in particular to a graphical method for fusing enterprise credit information.
Background
Enterprise credit reporting means collecting publicly disclosed, enterprise-related information from the internet, such as the legal person's name, the shareholder list, capital contribution ratios, the legal representative's name and the senior executives, together with the enterprise's administrative penalty information, judicial information, intellectual property information and so on, and then organizing, storing and processing this credit information to mine the associative value of public data.
At present, enterprise credit reporting is mostly confined to individual sectors, such as the banking system, the industry and commerce administration system and the judicial system. The enterprise credit data of each sector emphasizes that sector's vertical data and is held by the sector's administrative departments. This forms isolated information silos in which the associations between pieces of information cannot be seen directly.
To ordinary users, sector-specific enterprise credit data appears fragmented, and associations cannot be discovered intuitively. Obtaining a more comprehensive picture of an enterprise's operating status takes considerable collection time, plus manual analysis, processing and sorting to find the associations between enterprises.
Summary of the invention
To solve the problem that existing enterprise credit data cannot be searched conveniently and comprehensively, the invention provides a graphical method for fusing enterprise credit information. It breaks down information silos, fuses publicly disclosed data and graphically presents the associations of a selected enterprise, revealing the enterprise's "true features" and thereby reducing labor cost.
The specific technical scheme of the invention is as follows.
A graphical method for fusing enterprise credit information, in which three databases are constructed: a formal database, an update database and a history database. The method comprises the following steps:
S1: Enterprise credit information collection: starting from enterprise names, collect the credit information of the target enterprises from the internet;
S2: Enterprise credit data processing: clean, convert, format and store the credit information;
S3: Enterprise credit data analysis: perform association analysis on the enterprise credit data;
S4: Enterprise credit data retrieval and graphical display of the association relations.
Further, the enterprise credit information collection in step S1 comprises:
S1.1: Obtain the enterprise directory;
S1.2: Collect credit data based on enterprise names;
S1.3: Store the enterprise credit data in the formal database.
Further, obtaining the enterprise directory in step S1.1 comprises:
S1.1.1: Analyze the coding rules that the industry and commerce administration applies to enterprise registration;
S1.1.2: Determine the data source for enterprise names: the websites of the industry and commerce administration;
S1.1.3: Write the corresponding data crawler;
S1.1.4: Obtain the enterprise names.
Alternatively, the enterprise directory can be obtained by downloading yellow-pages listings, but such listings are generally not updated in a timely manner.
Further, collecting credit data based on enterprise names in step S1.2 comprises:
S1.2.1: Determine the dimensions of the enterprise credit data to be collected;
S1.2.2: Analyze the URLs, data formats and other properties of the target websites;
S1.2.3: Write and run the crawler in Python;
S1.2.4: Obtain the web pages containing the enterprise credit data.
Further, in step S1.2.3 the crawler is written and run with a proxy IP pool, and the crawled web pages are pushed quickly into Hadoop through Flume. Crawling rules are set for each target website, and a corresponding crawl template is written. The crawler can automatically adjust the access frequency of each IP address to adapt to the target site's anti-crawling strategy. The program also has an exception handling module, so that exceptions during crawling (website redesign, server downtime, proxy IP replacement) can be resolved manually.
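The frequency adjustment and proxy rotation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name CrawlThrottle, the doubling and halving policy, and the choice of HTTP status codes 403 and 429 as signs of blocking are all assumptions, and no network request is made.

```python
import itertools


class CrawlThrottle:
    """Rotate proxies and adapt the request delay to the target site's responses."""

    def __init__(self, proxies, base_delay=1.0, max_delay=60.0):
        self._proxies = itertools.cycle(proxies)  # hypothetical proxy IP pool
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.delay = base_delay
        self.proxy = next(self._proxies)

    def record(self, status_code):
        """Update delay and proxy after a response; return the delay to wait."""
        if status_code in (403, 429):
            # Assumed sign of anti-crawling: back off and switch to the next proxy.
            self.delay = min(self.delay * 2, self.max_delay)
            self.proxy = next(self._proxies)
        elif 200 <= status_code < 300:
            # Success: drift back towards the base delay.
            self.delay = max(self.delay / 2, self.base_delay)
        return self.delay


throttle = CrawlThrottle(["10.0.0.1:8080", "10.0.0.2:8080"], base_delay=1.0)
throttle.record(429)  # blocked: the delay doubles and the proxy rotates
throttle.record(200)  # success: the delay halves back towards the base
```

A real crawler would sleep for the returned delay before the next request and pass throttle.proxy to its HTTP client.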
Further, the enterprise credit data processing in step S2 comprises:
S2.1: Enterprise credit data extraction;
S2.2: Enterprise credit data cleaning;
S2.3: Enterprise credit data conversion;
S2.4: Enterprise credit data formatting.
Further, the enterprise credit data cleaning in step S2.2 includes: removing duplicate and invalid data; identifying incomplete data and scheduling it to be crawled again; checking and verifying data consistency; and linking unstructured database tables.
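As a rough sketch of the cleaning step, the function below removes duplicates and invalid records and returns the incomplete ones for another crawl. The field names (enterprise_name, legal_representative, dimension) and the definition of "incomplete" are assumptions made for illustration; the patent does not specify them.

```python
REQUIRED_FIELDS = ("enterprise_name", "legal_representative")  # assumed schema


def clean_records(records):
    """Return (clean, recrawl): deduplicated valid records and names to re-crawl."""
    seen = set()
    clean, recrawl = [], []
    for rec in records:
        name = (rec.get("enterprise_name") or "").strip()
        if not name:
            continue  # invalid: there is no enterprise name to key on
        key = (name, rec.get("dimension"))
        if key in seen:
            continue  # duplicate of a record that was already kept
        seen.add(key)
        if any(not rec.get(field) for field in REQUIRED_FIELDS):
            recrawl.append(name)  # incomplete: schedule another crawl
            continue
        clean.append({**rec, "enterprise_name": name})
    return clean, recrawl
```

Consistency checks and table linking depend on the actual schema and are left out of this sketch.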
Further, in step S2.4, after the enterprise credit data has been cleaned and converted, its representation is normalized and standardized so that the information in every dimension carries the enterprise's unique identification code. A timestamp is also recorded when the data of each dimension is loaded into the formal database, which facilitates data updates. The data format conforms to the relevant national credit standards.
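The formatting step can be illustrated with a small helper that attaches the enterprise identifier and a load timestamp to one dimension's record. The output layout and the key normalization (trimmed, lower-cased field names) are assumptions for the sake of the example, not the national standard the text refers to.

```python
from datetime import datetime, timezone


def format_record(record, enterprise_id, dimension, now=None):
    """Attach the enterprise identifier, the dimension and a load timestamp."""
    now = now or datetime.now(timezone.utc)
    return {
        "enterprise_id": enterprise_id,  # unique identification code
        "dimension": dimension,          # e.g. registration, judicial, penalties
        "loaded_at": now.isoformat(),    # timestamp recorded at load time
        "data": {key.strip().lower(): value for key, value in record.items()},
    }
```

Passing now explicitly keeps the function deterministic, which is convenient when comparing loads during updates.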
Further, the enterprise credit data analysis in step S3 comprises:
S3.1: Enterprise credit data integration;
S3.2: Analysis of the enterprise's investment relations and historical investment relations.
Further, in step S3.1 the cleaned, converted and formatted enterprise credit data is integrated: the credit data of an enterprise across multiple dimensions is associated through the enterprise's unique identification code and stored in Hive on Hadoop. The credit data of individual businesses (which account for nearly 50% of the total and have almost no value for data analysis) is flagged and excluded from data analysis.
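The integration step amounts to grouping per-dimension records by the enterprise identifier. The sketch below does this in memory with plain dictionaries instead of Hive; the record layout and the individual_ids parameter (the identifiers of individual businesses) are hypothetical.

```python
from collections import defaultdict


def integrate(records, individual_ids=frozenset()):
    """Group per-dimension records by enterprise_id and flag individual businesses."""
    merged = defaultdict(lambda: {"dimensions": {}, "exclude_from_analysis": False})
    for rec in records:
        entry = merged[rec["enterprise_id"]]
        entry["dimensions"][rec["dimension"]] = rec["data"]
        if rec["enterprise_id"] in individual_ids:
            entry["exclude_from_analysis"] = True  # flagged, skipped by the analysis
    return dict(merged)
```

In a Hive deployment the same grouping would typically be expressed as a join on the identifier column.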
Further, in step S3.2 the investment relation analysis uses the "shareholder and capital contribution information" registered with the industry and commerce administration to analyze, by association, the equity information linking the enterprise with the enterprises it invests in and is invested by. Historical investment relations are discovered by extracting the shareholder information before and after each change from the "change records" of the industry and commerce administration, or by extracting it from the data in the history database.
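One way to picture this analysis is as building directed investor-to-investee edges from shareholder rows, and deriving former shareholders from change records. The row and record layouts below are assumptions, and treating "historical" as "listed before a change but not after it" is a simplification of the text.

```python
def investment_edges(shareholder_rows):
    """Turn shareholder and capital contribution rows into investor -> investee edges."""
    return [
        {"source": row["shareholder"], "target": row["enterprise"], "ratio": row["ratio"]}
        for row in shareholder_rows
    ]


def historical_shareholders(change_records):
    """Collect shareholders listed before a change but no longer listed after it."""
    former = set()
    for change in change_records:
        if change["item"] == "shareholder":
            former |= set(change["before"]) - set(change["after"])
    return former
```

The edges can then be combined with the former shareholders to show current and past investment relations side by side.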
Further, in step S4 the input keywords include the enterprise name, the legal representative's name, a senior executive's name, or a shareholder's name or title. A full-text search can be run through Solr to retrieve the target enterprise's credit data across multiple dimensions: business registration information, judicial information, administrative penalty information, intellectual property information, recruitment information, news and public opinion, portal websites and so on. d3.js is used to generate the association graph (Fig. 5) from the integrated data and the results of the association analysis.
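On the server side, the association graph can be prepared as a nodes-and-links structure, a shape commonly fed to d3.js force layouts when links refer to nodes by an id field. The sketch below assumes edges with source and target names; how the patent's system actually serializes the graph is not stated.

```python
import json


def to_d3_graph(edges):
    """Build a {"nodes": [...], "links": [...]} structure from a list of edges."""
    names = sorted({e["source"] for e in edges} | {e["target"] for e in edges})
    return {
        "nodes": [{"id": name} for name in names],
        "links": [{"source": e["source"], "target": e["target"]} for e in edges],
    }


graph = to_d3_graph([{"source": "Zhang", "target": "Acme"}])
print(json.dumps(graph))  # JSON that a d3.js page could load
```

Generating this JSON ahead of time fits the background generation mode the document describes for the graph.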
According to the invention, step S1 also includes a data update step (defined here as S1-b). Data must be updated to keep it current. Updates are divided into periodic batch updates and real-time updates of individual records. The data update step S1-b comprises:
S1-b.1: Download the required pages with the data crawler;
S1-b.2: Extract the required data items and store them in the update database;
S1-b.3: Compare the content of the update database with that of the formal database;
S1-b.4: Determine whether the content needs to be updated. If not, only the timestamp in the formal database is updated. If so, continue with S1-b.5;
S1-b.5: Determine whether the content to be updated concerns shareholders, senior executives or the enterprise name. If it does, the content is written to the formal database and is also stored incrementally in the history database. Writing to the formal database deletes the original content, inserts the new content and updates the timestamp. If the updated content does not concern shareholders, senior executives or the enterprise name, it is written only to the formal database: the original content is deleted, the new content is inserted and the timestamp is updated.
In summary, the data collection and update framework of the invention is built on three databases: the formal database, the update database and the history database. The formal database mainly stores the enterprise credit information from the first collection and/or from updates. The update database serves as a staging area: it stores the enterprise credit information newly collected by periodic or ad hoc updates so that it can be compared with the formal database. Writes to both the update database and the formal database delete first and then insert. When newly collected information is written to the update database, however, not everything is extracted. Only the key information of certain dimensions, such as date information, is extracted, because the update database serves only as a transfer station for comparison. The history database stores the most important change information of an enterprise, such as the shareholder, senior executive and enterprise name information at the time of a data update. It is stored incrementally: records are only inserted, never deleted or overwritten.
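The update decision in S1-b.3 to S1-b.5 can be sketched with dictionaries standing in for the formal database and a list standing in for the history database. The field names in HISTORY_FIELDS, the record layout, and the choice to store the previous values in the history entry are assumptions; the text only says that such changes are stored incrementally.

```python
HISTORY_FIELDS = {"shareholders", "executives", "enterprise_name"}  # assumed names


def apply_update(formal, history, enterprise_id, fresh, now):
    """Compare a freshly crawled record with the formal copy and apply the update."""
    current = formal.get(enterprise_id, {})
    old = current.get("data", {})
    changed = {key for key in fresh if fresh[key] != old.get(key)}
    if not changed:
        current["checked_at"] = now  # nothing new: refresh the timestamp only
        formal[enterprise_id] = current
        return changed
    key_changes = changed & HISTORY_FIELDS
    if key_changes:
        # Incremental, insert-only: record what the key fields looked like before.
        history.append({"enterprise_id": enterprise_id, "at": now,
                        "previous": {key: old.get(key) for key in key_changes}})
    # Delete-then-insert: the old row is replaced wholesale.
    formal[enterprise_id] = {"data": dict(fresh), "checked_at": now}
    return changed
```

Here fresh plays the role of the row staged in the update database, and the returned set tells the caller which fields differed.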
In addition, the association data integrated through the enterprise's unique code is presented as a graph. Because rendering the graph takes considerable device resources, the invention preferably generates it in the background and updates it automatically in the background whenever the data is updated, periodically or otherwise.
The beneficial effects of the above technical scheme are: it presents the public data of an enterprise on the internet, comprehensively queries and aggregates enterprise credit information, and graphically displays the associations of a selected enterprise, which reduces query cost and saves time.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings show only certain embodiments of the invention and should therefore not be regarded as limiting its scope. Those of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 is a flow chart of a graphical display method for enterprise credit big data associations disclosed in an embodiment of the invention;
Fig. 2 is a flow chart of the enterprise credit data collection in step S1 disclosed in an embodiment of the invention;
Fig. 3 is a flow chart of the enterprise credit data processing in step S2 disclosed in an embodiment of the invention;
Fig. 4 is a flow chart of the enterprise credit data analysis in step S3 disclosed in an embodiment of the invention;
Fig. 5 is the graph representation of enterprise credit information;
Fig. 6 is a flow chart of obtaining enterprise names in step S1.1;
Fig. 7 is a flow chart of obtaining multi-dimensional enterprise credit data in step S1.2;
Fig. 8 is a flow chart of the analysis of enterprise investment relations and historical investment relations in step S3.2;
Fig. 9 is a flow chart of the data update in step S1-b.
Detailed description of the embodiments
Example embodiments are now described more fully with reference to the accompanying drawings. The example embodiments can, however, be implemented in many forms and should not be construed as limited to the embodiments set forth here. Rather, these embodiments are provided so that the invention will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. Identical reference numerals in the drawings denote identical or similar structures, so their detailed description is omitted.
In a specific embodiment of the invention, an enterprise credit reporting system implemented with the method includes three databases: a formal database, an update database and a history database. In other words, the data collection and update framework of the invention is built on these three databases. The formal database mainly stores the enterprise credit information from the first collection and/or from updates (it can be regarded as the release database, that is, the one that holds the latest enterprise credit data). The update database serves as a staging area: it stores the enterprise credit information newly collected by periodic or ad hoc updates so that it can be compared with the formal database. Writes to both the update database and the formal database delete first and then insert. When newly collected information is written to the update database, however, not everything is extracted. Only the key information of certain dimensions, such as date information, is extracted, because the update database serves only as a transfer station for comparison. The history database stores the most important change information of an enterprise, such as the shareholder, senior executive and enterprise name information at the time of a data update. It is stored incrementally: records are only inserted, never deleted or overwritten.
Referring to Fig. 1 and Fig. 2, the graphical method for fusing enterprise credit information of the invention comprises the following steps:
S1: Enterprise credit information collection: starting from enterprise names, collect the credit information of the target enterprises from the internet;
The enterprise credit information collection in step S1 comprises:
S1.1: Obtain the enterprise directory;
S1.2: Collect credit data based on enterprise names;
S1.3: Store the enterprise credit data in the formal database.
As shown in Fig. 6, obtaining the enterprise directory in step S1.1 comprises:
S1.1.1: Analyze the coding rules that the industry and commerce administration applies to enterprise registration;
S1.1.2: Determine the data source for enterprise names: the websites of the industry and commerce administration;
S1.1.3: Write the corresponding data crawler;
S1.1.4: Obtain the enterprise names.
Alternatively, the enterprise directory can be obtained by downloading yellow-pages listings, but such listings are generally not updated in a timely manner.
As shown in Fig. 7, collecting credit data based on enterprise names in step S1.2 comprises:
S1.2.1: Determine the dimensions of the enterprise credit data to be collected;
S1.2.2: Analyze the URLs, data formats and other properties of the target websites;
S1.2.3: Write and run the crawler in Python;
S1.2.4: Obtain the web pages containing the enterprise credit data.
In step S1.2.3 the crawler is written and run with a proxy IP pool, and the crawled web pages are pushed quickly into Hadoop through Flume. Crawling rules are set for each target website, and a corresponding crawl template is written. The crawler can automatically adjust the access frequency of each IP address to adapt to the target site's anti-crawling strategy. The program also has an exception handling module, so that exceptions during crawling (website redesign, server downtime, proxy IP replacement) can be resolved manually.
S2: Enterprise credit data processing: clean, convert, format and store the credit information;
Referring to Fig. 3, the enterprise credit data processing in step S2 comprises:
S2.1: Enterprise credit data extraction;
S2.2: Enterprise credit data cleaning;
S2.3: Enterprise credit data conversion;
S2.4: Enterprise credit data formatting.
Further, the enterprise credit data cleaning in step S2.2 includes: removing duplicate and invalid data; identifying incomplete data and scheduling it to be crawled again; checking and verifying data consistency; and linking unstructured database tables.
Further, in step S2.4, after the enterprise credit data has been cleaned and converted, its representation is normalized and standardized so that the information in every dimension carries the enterprise's unique identification code. A timestamp is also recorded when the data of each dimension is loaded into the formal database, which facilitates data updates. The data format conforms to the relevant national credit standards.
S3: Enterprise credit data analysis: perform association analysis on the enterprise credit data;
Referring to Fig. 4, the enterprise credit data analysis in step S3 comprises:
S3.1: Enterprise credit data integration;
S3.2: Analysis of the enterprise's investment relations and historical investment relations.
In step S3.1 the cleaned, converted and formatted enterprise credit data is integrated: the credit data of an enterprise across multiple dimensions is associated through the enterprise's unique identification code and stored in Hive on Hadoop. The credit data of individual businesses (which account for nearly 50% of the total and have almost no value for data analysis) is flagged and excluded from data analysis.
In step S3.2 the investment relation analysis uses the "shareholder and capital contribution information" registered with the industry and commerce administration to analyze, by association, the equity information linking the enterprise with the enterprises it invests in and is invested by. Historical investment relations are discovered by extracting the shareholder information before and after each change from the "change records" of the industry and commerce administration, or by extracting it from the data in the history database.
S4: Enterprise credit data retrieval and graphical display of the association relations.
In step S4 the input keywords include the enterprise name, the legal representative's name, a senior executive's name, or a shareholder's name or title. A full-text search can be run through Solr to retrieve the target enterprise's credit data across multiple dimensions: business registration information, judicial information, administrative penalty information, intellectual property information, recruitment information, news and public opinion, portal websites and so on. d3.js is used to generate the association graph (see Fig. 5) from the integrated data and the results of the association analysis.
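A keyword search of this kind could be issued against Solr's select handler. The sketch below only builds the request URL, using Solr's standard q, rows and wt parameters; the base URL, the core name and the field names are hypothetical, and no request is sent.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, keyword, fields, rows=10):
    """Build a Solr /select URL that searches one keyword across several fields."""
    # Escape backslashes and double quotes so the keyword stays inside the phrase.
    escaped = keyword.replace("\\", "\\\\").replace('"', '\\"')
    query = " OR ".join(f'{field}:"{escaped}"' for field in fields)
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/{core}/select?{params}"


url = solr_select_url(
    "http://localhost:8983/solr",  # assumed local Solr instance
    "enterprises",                 # hypothetical core name
    "Acme",
    ["enterprise_name", "legal_representative", "executive", "shareholder"],
)
```

The four fields mirror the keyword types listed in step S4; a production schema would likely name them differently.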
Because rendering the graph takes considerable device resources, this embodiment generates the graph in the background and updates it automatically in the background whenever the data is updated, periodically or otherwise.
In addition, data must be updated to keep it current, so step S1 also includes at least one update step.
Referring to Fig. 9, the data update step in step S1 (defined here as S1-b) is as follows.
Updates are divided into periodic batch updates and real-time updates of individual records. The data update step S1-b comprises:
S1-b.1: Download the required pages with the data crawler;
S1-b.2: Extract the required data items and store them in the update database;
S1-b.3: Compare the content of the update database with that of the formal database;
S1-b.4: Determine whether the content needs to be updated. If not, only the timestamp in the formal database is updated. If so, continue with S1-b.5;
S1-b.5: Determine whether the content to be updated concerns shareholders, senior executives or the enterprise name. If it does, the content is written to the formal database and is also stored incrementally in the history database. Writing to the formal database deletes the original content, inserts the new content and updates the timestamp. If the updated content does not concern shareholders, senior executives or the enterprise name, it is written only to the formal database: the original content is deleted, the new content is inserted and the timestamp is updated.
Although the invention has been disclosed with reference to certain embodiments, various variations and modifications can be made to the described embodiments without departing from the scope of the invention. It should therefore be understood that the invention is not limited to the illustrated embodiments; its scope of protection is defined by the appended claims and their equivalent structures and schemes.
Claims (13)
1. A graphical method for fusing enterprise credit information, the method constructing three databases: a formal database, an update database and a history database, characterized in that the method comprises the following steps:
S1: Enterprise credit information collection: starting from enterprise names, collect the credit information of the target enterprises from the internet;
S2: Enterprise credit data processing: clean, convert, format and store the credit information;
S3: Enterprise credit data analysis: perform association analysis on the enterprise credit data;
S4: Enterprise credit data retrieval and graphical display of the association relations.
2. The graphical method for fusing enterprise credit information according to claim 1, characterized in that the enterprise credit information collection in step S1 further comprises:
S1.1: Obtain the enterprise directory;
S1.2: Collect credit data based on enterprise names;
S1.3: Store the enterprise credit data in the formal database.
3. The graphical method for fusing enterprise credit information according to claim 2, characterized in that obtaining the enterprise directory in step S1.1 further comprises:
S1.1.1: Analyze the coding rules that the industry and commerce administration applies to enterprise registration;
S1.1.2: Determine the data source for enterprise names: the websites of the industry and commerce administration;
S1.1.3: Write the corresponding data crawler;
S1.1.4: Obtain the enterprise names.
4. The graphical method for fusing enterprise credit information according to claim 1, characterized in that collecting credit data based on enterprise names in step S1.2 further comprises:
S1.2.1: Determine the dimensions of the enterprise credit data to be collected;
S1.2.2: Analyze the URLs and data formats of the target websites;
S1.2.3: Write and run the crawler in Python;
S1.2.4: Obtain the web pages containing the enterprise credit data.
5. The graphical method for fusing enterprise credit information according to claim 3, characterized in that in step S1.2.3 the crawler is written and run with a proxy IP pool, and the crawled web pages are pushed quickly into Hadoop through Flume; wherein corresponding crawling rules are set for each target website and a corresponding crawl template is written; the crawler can automatically adjust the access frequency of each IP address to adapt to the target site's anti-crawling strategy; and the crawler further includes an exception handling module so that exceptions during crawling can be resolved manually.
6. The graphical method for fusing enterprise credit information according to claim 1, characterized in that the enterprise credit data processing in step S2 further comprises:
S2.1: Enterprise credit data extraction;
S2.2: Enterprise credit data cleaning;
S2.3: Enterprise credit data conversion;
S2.4: Enterprise credit data formatting.
7. The graphical method for fusing enterprise credit information according to claim 6, characterized in that the enterprise credit data cleaning in step S2.2 includes: removing duplicate and invalid data; identifying incomplete data and scheduling it to be crawled again; checking and verifying data consistency; and linking unstructured database tables.
8. The graphical method for fusing enterprise credit information according to claim 6, characterized in that in step S2.4, after the enterprise credit data has been cleaned and converted, its representation is normalized and standardized so that the information in every dimension carries the enterprise's unique identification code; a timestamp is recorded when the data of each dimension enters the formal database, which facilitates data updates; and the data format conforms to the relevant national credit standards.
9. enterprise's reference information fusion graphic method according to claim 1, it is characterised in that looked forward in the step S3
Industry collage-credit data analytical procedure also includes step:
S3.1:Enterprise's collage-credit data integrates;
S3.2:The investment relation analysis and the analysis of history investment relation of enterprise.
10. The enterprise reference information fusion graphic method according to claim 9, characterized in that in step S3.1 the enterprise credit data is integrated: the cleaned, converted and formatted enterprise credit data of multiple dimensions is associated by the enterprise's unique identification code and stored in Hadoop Hive, and the enterprise credit data of self-employed individual businesses is marked and does not take part in the data analysis.
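The integration in claim 10 can be illustrated with an in-memory stand-in. The claim stores the associated data in Hadoop Hive; the sketch below replaces that with a plain dictionary, and the `self_employed` flag and key names are hypothetical.

```python
from collections import defaultdict


def integrate(records):
    """Group per-dimension records under their enterprise's unique id."""
    merged = defaultdict(lambda: {"dimensions": {}, "self_employed": False})
    for rec in records:
        entry = merged[rec["enterprise_id"]]
        entry["dimensions"][rec["dimension"]] = rec["data"]
        # Self-employed businesses are only marked here, not removed.
        if rec.get("self_employed"):
            entry["self_employed"] = True
    return dict(merged)


def analysis_input(merged):
    """Keep only the enterprises that take part in the data analysis."""
    return {eid: entry for eid, entry in merged.items()
            if not entry["self_employed"]}
```

Marking and filtering are kept separate so that marked records remain stored and searchable while being left out of the analysis step.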
11. The enterprise reference information fusion graphic method according to claim 9, characterized in that in step S3.2 the investment relationship analysis of an enterprise uses the "shareholder and investment information" recorded in the enterprise registration of the Administration for Industry and Commerce to associate and analyze the enterprise's investments and the equity information of the enterprises it invests in; historical investment relationships are discovered by extracting the shareholder information before and after each change as recorded in the "change records" of the Administration for Industry and Commerce, or by extracting it from the data in the history database.
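One way to picture the analysis in claim 11 is as directed edges from shareholders to the enterprises they invest in, plus a filtered view of the change records. The row layouts (`shareholder`, `enterprise`, `ratio`, `item`, `before`, `after`) are assumptions made for this sketch and are not the registry's actual schema.

```python
def investment_edges(shareholder_rows):
    """Turn 'shareholder and investment information' rows into directed edges."""
    return [
        {"source": row["shareholder"], "target": row["enterprise"],
         "ratio": row.get("ratio")}
        for row in shareholder_rows
    ]


def invested_by(edges, investor):
    """List the enterprises in which the given investor holds equity."""
    return sorted(edge["target"] for edge in edges if edge["source"] == investor)


def historical_shareholders(change_records):
    """Keep the before/after shareholder values from the change records."""
    return [
        {"enterprise": change["enterprise"], "date": change["date"],
         "before": change["before"], "after": change["after"]}
        for change in change_records
        if change["item"] == "shareholder"
    ]
```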
12. The enterprise reference information fusion graphic method according to claim 1, characterized in that in step S4 the input keywords, which include the enterprise name, the legal representative's name, a senior executive's name, or a shareholder's name or designation, are used to run a full-text search query through Solr that retrieves the multi-dimensional credit data of the target enterprise; and d3.js is used to integrate the data with the result data of the association relationship analysis and to generate and display the association graph.
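The search and display in claim 12 might be prepared on the server side roughly as below. The sketch only builds a request URL for Solr's standard `select` handler and a nodes-and-links JSON document of the kind d3.js force-layout examples commonly consume; it sends no request. The core name, base URL, and the four field names in `SEARCH_FIELDS` are hypothetical.

```python
import json
from urllib.parse import urlencode

# Hypothetical index fields matching the keyword types named in the claim.
SEARCH_FIELDS = ("enterprise_name", "legal_representative",
                 "executive", "shareholder")


def solr_select_url(base_url, core, keyword):
    """Build a select URL that searches every field for the exact phrase."""
    # Escape backslashes and quotes so the keyword stays one quoted phrase.
    phrase = '"' + keyword.replace("\\", "\\\\").replace('"', '\\"') + '"'
    query = " OR ".join(f"{field}:{phrase}" for field in SEARCH_FIELDS)
    return f"{base_url}/solr/{core}/select?" + urlencode({"q": query, "wt": "json"})


def to_d3_graph(edges):
    """Serialize relationship edges as {"nodes": [...], "links": [...]} JSON."""
    names = sorted({e["source"] for e in edges} | {e["target"] for e in edges})
    graph = {
        "nodes": [{"id": name} for name in names],
        "links": [{"source": e["source"], "target": e["target"]} for e in edges],
    }
    return json.dumps(graph, ensure_ascii=False)
```

On the page, the links would be resolved against the node `id` values, for example through the `id` accessor of a d3.js link force.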
13. The enterprise reference information fusion graphic method according to claim 1, characterized in that step S1 further comprises a data update step S1-b; the data update step S1-b further comprises the steps:
S1-b.1: download the required pages with the data crawler;
S1-b.2: extract the required data items and store them in the update database;
S1-b.3: compare the contents of the update database and the formal database;
S1-b.4: determine whether the content needs to be updated; if no update is needed, only the time mark in the formal database is updated; if an update is needed, then
S1-b.5: determine whether the content to be updated is a shareholder, a senior executive, or the enterprise name; if so, the content enters the formal database and is also stored incrementally in the history database; on entering the formal database, the original content is deleted, the new content is inserted, and the time mark is updated; if the updated content does not involve shareholders, senior executives, or the enterprise name, it enters only the formal database, where the original content is deleted, the new content is inserted, and the time mark is updated.
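The decision logic of steps S1-b.3 to S1-b.5 can be sketched with two in-memory stores standing in for the formal and history databases. The store layouts, the function name `apply_update`, and the field labels in `HISTORY_FIELDS` are assumptions for illustration; the claim does not specify them.

```python
# Fields whose changes are also archived in the history database (S1-b.5).
HISTORY_FIELDS = {"shareholder", "executive", "enterprise_name"}


def apply_update(formal, history, enterprise_id, field, new_value, now):
    """Apply one crawled value to the formal store; archive tracked fields.

    formal maps (enterprise_id, field) to {"value", "checked_at"};
    history is an append-only list of archived versions.
    """
    key = (enterprise_id, field)
    current = formal.get(key)
    # S1-b.3 / S1-b.4: unchanged content only refreshes the time mark.
    if current is not None and current["value"] == new_value:
        current["checked_at"] = now
        return "unchanged"
    # S1-b.5: tracked fields are additionally appended to the history store.
    if field in HISTORY_FIELDS:
        history.append({"enterprise_id": enterprise_id, "field": field,
                        "value": new_value, "stored_at": now})
    # Delete the original content, insert the new content, update the time mark.
    formal[key] = {"value": new_value, "checked_at": now}
    return "updated"
```

Because the history store is append-only, earlier shareholder, executive, and name values stay available for the historical relationship analysis even after the formal store has been overwritten.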
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710977078.2A CN107657057A (en) | 2017-10-19 | 2017-10-19 | A kind of enterprise's reference information fusion graphic method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710977078.2A CN107657057A (en) | 2017-10-19 | 2017-10-19 | A kind of enterprise's reference information fusion graphic method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107657057A (en) | 2018-02-02 |
Family
ID=61118959
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710977078.2A (Pending), CN107657057A (en) | A kind of enterprise's reference information fusion graphic method | 2017-10-19 | 2017-10-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107657057A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103455636A (en) * | 2013-09-27 | 2013-12-18 | 浪潮齐鲁软件产业有限公司 | Automatic capturing and intelligent analyzing method based on Internet tax data |
US9235642B1 (en) * | 2011-09-15 | 2016-01-12 | Isaac S. Daniel | System and method for conducting searches and displaying search results |
CN105740335A (en) * | 2016-01-22 | 2016-07-06 | 山东合天智汇信息技术有限公司 | Titan-based enterprise information analysis platform and construction method thereof |
2017-10-19: CN application CN201710977078.2A, published as CN107657057A (en); status: active, pending
Non-Patent Citations (2)
Title |
---|
CHARU C. AGGARWAL: "《社会网络数据分析》", 31 December 2016, 武汉大学出版社 * |
ESRI中国(北京)有限公司: "《第六届ArcGIS暨ERDAS中国用户大会论文集(2004)上》", 30 September 2004, 地震出版社 * |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108399240A (en) * | 2018-02-28 | 2018-08-14 | 北京金堤科技有限公司 | Enterprise's modification information data digging method and system |
WO2019205382A1 (en) * | 2018-04-28 | 2019-10-31 | 平安科技(深圳)有限公司 | Electronic device, credit investigation data acquisition method, and storage medium |
CN108717426A (en) * | 2018-05-04 | 2018-10-30 | 苏州朗动网络科技有限公司 | Update method, device, computer equipment and the storage medium of business data |
CN108717426B (en) * | 2018-05-04 | 2021-01-05 | 苏州朗动网络科技有限公司 | Enterprise data updating method and device, computer equipment and storage medium |
CN108846739A (en) * | 2018-06-07 | 2018-11-20 | 赵德坤 | A kind of credit and debt application method and system |
CN109829034A (en) * | 2018-08-24 | 2019-05-31 | 长威信息科技发展股份有限公司 | A kind of enterprise's tree spectrogram methods of exhibiting based on main market players's credit data |
CN109377375A (en) * | 2018-09-03 | 2019-02-22 | 平安科技(深圳)有限公司 | Fund relation map generation method, system, computer equipment and storage medium |
CN109165337A (en) * | 2018-10-17 | 2019-01-08 | 珠海市智图数研信息技术有限公司 | A kind of method and system of knowledge based map construction bidding field association analysis |
CN109165337B (en) * | 2018-10-17 | 2021-10-15 | 珠海市智图数研信息技术有限公司 | Method and system for establishing bid and ask field association analysis based on knowledge graph |
CN109670944A (en) * | 2018-12-19 | 2019-04-23 | 信雅达系统工程股份有限公司 | A kind of rating business credit method and system based on map relational network |
CN110705297A (en) * | 2019-09-23 | 2020-01-17 | 北京海致星图科技有限公司 | Enterprise name-identifying method, system, medium and equipment |
CN111310012A (en) * | 2020-01-21 | 2020-06-19 | 国网安徽省电力有限公司滁州供电公司 | Automatic monitoring and early warning method for enterprise information loss behavior |
CN111382181A (en) * | 2020-03-16 | 2020-07-07 | 中科天玑数据科技股份有限公司 | Designated enterprise family affiliation analysis method and system based on stock right penetration |
CN111930899A (en) * | 2020-09-25 | 2020-11-13 | 成都数联铭品科技有限公司 | Keyword processing method and system and keyword searching method |
CN111930899B (en) * | 2020-09-25 | 2021-04-09 | 成都数联铭品科技有限公司 | Keyword processing method and system and keyword searching method |
CN112529401A (en) * | 2020-12-09 | 2021-03-19 | 国网天津市电力公司 | Enterprise honest risk audit model construction method |
CN112579898A (en) * | 2020-12-17 | 2021-03-30 | 北京金山云网络技术有限公司 | Enterprise information management method and device and server |
CN115190026A (en) * | 2022-05-09 | 2022-10-14 | 广州中南网络技术有限公司 | Internet digital circulation method |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
CB03 | Change of inventor or designer information | Inventor after: Wang Yunli, Cheng Bin, Wang Cheng, Shao Yunxia, Yang Wenhuan, Han Zhenzhen. Inventor before: Wang Yunli |
SE01 | Entry into force of request for substantive examination | |