CN107577748A - Building trade information acquisition system and its method based on big data - Google Patents
- Publication number: CN107577748A
- Application number: CN201710760105.0A
- Authority: CN (China)
- Prior art keywords: data, module, website, building trade, collection
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a big-data-based construction industry information acquisition system and method. The system includes a data acquisition module, a website scheduler module and a data processing module. The website scheduler module has configuration interfaces that connect to different websites. The data acquisition module uses message queue persistence technology. It requests the corresponding data from each website through the interfaces of the website scheduler module and then stores the requested data in a database. The website scheduler module configures the parameters of the different websites and instructs the data acquisition module to start the acquisition tasks for the corresponding websites. The data processing module receives the raw website data transmitted by the data acquisition module, parses it and writes it to the database. The system integrates the scattered data of the construction industry and extracts, processes and analyzes the relevant information. As a result, practitioners can obtain the industry trends and construction information they are interested in simply by searching the relevant data.
Description
Technical field
The present invention relates to the field of information acquisition technology. It specifically concerns a big-data-based construction industry information acquisition system and method.
Background
Big data has become a strategic resource as important as natural resources and human resources. Its enormous latent social and economic value has drawn close attention from the scientific and technological community and from enterprises. Organizing and using big data effectively can strongly promote social and economic development.
At present, construction industry practitioners obtain data mainly from regional construction departments, tendering websites or national construction industry information platforms. Construction industry information is voluminous and frequently updated. Practitioners therefore spend a great deal of time and effort visiting each website to find the content they care about. Even so, they cannot keep up in time with industry trends, company performance records or certificate information. In China, the field of construction industry information collection still lacks a dedicated system that can collect data at this scale.
Summary of the invention
To address these defects of the prior art, the present invention provides a big-data-based construction industry information acquisition system and method. The system and method improve the way the whole construction industry obtains information and make it more efficient to obtain up-to-date industry information.
The big-data-based construction industry information acquisition system provided by the invention includes a data acquisition module, a website scheduler module and a data processing module. The data acquisition module and the website scheduler module are independent of each other. The website scheduler module has configuration interfaces that connect to different websites. The data acquisition module uses message queue persistence technology. It requests the corresponding data from each website through the interfaces of the website scheduler module and then stores the requested data in a database. The website scheduler module configures the parameters of the different websites and instructs the data acquisition module to start the acquisition tasks for the corresponding websites. The data processing module receives the raw website data transmitted by the data acquisition module, parses it and writes it to the database.
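A minimal sketch of how the scheduler and the acquisition module could stay independent follows. The scheduler keeps one configuration per website and puts tasks on a queue, and the acquisition module only consumes that queue. The names `SiteConfig`, `AcquisitionTask` and `WebsiteScheduler`, and the use of `queue.Queue` as the channel between the modules, are hypothetical illustrations and do not come from the patent.

```python
from dataclasses import dataclass, field
from queue import Queue


@dataclass
class SiteConfig:
    """Per-website parameters held by the scheduler (hypothetical fields)."""
    name: str
    base_url: str
    mode: str                      # "by_date" or "by_query"
    params: dict = field(default_factory=dict)


@dataclass
class AcquisitionTask:
    """One unit of work sent from the scheduler to the acquisition module."""
    site: str
    url: str
    params: dict


class WebsiteScheduler:
    """Stores website configurations and emits acquisition tasks.

    The acquisition module only reads from the shared queue, so neither
    module calls the other directly.
    """

    def __init__(self, task_queue: Queue):
        self._configs: dict[str, SiteConfig] = {}
        self._queue = task_queue

    def configure(self, config: SiteConfig) -> None:
        self._configs[config.name] = config

    def start(self, site_name: str) -> AcquisitionTask:
        cfg = self._configs[site_name]  # raises KeyError if never configured
        task = AcquisitionTask(cfg.name, cfg.base_url, dict(cfg.params))
        self._queue.put(task)
        return task
```

Because tasks are copies of the configuration, later edits to a site's parameters do not change tasks that are already queued.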
Further, the data processing module contains data parsing units matched to the different websites, plus a data warehouse unit.
Further, the data acquisition module keeps a data acquisition log. The log records abnormal-node data, meaning collection-script requests or data transmissions that failed because of a network exception.
Further, after the network recovers, the data acquisition module automatically re-requests and/or retransmits the abnormal-node data recorded in the data acquisition log.
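The failure log and the automatic retry can be sketched as follows. Each failed request or send is appended as one JSON line. After recovery, every line is replayed and only the entries that still fail are kept. The JSON-lines format, the `AcquisitionLog` class and the `handler` callback are assumptions made for illustration; the patent does not specify the log's format.

```python
import json
from pathlib import Path
from typing import Callable


class AcquisitionLog:
    """Append-only log of requests or sends that failed during a network outage.

    Each line holds one JSON record (an "abnormal node"). replay() retries
    the records after the network recovers.
    """

    def __init__(self, path: Path):
        self.path = path

    def record_failure(self, kind: str, payload: dict) -> None:
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"kind": kind, "payload": payload}) + "\n")

    def pending(self) -> list[dict]:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def replay(self, handler: Callable[[str, dict], bool]) -> int:
        """Retry every logged node and return how many succeeded."""
        still_failing = []
        succeeded = 0
        for node in self.pending():
            try:
                ok = handler(node["kind"], node["payload"])
            except OSError:  # for example, the network is still unreachable
                ok = False
            if ok:
                succeeded += 1
            else:
                still_failing.append(node)
        # Rewrite the log so it keeps only the nodes that still fail.
        with self.path.open("w", encoding="utf-8") as f:
            for node in still_failing:
                f.write(json.dumps(node) + "\n")
        return succeeded
```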
Further, the system includes a user management module, which manages user accounts and assigns user permissions.
A big-data-based construction industry information collection method includes the following steps:
S1: a developer logs in to the system and configures tasks according to the data type of each website to be collected;
S2: the data acquisition module starts according to the task. A process is created with a process pipe command, and the information the process collects is output to a custom text file, which completes the collection;
S3: the data processing module parses the collected data and then stores it in the database.
Further, task configuration in S1 works as follows. For websites collected by date, date parameters are configured and the collection script is started; the script calls the corresponding script of the data acquisition module to perform the acquisition task. For websites collected by query, the query parameters are configured and middleware is called to provide message persistence while multiple processes collect the data.
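The two configuration types can be illustrated by expanding one task configuration into concrete jobs. A by-date task yields one job per day in a range. A by-query task yields one job per company name, and a pool of worker processes could pull those jobs from the message queue. The `TaskConfig` fields and the `expand` function are hypothetical, and this sketch does not start any processes.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class TaskConfig:
    """Hypothetical task configuration entered by a developer in step S1."""
    site: str
    mode: str                          # "by_date" or "by_query"
    start: Optional[date] = None       # by_date: first day to collect
    end: Optional[date] = None         # by_date: last day, inclusive
    companies: tuple[str, ...] = ()    # by_query: company names to look up


def expand(config: TaskConfig) -> list[dict]:
    """Turn one task configuration into concrete collection jobs."""
    if config.mode == "by_date":
        if config.start is None or config.end is None:
            raise ValueError("by_date tasks need start and end dates")
        days = (config.end - config.start).days
        return [{"site": config.site,
                 "date": (config.start + timedelta(days=d)).isoformat()}
                for d in range(days + 1)]
    if config.mode == "by_query":
        # One job per company; worker processes would consume these jobs.
        return [{"site": config.site, "company": name}
                for name in config.companies]
    raise ValueError(f"unknown mode: {config.mode!r}")
```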
Further, parsing in S3 works as follows. For data collected by date, the enterprise-related data of interest is parsed after collection completes and stored in the database. Data that fails to parse is flagged with a prompt in the data display module. For data collected by query, the personnel, qualification, performance and credit-evaluation data of the corresponding company are parsed after collection completes, and duplicate and invalid data are thoroughly cleaned. Changes to associated personnel information are applied as update, addition and removal operations, and the update status is recorded and stored in the database.
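The cleaning and change-tracking for associated personnel can be sketched as a comparison between the stored records and the freshly parsed ones. Records with no key are dropped as invalid, duplicates are collapsed, and the result is split into added, removed and updated entries. The record key `id_no` (for example, a registration certificate number) is a hypothetical choice, and stored records are assumed to be clean already.

```python
def clean(records: list[dict], key: str = "id_no") -> list[dict]:
    """Drop records with an empty key (invalid) and keep the first duplicate."""
    seen = set()
    out = []
    for r in records:
        k = (r.get(key) or "").strip()
        if not k or k in seen:
            continue
        seen.add(k)
        out.append({**r, key: k})  # store the normalized key
    return out


def diff_personnel(stored: list[dict], fresh: list[dict],
                   key: str = "id_no") -> dict:
    """Compare stored and freshly parsed personnel records for one company."""
    old = {r[key]: r for r in stored}
    new = {r[key]: r for r in clean(fresh, key)}
    return {
        "added": [new[k] for k in sorted(new.keys() - old.keys())],
        "removed": [old[k] for k in sorted(old.keys() - new.keys())],
        "updated": [new[k] for k in sorted(new.keys() & old.keys())
                    if new[k] != old[k]],
    }
```

The sizes of the three lists could serve as the update status that is recorded with each run.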
The above technical solution gives the present invention the following beneficial effects:
The present invention provides a big-data-based construction industry information acquisition system and method, including a data acquisition module, a website scheduler module and a data processing module. The data acquisition module and the website scheduler module are independent of each other. The website scheduler module has configuration interfaces that connect to different websites. The data acquisition module uses message queue persistence technology. It requests the corresponding data from each website through the interfaces of the website scheduler module and then stores the requested data in a database. The website scheduler module configures the parameters of the different websites and instructs the data acquisition module to start the acquisition tasks for the corresponding websites. The data processing module receives the raw website data transmitted by the data acquisition module, parses it and writes it to the database. The system integrates the scattered data of the construction industry and extracts, processes and analyzes the relevant information. As a result, practitioners can obtain the industry trends and construction information they are interested in simply by searching the relevant data.
Brief description of the drawings
The drawings used to describe the specific embodiments and the prior art are briefly introduced below, to illustrate them more clearly. Throughout the drawings, similar elements or parts are generally identified by similar reference signs. Elements and parts are not necessarily drawn to scale.
Fig. 1 is a schematic flow diagram of the big-data-based construction industry information collection of the present invention.
Embodiment
The embodiments of the technical solution are described in detail below with reference to the drawings. The following embodiments serve only to illustrate the technical solution clearly. They are examples only and do not limit the scope of protection of the present invention.
It should be noted that unless otherwise indicated, technical term or scientific terminology used in this application should be this hair
The ordinary meaning that bright one of ordinary skill in the art are understood.
Referring to Fig. 1, this embodiment provides a big-data-based construction industry information acquisition system. It includes a data acquisition module, a website scheduler module and a data processing module, and the data acquisition module and the website scheduler module are independent of each other. Construction industry information falls into two collection types:

- data collected by date, such as construction industry news, policies and regulations, and notices and announcements;
- data collected by querying a company name, such as a construction company's qualifications, personnel and performance records.

The website scheduler module configures different interfaces according to the data type of each website to be collected. The data acquisition module uses message queue persistence technology. It requests the corresponding data from each website through the interfaces of the website scheduler module and then stores the requested data in a database. The website scheduler module configures the parameters of the different websites and instructs the data acquisition module to start the acquisition tasks for the corresponding websites. The data processing module receives the raw website data transmitted by the data acquisition module, parses it and writes it to the data warehouse unit.
The data acquisition module keeps a data acquisition log. The log records abnormal-node data, meaning collection-script requests or data transmissions that failed because of a network exception. After the network recovers, the data acquisition module automatically re-requests and/or retransmits the abnormal-node data in the log. The main data sources for construction industry information are:
- the Sichuan Provincial Department of Construction website;
- the registered safety engineer query website;
- the survey and design engineer professional qualification registration website;
- the cost engineer registration information query website;
- the National Highway Construction Market Information Management System;
- the National Water Conservancy Construction Market Credit Information Platform;
- the National Construction Market Supervision Public Service Platform;
- Sichuan Construction Network (tenders and bid awards);
- the Sichuan Provincial Government Affairs Service and Public Resource Trading Service Center (tenders and bid awards);
- the Sichuan Daily tendering and comparative selection website;
- the Chengdu engineering construction bidding participant credit information platform;
- the Chengdu engineering construction project information and credit information disclosure and sharing website;
- the National Enterprise Credit Information Publicity System website.
The data acquisition module collects website data with Python and BeautifulSoup and uses Redis as its message queue middleware. Redis provides message persistence and keeps constant-time access performance even for data at the TB scale and above. Its throughput is also high, so the acquisition system performs well when collecting data from a large number of construction industry websites.
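Message persistence means that a queued task survives a crash of its consumer until the consumer acknowledges it. The sketch below swaps in SQLite from the standard library for Redis, so that the idea runs without a server. Its `put`, `take`, `ack` and `requeue_in_flight` methods are hypothetical. With Redis itself, the comparable approach is the reliable-queue pattern described in the Redis documentation for `RPOPLPUSH` (or `LMOVE` in newer versions). This sketch assumes a single consumer, because `take` does not guard against two consumers taking the same message.

```python
import json
import sqlite3
from typing import Optional


class DurableQueue:
    """FIFO queue whose messages persist in SQLite until acknowledged.

    take() marks a message 'in_flight'. If the consumer crashes before
    ack(), requeue_in_flight() makes the message available again.
    """

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " body TEXT NOT NULL,"
            " state TEXT NOT NULL DEFAULT 'ready')"
        )
        self.db.commit()

    def put(self, message: dict) -> None:
        self.db.execute("INSERT INTO messages (body) VALUES (?)",
                        (json.dumps(message),))
        self.db.commit()

    def take(self) -> Optional[tuple[int, dict]]:
        """Return (id, message) for the oldest ready message, or None."""
        row = self.db.execute(
            "SELECT id, body FROM messages WHERE state = 'ready'"
            " ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return None
        self.db.execute("UPDATE messages SET state = 'in_flight' WHERE id = ?",
                        (row[0],))
        self.db.commit()
        return row[0], json.loads(row[1])

    def ack(self, msg_id: int) -> None:
        """Delete a message once it has been fully processed."""
        self.db.execute("DELETE FROM messages WHERE id = ?", (msg_id,))
        self.db.commit()

    def requeue_in_flight(self) -> int:
        """After a crash, return unacknowledged messages to the queue."""
        cur = self.db.execute(
            "UPDATE messages SET state = 'ready' WHERE state = 'in_flight'")
        self.db.commit()
        return cur.rowcount
```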
The collected data comes from different websites, and its data types and formats differ. This scattered data must be further processed, cleaned and parsed before it can be sent to the application layer for display. The data processing module performs this processing and parsing on the large volume of collected data. It contains data parsing units matched to the different websites, plus a data warehouse unit. The data parsing units parse the data collected by date and the data collected by query separately and then store the results in the data warehouse unit. Data that fails to parse is flagged with a prompt in the data display module. For data collected by query, the personnel, qualification, performance, credit-evaluation and similar data of the corresponding company are parsed after collection completes, and duplicate and invalid data are further cleaned. Changes to associated personnel information are applied as update, addition and removal operations, and the update status is recorded and stored in the data warehouse unit. The system also includes a user management module, which manages user accounts and assigns user permissions.
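The per-website parsing units can be sketched as a registry that maps each site to its own parser. A failed parse is recorded for display rather than stored. The document itself uses BeautifulSoup; this sketch substitutes the standard library's `html.parser.HTMLParser` so that it has no dependencies. The two-column notice-table layout, the site name `demo-notices` and all function names are hypothetical.

```python
from html.parser import HTMLParser


class _RowParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def parse_notice_table(html: str) -> list[dict]:
    """Hypothetical parser for a by-date notice list laid out as title | date."""
    p = _RowParser()
    p.feed(html)
    records = [{"title": cells[0].strip(), "date": cells[1].strip()}
               for cells in p.rows if len(cells) == 2]  # skips header rows
    if not records:
        raise ValueError("no notice rows found")
    return records


PARSERS = {"demo-notices": parse_notice_table}  # site name -> parsing unit


def process(site: str, html: str, store: list, failures: list) -> None:
    """Parse raw page data, storing the records or noting the failure for display."""
    try:
        store.extend(PARSERS[site](html))
    except (KeyError, ValueError) as exc:
        failures.append((site, str(exc)))
```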
A big-data-based construction industry information collection method includes the following steps:
S1: a developer logs in to the system and configures tasks according to the data type of each website to be collected;
Task configuration works as follows. The data acquisition module starts according to the task, and a process is created with a process pipe command. After the process starts, its PID is obtained with a PID command and the corresponding acquisition task information is associated with that PID. The information the process collects is then output to a custom text file. For websites collected by date, date parameters are configured and the collection script is started; the script calls the corresponding script of the data acquisition module to perform the acquisition task. For websites collected by query, the query parameters are configured and middleware is called to provide message persistence while multiple processes collect the data.
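Process creation with a PID recorded per task might look like the following. A collection script runs as a child process, its output is redirected to a per-task text file, and the task ID is mapped to the process. The `RUNNING` registry, the `start_task` function and the `<task_id>.txt` naming are assumptions. `subprocess.Popen` is used here as one reasonable reading of "process pipe command".

```python
import subprocess
from pathlib import Path

# Maps task ID to its child process; proc.pid is the PID the description records.
RUNNING: dict[str, subprocess.Popen] = {}


def start_task(task_id: str, script: list[str], out_dir: Path) -> int:
    """Launch a collection script as a child process and return its PID.

    stdout and stderr go to <out_dir>/<task_id>.txt, the custom text file
    that the monitoring side reads later.
    """
    out_path = out_dir / f"{task_id}.txt"
    with out_path.open("w", encoding="utf-8") as out:
        # The child inherits its own copy of the file handle, so closing
        # the parent's copy after Popen returns is safe.
        proc = subprocess.Popen(script, stdout=out, stderr=subprocess.STDOUT)
    RUNNING[task_id] = proc
    return proc.pid
```

For example, `start_task("t1", [sys.executable, "collect.py"], Path("logs"))` would run a hypothetical `collect.py` and return its PID.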
S2: the data acquisition module starts according to the task. A process is created with a process pipe command, and the information the process collects is output to a custom text file, which completes the collection;
After an acquisition task starts, its PID is read and the corresponding process status is checked, so the task is monitored in real time. AJAX is used to read the task's log file in real time and display its content in the front-end interface. If an acquisition task fails, the process can also be terminated with a process command, using the PID that was read.
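The monitoring side can be sketched with three helpers. The first reports whether a task process is still running. The second returns only the log text written since the last poll, which is what an AJAX endpoint would hand to the front end. The third terminates a faulty task by PID. The function names and the byte-offset polling protocol are hypothetical. `signal.SIGTERM` is the usual POSIX termination signal; on Windows, `os.kill` with it terminates the process outright.

```python
import os
import signal
import subprocess
from pathlib import Path


def task_status(proc: subprocess.Popen) -> str:
    """Return 'running' while the child is alive, else 'exited(<code>)'."""
    code = proc.poll()
    return "running" if code is None else f"exited({code})"


def read_log_since(path: Path, offset: int) -> tuple[str, int]:
    """Return the log text after byte `offset`, plus the new offset.

    A front end polling with AJAX would send back the offset from its
    previous poll, so each request carries only the new lines.
    """
    with path.open("rb") as f:
        f.seek(offset)
        chunk = f.read()
    return chunk.decode("utf-8", errors="replace"), offset + len(chunk)


def stop_task(pid: int) -> None:
    """Terminate a faulty acquisition task by its PID."""
    os.kill(pid, signal.SIGTERM)
```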
S3: the data processing module parses the collected data and then stores it in the database.
Parsing works as follows. For data collected by date, the enterprise-related data of interest is parsed after collection completes and stored in the database. Data that fails to parse is flagged with a prompt in the data display module. For data collected by query, the personnel, qualification, performance and credit-evaluation data of the corresponding company are parsed after collection completes, and duplicate and invalid data are thoroughly cleaned. Changes to associated personnel information are applied as update, addition and removal operations, and the update status is recorded and stored in the database.
The big-data-based construction industry information acquisition system operates as follows:
User management:
(1) User registration: users must log in with an account and password before entering the system, and users without an account must register a new one. A newly registered account cannot perform any operation in the system until a holder of developer rights assigns it a permission group;
(2) Edit user: a holder of developer rights assigns a permission group to a new user or changes a user's permission group;
(3) Delete user: a holder of developer rights deletes a user.
Permission management:
(1) Add permission group: adds a permission group, which simplifies assigning user permissions. A user can enter and operate a module only if they hold the permission for that module;
(2) Edit permission group: updates the module permissions a permission group holds.
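The registration and permission rules above reduce to a simple check. A user may enter a module only if their group grants it, and a newly registered user has no group, and therefore no access, until a developer assigns one. The `PermissionGroup` and `User` classes and the module name `user_admin` are hypothetical illustrations of these rules.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PermissionGroup:
    name: str
    modules: set[str] = field(default_factory=set)  # modules this group may enter


@dataclass
class User:
    name: str
    group: Optional[PermissionGroup] = None  # None until a developer assigns one


def can_enter(user: User, module: str) -> bool:
    """Return True only if the user's group grants access to the module."""
    return user.group is not None and module in user.group.modules


def assign_group(actor: User, target: User, group: PermissionGroup) -> None:
    """Only holders of developer rights (here, 'user_admin') may assign groups."""
    if not can_enter(actor, "user_admin"):
        raise PermissionError(f"{actor.name} cannot manage users")
    target.group = group
```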
Add task:
A user with developer rights adds a configuration instance of an acquisition task.
Pause task:
Clicking the "Pause" button pauses a running acquisition task. This is generally used when the target server has crashed and data cannot be collected.
Resume task:
Clicking the "Resume" button continues a paused acquisition task. This is generally used to resume collection after a pause, once the target server has recovered.
View log:
(1) Clicking the "View log" button shows log information for the current task, such as collection progress and whether the flow is abnormal;
(2) Logs can be viewed only while the program is running or paused, not after the task ends.
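The pause, resume and log-viewing rules form a small state machine. A task can be paused only while running and resumed only while paused, and its log is readable only before it finishes. The `Task` class and its states are a hypothetical sketch of these rules. The in-memory log is a simplification; a real system would keep the per-task log file.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    FINISHED = "finished"


class Task:
    """Minimal lifecycle for one acquisition task instance."""

    def __init__(self, name: str):
        self.name = name
        self.state = State.RUNNING
        self.log: list[str] = []

    def pause(self) -> None:
        # Typically used when the target site's server is down.
        if self.state is not State.RUNNING:
            raise RuntimeError(f"cannot pause a {self.state.value} task")
        self.state = State.PAUSED
        self.log.append("paused")

    def resume(self) -> None:
        # Typically used once the target server has recovered.
        if self.state is not State.PAUSED:
            raise RuntimeError(f"cannot resume a {self.state.value} task")
        self.state = State.RUNNING
        self.log.append("resumed")

    def finish(self) -> None:
        self.state = State.FINISHED

    def view_log(self) -> list[str]:
        # Logs are viewable only while the task is running or paused.
        if self.state is State.FINISHED:
            raise RuntimeError("log is not viewable after the task ends")
        return list(self.log)
```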
Edit task:
(1) Users with normal permissions can make simple edits to some parameters of a task instance, such as the task's pick-up slip number and time;
(2) Users with developer rights can edit the full configuration of a task instance.
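The two editing levels can be sketched as a whitelist. Normal users may change only a few parameters, and developers may change anything. The parameter names in `NORMAL_EDITABLE` are hypothetical stand-ins for the "pick-up slip number, time" mentioned in the text, and `edit_task` is illustrative only.

```python
# Parameters that users with normal permissions may change. These names are
# hypothetical stand-ins for the task's pick-up slip number and time.
NORMAL_EDITABLE = {"slip_number", "time"}


def edit_task(config: dict, changes: dict, is_developer: bool) -> dict:
    """Return an updated copy of a task configuration, enforcing edit rights."""
    if not is_developer:
        forbidden = set(changes) - NORMAL_EDITABLE
        if forbidden:
            raise PermissionError(f"normal users cannot edit: {sorted(forbidden)}")
    updated = dict(config)  # leave the stored configuration untouched
    updated.update(changes)
    return updated
```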
Delete task:
Users with developer rights can delete an acquisition task instance.
View data:
Collected and parsed data can be viewed, and data that failed to parse is flagged with a prompt.
Finally, the above embodiments merely illustrate the technical solution of the present invention and do not limit it. The present invention has been described in detail with reference to these embodiments. Those of ordinary skill in the art should nevertheless understand that they may still modify the technical solutions described in the embodiments, or substitute equivalents for some or all of their technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments, and all of them fall within the scope of the claims and the specification of the present invention.
Claims (8)
- 1. A big-data-based construction industry information acquisition system, characterized in that it comprises a data acquisition module, a website scheduler module and a data processing module, the data acquisition module and the website scheduler module being independent of each other, and the website scheduler module being provided with configuration interfaces connected to different websites; the data acquisition module uses message queue persistence technology, requests the corresponding data from the different websites through the interfaces of the website scheduler module, and then stores the requested data in a database; the website scheduler module is used to configure the parameters of the different websites and to instruct the data acquisition module to start the acquisition tasks for the corresponding websites; and the data processing module is used to receive the raw website data transmitted by the data acquisition module, parse the data and write it to the database.
- 2. The big-data-based construction industry information acquisition system according to claim 1, characterized in that the data processing module is configured with data parsing units matched to the different websites and with a data warehouse unit.
- 3. The big-data-based construction industry information acquisition system according to claim 2, characterized in that the data acquisition module is provided with a data acquisition log, the data acquisition log being used to record abnormal-node data for collection-script requests or transmissions that failed when a network exception occurred.
- 4. The big-data-based construction industry information acquisition system according to claim 3, characterized in that, after the network recovers, the data acquisition module automatically re-requests and/or retransmits the abnormal-node data in the data acquisition log.
- 5. The big-data-based construction industry information acquisition system according to claim 4, characterized in that it further comprises a user management module, the user management module being used to manage user accounts and assign user permissions.
- 6. A big-data-based construction industry information collection method, characterized in that it comprises the following steps: S1, a developer logs in to the system and configures tasks according to the data type of each website to be collected; S2, the data acquisition module is started according to the task, a process is created with a process pipe command, and the information collected by the process is output to a custom text file, completing the collection; S3, the data processing module parses the collected data and then stores it in a database.
- 7. The big-data-based construction industry information collection method according to claim 6, characterized in that the task configuration in S1 is specifically: for websites that must be collected by date, date parameters are configured and the collection script is started, which calls the corresponding script of the data acquisition module to perform the acquisition task; for websites collected by query, the query parameters are configured and middleware is called to provide message persistence while multiple processes collect the data.
- 8. The big-data-based construction industry information collection method according to claim 7, characterized in that the parsing in S3 is specifically: for data collected by date, the enterprise-related data of interest is parsed after collection completes and stored in the database, and data that fails to parse is flagged with a prompt in the data display module; for data collected by query, the personnel, qualification, performance and credit-evaluation data of the corresponding company are parsed after collection completes, duplicate and invalid data are thoroughly cleaned, changes to associated personnel information are applied as update, addition and removal operations, and the update status is recorded and stored in the database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710760105.0A CN107577748A (en) | 2017-08-30 | 2017-08-30 | Building trade information acquisition system and its method based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107577748A (en) | 2018-01-12 |
Family ID: 61030777
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710760105.0A Pending CN107577748A (en) | 2017-08-30 | 2017-08-30 | Building trade information acquisition system and its method based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107577748A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050165731A1 (en) * | 2002-08-20 | 2005-07-28 | Tokyo Electron Limited | Method for processing data based on the data context |
CN104484424A (en) * | 2014-12-19 | 2015-04-01 | 浪潮通用软件有限公司 | Establishing method for resource price information base of construction enterprise based on internet |
CN105468664A (en) * | 2015-05-12 | 2016-04-06 | 北京众标网络科技有限公司 | Information acquisition method and apparatus |
CN106096056A (en) * | 2016-06-30 | 2016-11-09 | 西南石油大学 | A kind of based on distributed public sentiment data real-time collecting method and system |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110555661A (en) * | 2018-05-31 | 2019-12-10 | 西安海平方网络科技有限公司 | information display method, device and equipment of building element and readable storage medium |
CN110555661B (en) * | 2018-05-31 | 2023-05-09 | 西安海平方网络科技有限公司 | Information display method, device and equipment for building element and readable storage medium |
CN109217474A (en) * | 2018-10-17 | 2019-01-15 | 安徽立卓智能电网科技有限公司 | A kind of new-energy grid-connected data acquisition time method based on radio channel transmission |
CN112104656A (en) * | 2020-09-16 | 2020-12-18 | 杭州安恒信息安全技术有限公司 | Network threat data acquisition method, device, equipment and medium |
CN112104656B (en) * | 2020-09-16 | 2022-07-12 | 杭州安恒信息安全技术有限公司 | Network threat data acquisition method, device, equipment and medium |
CN112801820A (en) * | 2021-02-05 | 2021-05-14 | 郝大伟 | Big data acquisition method for building construction enterprises |
CN113297448A (en) * | 2021-05-13 | 2021-08-24 | 中国电波传播研究所(中国电子科技集团公司第二十二研究所) | Open-source electric wave environment data acquisition method based on web crawler and computer readable storage medium |
CN113297448B (en) * | 2021-05-13 | 2022-10-25 | 中国电波传播研究所(中国电子科技集团公司第二十二研究所) | Open-source electric wave environment data acquisition method based on web crawler and computer readable storage medium |
CN116910108A (en) * | 2023-09-13 | 2023-10-20 | 彩讯科技股份有限公司 | Method, device, equipment and computer readable storage medium for processing end-side data |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180112 |