CN107577748A - Building trade information acquisition system and its method based on big data - Google Patents

Info

Publication number
CN107577748A
Authority
CN
China
Prior art keywords
data
module
website
building trade
collection
Prior art date
Legal status
Pending
Application number
CN201710760105.0A
Other languages
Chinese (zh)
Inventor
徐波
Current Assignee
Chengdu Zhongjian Union Network Technology Co Ltd
Original Assignee
Chengdu Zhongjian Union Network Technology Co Ltd
Priority date
2017-08-30
Filing date
2017-08-30
Publication date
2018-01-12
Application filed by Chengdu Zhongjian Union Network Technology Co Ltd
Priority to CN201710760105.0A
Publication of CN107577748A
Current legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a big-data-based construction industry information acquisition system and method comprising a data acquisition module, a website scheduler module and a data processing module; the website scheduler module is provided with configuration interfaces connected to different websites. The data acquisition module uses message-queue persistence to request the corresponding data from different websites through the interfaces of the website scheduler module and then stores the requested data in a database. The website scheduler module configures the parameters of the different websites and sends them to the data acquisition module, which starts the corresponding collection task for each website. The data processing module receives the raw website data transmitted by the data acquisition module, parses it and writes it into the database. By integrating scattered construction industry data and extracting, processing and analyzing the relevant information, the system lets practitioners obtain the industry trends and construction information they are interested in simply by searching the relevant data.

Description

Building trade information acquisition system and its method based on big data
Technical field
The present invention relates to the field of information acquisition technology, and specifically to a big-data-based construction industry information acquisition system and method.
Background art
Big data has become a strategic resource as important as natural and human resources, and the enormous social and economic value it holds has attracted great attention from the scientific community and from industry. Organizing and using this big data effectively can strongly drive social and economic development.
At present, construction industry practitioners obtain information mainly from regional construction departments, bidding websites or national construction industry information platforms. Because construction industry information is voluminous and updated quickly, practitioners must spend considerable time and effort visiting each website to find the content they care about, and they cannot keep up with industry trends, company performance records or certificate information in a timely way. The domestic construction industry currently lacks a dedicated system capable of collecting such data at scale.
Summary of the invention
To address these shortcomings of the prior art, the present invention provides a big-data-based construction industry information acquisition system and method that improve the way the construction industry as a whole obtains information and increase the efficiency of acquiring up-to-date industry information.
The big-data-based construction industry information acquisition system provided by the invention comprises a data acquisition module, a website scheduler module and a data processing module. The data acquisition module and the website scheduler module are independent of each other, and the website scheduler module is provided with configuration interfaces connected to different websites. The data acquisition module uses message-queue persistence to request the corresponding data from different websites through the interfaces of the website scheduler module and then stores the requested data in a database. The website scheduler module configures the parameters of the different websites and sends them to the data acquisition module, which starts the corresponding collection task for each website. The data processing module receives the raw website data transmitted by the data acquisition module, parses it and writes it into the database.
Further, the data processing module is configured with data parsing units and data warehouse units matched to the different websites.
Further, the data acquisition module maintains a data acquisition log that records abnormal-node data, i.e. collection-script requests or transmissions that failed because of a network exception.
Further, after the network recovers, the data acquisition module automatically re-requests and/or re-sends the abnormal-node data recorded in the data acquisition log.
Further, the system also comprises a user management module for managing user accounts and assigning user permissions.
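The failure log and automatic retry described above can be sketched in Python, the collection language the description names. This is a minimal illustration under stated assumptions, not the patented implementation: the JSON-lines file format, the class name `AcquisitionLog` and the `send` callback are all hypothetical, and detecting that the network has recovered is left to the caller.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


class AcquisitionLog:
    """Hypothetical stand-in for the data acquisition log: abnormal nodes
    (failed requests or sends) are appended as JSON lines and replayed later."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def record_failure(self, node: dict) -> None:
        # Append one abnormal node, e.g. the site name and URL that failed.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(node, ensure_ascii=False) + "\n")

    def pending(self) -> list[dict]:
        # Read back every node that is still waiting to be retried.
        if not self.path.exists():
            return []
        lines = self.path.read_text(encoding="utf-8").splitlines()
        return [json.loads(line) for line in lines if line.strip()]

    def replay(self, send: Callable[[dict], bool]) -> int:
        """Retry all logged nodes after network recovery; nodes whose retry
        fails again stay in the log. Returns how many were recovered."""
        nodes = self.pending()
        still_failing = [n for n in nodes if not send(n)]
        self.path.write_text(
            "".join(json.dumps(n, ensure_ascii=False) + "\n" for n in still_failing),
            encoding="utf-8",
        )
        return len(nodes) - len(still_failing)


# Usage: two requests failed during an outage; after recovery only page 3 succeeds.
with tempfile.TemporaryDirectory() as tmp:
    log = AcquisitionLog(Path(tmp) / "acquisition.log")
    log.record_failure({"site": "demo", "url": "https://example.com/list?page=3"})
    log.record_failure({"site": "demo", "url": "https://example.com/list?page=4"})
    recovered = log.replay(lambda node: node["url"].endswith("page=3"))
    remaining = log.pending()
```

Keeping still-failing nodes in the same file means repeated outages simply accumulate entries until a later replay succeeds.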
A big-data-based construction industry information acquisition method comprises the following steps:
S1: a developer logs in to the system and configures tasks differently according to the type of data on each website to be collected;
S2: the data acquisition module is started according to the task, a process is created with a process pipe command, and the information collected by the process is written to a custom text file, completing collection;
S3: the data processing module parses the data returned by collection and stores it in the database.
Further, the task configuration in S1 is specifically as follows: for websites that must be collected by date, a date parameter is configured and the collection script is started, which calls the script of the corresponding data acquisition module to carry out the collection task; for websites collected by query, query parameters are configured and middleware is called to provide message persistence so that multiple processes can collect data.
Further, the parsing in S3 is specifically as follows: for data collected by date, after collection is complete, the company-related data of interest is parsed and stored in the database, and data that fails to parse is flagged with a corresponding prompt in the data display module; for data collected by query, after collection is complete, the personnel, qualification, performance-record and credit-rating data of the corresponding company are parsed, duplicate and invalid data is deeply cleaned, changes to associated personnel information are applied as updates, additions and deletions, and the update status is recorded and stored in the database.
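The two kinds of task configuration can be modeled as two configuration types. In this hypothetical sketch, a date-based task expands its date range into one command line per day for a collection script, and a query-based task turns a list of company names into queue messages for several collector processes; the class names, the `--date` flag and the script name are illustrative assumptions, not details from the patent.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class DateTask:
    """Hypothetical config for a site collected by date (news, policies, notices)."""
    site: str
    script: str          # collection script to launch (placeholder name)
    start: date
    end: date

    def commands(self) -> list[list[str]]:
        # One command line per day in the configured date range (inclusive).
        days = (self.end - self.start).days + 1
        return [
            ["python", self.script, "--date", (self.start + timedelta(days=d)).isoformat()]
            for d in range(days)
        ]


@dataclass
class QueryTask:
    """Hypothetical config for a site collected by querying company names."""
    site: str
    companies: list[str] = field(default_factory=list)
    workers: int = 4     # number of collector processes reading from the queue

    def messages(self) -> list[dict]:
        # One queue message per company; the middleware would persist these.
        return [{"site": self.site, "company": c} for c in self.companies]


# Usage
daily = DateTask("notices", "collect_notices.py", date(2017, 8, 28), date(2017, 8, 30))
query = QueryTask("qualifications", ["Company A", "Company B"])
cmds = daily.commands()
msgs = query.messages()
```

The command lines are only built here, not executed; launching them is a separate step.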
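The cleaning and change-tracking step for query-collected data can be illustrated as a set comparison between the stored and the freshly collected personnel of one company. This sketch assumes each person carries a unique identifier in a hypothetical `id_no` field, treats records without it as invalid, and reports which identifiers were added, removed or updated; the patent does not specify the matching key.

```python
def clean(records: list[dict]) -> dict[str, dict]:
    """Drop invalid records (no 'id_no') and collapse duplicates; when the
    same id appears twice, the later record wins."""
    unique: dict[str, dict] = {}
    for r in records:
        key = r.get("id_no")
        if key:
            unique[key] = r
    return unique


def diff_personnel(stored: list[dict], fresh: list[dict]) -> dict[str, list[str]]:
    """Compare stored vs freshly collected personnel of one company and
    classify each person as added, removed or updated (the update state)."""
    old, new = clean(stored), clean(fresh)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    updated = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "updated": updated}


# Usage: P1 is unchanged (and duplicated), P2 changed, P3 is new, one record is invalid.
stored = [{"id_no": "P1", "cert": "Grade 1"}, {"id_no": "P2", "cert": "Grade 2"}]
fresh = [
    {"id_no": "P1", "cert": "Grade 1"},
    {"id_no": "P1", "cert": "Grade 1"},
    {"id_no": "P2", "cert": "Grade 1"},
    {"id_no": "P3", "cert": "Grade 2"},
    {"name": "record without id"},
]
changes = diff_personnel(stored, fresh)
```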
As the above technical solution shows, the beneficial effects of the present invention are as follows:
The present invention provides a big-data-based construction industry information acquisition system and method comprising a data acquisition module, a website scheduler module and a data processing module. The data acquisition module and the website scheduler module are independent of each other, and the website scheduler module is provided with configuration interfaces connected to different websites. The data acquisition module uses message-queue persistence to request the corresponding data from different websites through these interfaces and stores the requested data in a database; the website scheduler module configures the parameters of the different websites and sends them to the data acquisition module, which starts the corresponding collection tasks; and the data processing module receives the raw website data from the data acquisition module, parses it and writes it into the database. Scattered construction industry data is thereby integrated and the relevant information is extracted, processed and analyzed, so that practitioners can obtain the industry trends and construction information they are interested in simply by searching the relevant data.
Brief description of the drawings
To explain the specific embodiments of the present invention or the prior-art technical solutions more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Throughout the drawings, similar elements or parts are generally identified by similar reference numerals, and elements or parts are not necessarily drawn to scale.
Fig. 1 is a schematic flow chart of the big-data-based construction industry information acquisition of the present invention.
Embodiment
Embodiments of the technical solution of the present invention are described in detail below with reference to the drawings. The following embodiments serve only to illustrate the technical solution clearly; they are examples only and do not limit the scope of protection of the present invention.
It should be noted that, unless otherwise indicated, technical or scientific terms used in this application have the ordinary meaning understood by those of ordinary skill in the art to which the present invention belongs.
Referring to Fig. 1, this embodiment provides a big-data-based construction industry information acquisition system comprising a data acquisition module, a website scheduler module and a data processing module, the data acquisition module and the website scheduler module being independent of each other. Construction industry information falls into two collection types: the first is industry news, policies and regulations, notices and announcements, which must be collected by date; the second is data such as construction companies' qualifications, personnel and performance records, which is collected by querying the company name. The website scheduler module provides a different interface for each type of website data to be collected. The data acquisition module uses message-queue persistence to request the corresponding data from different websites through the interfaces of the website scheduler module and then stores the requested data in the database. The website scheduler module configures the parameters of the different websites and sends them to the data acquisition module, which starts the corresponding collection task for each website. The data processing module receives the raw website data transmitted by the data acquisition module, parses it and writes it into the data warehouse unit.
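The description relies on a persistent message queue between the website scheduler module and the data acquisition module, and later names Redis as the middleware. To keep the example dependency-free, the sketch below swaps in SQLite from the Python standard library: a task stays in the table until the collector acknowledges it, so work claimed by a crashed collector can be redelivered. The class and method names are hypothetical.

```python
import sqlite3
from typing import Optional


class DurableQueue:
    """Minimal persistent task queue on SQLite, standing in for the Redis
    middleware named in the description. Messages stay in the table until
    acknowledged, so a crashed collector's work can be redelivered."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS tasks ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " body TEXT NOT NULL,"
            " taken INTEGER NOT NULL DEFAULT 0)"
        )

    def put(self, body: str) -> None:
        # Called by the website scheduler module to dispatch a task.
        with self.db:
            self.db.execute("INSERT INTO tasks (body) VALUES (?)", (body,))

    def take(self) -> Optional[tuple[int, str]]:
        # Called by the data acquisition module: claim the oldest unclaimed task.
        with self.db:
            row = self.db.execute(
                "SELECT id, body FROM tasks WHERE taken = 0 ORDER BY id LIMIT 1"
            ).fetchone()
            if row is not None:
                self.db.execute("UPDATE tasks SET taken = 1 WHERE id = ?", (row[0],))
        return row

    def ack(self, task_id: int) -> None:
        # Delete the task only after its data has been stored.
        with self.db:
            self.db.execute("DELETE FROM tasks WHERE id = ?", (task_id,))

    def requeue_unacked(self) -> None:
        # On restart, make claimed-but-unacknowledged tasks available again.
        with self.db:
            self.db.execute("UPDATE tasks SET taken = 0 WHERE taken = 1")


# Usage: the second task is claimed but the collector "crashes" before ack.
q = DurableQueue()
q.put('{"site": "demo", "company": "Company A"}')
q.put('{"site": "demo", "company": "Company B"}')
first = q.take()
q.ack(first[0])
second = q.take()
q.requeue_unacked()
redelivered = q.take()
```

This demo is single-process. With several collector processes, the select-then-update in `take` would need to run inside an exclusive transaction (for example `BEGIN IMMEDIATE`), or be replaced by the atomic list commands a Redis-based queue offers.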
The data acquisition module maintains a data acquisition log that records abnormal-node data, i.e. collection-script requests or transmissions that failed because of a network exception. After the network recovers, the data acquisition module automatically re-requests and/or re-sends the abnormal-node data recorded in the log.
The data sources for construction industry information mainly include: the Sichuan Provincial Department of Construction website; the registered safety engineer query website; the survey and design engineer practice-qualification registration website; the cost engineer registration information query website; the National Highway Construction Market Information Management System; the National Water Conservancy Construction Market Credit Information Platform; the National Construction Market Supervision Public Service Platform; Sichuan Construction Network (tenders and awards); the Sichuan Provincial Government Affairs Service and Public Resource Trading Service Center (tenders and awards); the Sichuan Daily tendering and selection network; the Chengdu engineering construction bidding-unit credit information platform; the Chengdu engineering construction project information and credit information disclosure and sharing column website; and the National Enterprise Credit Information Publicity System website.
The data acquisition module collects website data with Python and BeautifulSoup and uses Redis as the message-queue middleware, which provides message persistence and maintains constant-time access performance even for data at the terabyte scale and above. Its high throughput allows the acquisition system to perform well when collecting large volumes of data from construction industry websites.
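The description states that pages are collected with Python and BeautifulSoup. As a dependency-free approximation, the sketch below extracts table rows with the standard-library `html.parser` module; with BeautifulSoup installed, the same job would usually be done with `find_all("tr")` and `get_text()`. The sample markup is invented for illustration and does not reflect the layout of any of the listed websites.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr>. The standard-library
    parser is used here as a dependency-free stand-in for BeautifulSoup."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text inside a <td> is kept; inline tags such as <b> are ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:   # header rows made only of <th> produce no cells
                self.rows.append(self._row)
            self._row = None


# Usage with an invented notice-list page.
page = """<table><tr><th>Title</th><th>Date</th></tr>
<tr><td>Notice on <b>qualification</b> review</td><td>2017-08-30</td></tr></table>"""
parser = TableRows()
parser.feed(page)
parser.close()
rows = parser.rows
```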
Because the collected data comes from different websites, its types and formats differ. This scattered data must be further processed, cleaned and parsed before it can be sent to the application layer for display, so the data processing module processes and parses the large volume of collected data. The data processing module is configured with data parsing units and data warehouse units matched to the different websites. The data parsing units parse the data collected by date and the data collected by query separately and then store it in the data warehouse unit; data that fails to parse is flagged with a corresponding prompt in the data display module. For data collected by query, after collection is complete, the personnel, qualification, performance-record and credit-rating data of the corresponding company are parsed, duplicate and invalid data is further cleaned, changes to associated personnel information are applied as updates, additions and deletions, and the update status is recorded and stored in the data warehouse unit. The system further comprises a user management module for managing user accounts and assigning user permissions.
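The per-website data parsing units can be pictured as a registry that maps each source site to its own parser, with parse failures collected as messages for the data display module. In this hypothetical sketch, the site name `demo-json-site` and its JSON layout are invented; real parsers would be written against each listed website's actual pages.

```python
import json
from typing import Callable

# Registry mapping a site name to its parser (one "data parsing unit" per site).
PARSERS: dict[str, Callable[[str], list[dict]]] = {}


def parser_for(site: str):
    """Decorator registering a parser function for one source website."""
    def register(fn):
        PARSERS[site] = fn
        return fn
    return register


@parser_for("demo-json-site")
def parse_json_site(raw: str) -> list[dict]:
    # Hypothetical site returning JSON: {"items": [{"title": ..., "date": ...}]}
    return [{"title": i["title"], "date": i["date"]} for i in json.loads(raw)["items"]]


def process(site: str, raw: str, warehouse: list, failures: list) -> None:
    """Parse raw data with the site's parser; store the results, or record a
    failure message for the data display module."""
    try:
        warehouse.extend(PARSERS[site](raw))
    except (KeyError, ValueError, TypeError) as exc:
        # KeyError: unknown site or missing field; ValueError covers JSONDecodeError.
        failures.append(f"{site}: parse failed ({type(exc).__name__})")


# Usage: one good payload, one maintenance page, one site without a parser.
warehouse, failures = [], []
process("demo-json-site", '{"items": [{"title": "Notice", "date": "2017-08-30"}]}', warehouse, failures)
process("demo-json-site", "<html>maintenance</html>", warehouse, failures)
process("unknown-site", "{}", warehouse, failures)
```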
A big-data-based construction industry information acquisition method comprises the following steps:
S1: a developer logs in to the system and configures tasks differently according to the type of data on each website to be collected;
The task configuration is specifically as follows: the data acquisition module is started according to the task and a process is created with a process pipe command; after the process starts, its PID is obtained with a PID command and associated with the corresponding collection task information, and the information collected by the process is then written to a custom text file. For websites that must be collected by date, a date parameter is configured and the collection script is started, which calls the script of the corresponding data acquisition module to carry out the collection task; for websites collected by query, query parameters are configured and middleware is called to provide message persistence so that multiple processes can collect data.
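Starting a collection process, recording its PID against the task and redirecting its output to a custom text file might look as follows with Python's `subprocess.Popen`. The description does not say which pipe command or scripting environment is used, so the inline `-c` script, the `TASKS_BY_PID` dictionary and the log-file naming are assumptions for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# pid -> task description, so that a running process can be traced back to its task.
TASKS_BY_PID: dict[int, dict] = {}


def start_task(task: dict, log_dir: Path) -> subprocess.Popen:
    """Start one collection script as a child process whose stdout/stderr are
    redirected to a per-task text file, and remember the task under its PID."""
    log_path = log_dir / f"{task['name']}.log"
    log_file = log_path.open("w", encoding="utf-8")
    proc = subprocess.Popen(
        [sys.executable, "-c", task["script"]],  # inline script as a placeholder
        stdout=log_file,
        stderr=subprocess.STDOUT,
    )
    log_file.close()  # the child keeps its own handle to the file
    TASKS_BY_PID[proc.pid] = {**task, "log": str(log_path)}
    return proc


# Usage: a placeholder "collection script" that only prints a summary line.
with tempfile.TemporaryDirectory() as tmp:
    proc = start_task({"name": "demo", "script": "print('collected 3 records')"}, Path(tmp))
    proc.wait(timeout=30)
    output = Path(TASKS_BY_PID[proc.pid]["log"]).read_text(encoding="utf-8")
```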
S2: the data acquisition module is started according to the task, a process is created with a process pipe command, and the information collected by the process is written to a custom text file, completing collection;
After a collection task starts, its PID is read and the corresponding process status is checked, so that the task is monitored in real time; AJAX is used to read the collection task's log file in real time and output its content to the front-end interface. If a collection task fails, its PID can be read and the process terminated with a process command.
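Monitoring a running task, streaming its log to the front end and stopping it could be sketched as below. Reading the log from a remembered byte offset is one assumed way to serve incremental AJAX polls; `Popen.poll()` stands in for checking the process by PID, and `Popen.terminate()` for the process command used to end a failed task. The helper names are hypothetical.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def read_new_log(path: Path, offset: int) -> tuple[str, int]:
    """Return the log text appended since `offset` and the new offset. A front
    end polling via AJAX would send back the offset it last received."""
    with path.open("rb") as f:
        f.seek(offset)
        chunk = f.read()
    return chunk.decode("utf-8", errors="replace"), offset + len(chunk)


def status(proc: subprocess.Popen) -> str:
    # poll() returns None while the process with proc.pid is still running.
    code = proc.poll()
    return "running" if code is None else f"exited({code})"


def stop(proc: subprocess.Popen) -> None:
    # Terminate a misbehaving task and reap it.
    proc.terminate()
    proc.wait(timeout=10)


# Usage: a placeholder task that reports progress and then hangs.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "task.log"
    with log.open("w") as f:
        proc = subprocess.Popen(
            [sys.executable, "-c", "import time; print('page 1 done', flush=True); time.sleep(60)"],
            stdout=f,
        )
    text, offset = "", 0
    deadline = time.time() + 10
    while "page 1 done" not in text and time.time() < deadline:
        time.sleep(0.1)
        new, offset = read_new_log(log, offset)  # one simulated AJAX poll
        text += new
    before = status(proc)
    stop(proc)
    after = status(proc)
```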
S3: the data processing module parses the data returned by collection and stores it in the database.
The parsing is specifically as follows: for data collected by date, after collection is complete, the company-related data of interest is parsed and stored in the database, and data that fails to parse is flagged with a corresponding prompt in the data display module; for data collected by query, after collection is complete, the personnel, qualification, performance-record and credit-rating data of the corresponding company are parsed, duplicate and invalid data is deeply cleaned, changes to associated personnel information are applied as updates, additions and deletions, and the update status is recorded and stored in the database.
The operating flow of the big-data-based construction industry information acquisition system is as follows:
User management:
(1) User registration: a user must log in with an account and password before entering the system, and a user without an account must register a new one. A newly registered account cannot perform any operation in the system until a user with developer rights assigns it a permission group;
(2) Edit user: a user with developer rights assigns a permission group to a new user or changes a user's permission group;
(3) Delete user: a user with developer rights deletes a user.
Rights management:
(1) Add permission group: add a permission group to simplify assigning user permissions. A user can enter and operate a module only if the user holds that module's permission;
(2) Edit permission group: update the module permissions held by a permission group.
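The user and permission-group rules above reduce to a simple check: a user with no group can do nothing, and a user may enter a module only if the user's group holds that module's permission. The dataclasses and module names below are hypothetical and omit password handling and persistence.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PermissionGroup:
    name: str
    modules: set[str] = field(default_factory=set)  # modules this group may operate


@dataclass
class User:
    name: str
    group: Optional[PermissionGroup] = None         # None until a developer assigns one


def can_access(user: User, module: str) -> bool:
    """A newly registered user has no group and can do nothing; otherwise the
    user may enter a module only if the group holds that module's permission."""
    return user.group is not None and module in user.group.modules


# Usage
viewer = PermissionGroup("viewer", {"data_view", "task_log"})
alice = User("alice")
before = can_access(alice, "data_view")   # newly registered, no group yet
alice.group = viewer                      # a developer assigns a group
after = can_access(alice, "data_view")
denied = can_access(alice, "task_config")
viewer.modules.add("task_edit")           # editing a group affects all its members
edited = can_access(alice, "task_edit")
```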
Add task:
A user with developer rights adds a configuration instance of a collection task.
Pause task:
Click the "Pause" button to pause a running collection task; this is generally used when the target server has crashed and data cannot be collected.
Resume task:
Click the "Resume" button to continue a paused collection task; this is generally used to continue collection after the target server has recovered.
View log:
(1) Click the "View log" button to see log information such as the progress of the current collection task and whether its flow is abnormal;
(2) The log can be viewed only while the program is running or paused; it cannot be viewed after the task has ended.
Edit task:
(1) With ordinary user rights, some parameters of a task instance can be edited, such as the number of records to collect and the task time;
(2) With developer rights, the full configuration of a task instance can be edited.
Delete task:
A user with developer rights deletes a collection task instance.
View data:
View the collected and parsed data; data that failed to parse is flagged with a prompt.
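The pause, resume and log-viewing rules can be expressed as a small state machine. This is an illustrative sketch only: it models the allowed transitions and the rule that logs are unavailable after a task ends, not how the system actually suspends a collection process. The class and state names are hypothetical.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    FINISHED = "finished"


class CollectionTask:
    """Hypothetical task lifecycle matching the operations described above."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.RUNNING
        self.log: list[str] = []

    def pause(self) -> None:
        # e.g. the target server has crashed and no data can be collected
        if self.state is not State.RUNNING:
            raise ValueError(f"cannot pause a {self.state.value} task")
        self.state = State.PAUSED
        self.log.append("paused")

    def resume(self) -> None:
        # continue collecting once the target server has recovered
        if self.state is not State.PAUSED:
            raise ValueError(f"cannot resume a {self.state.value} task")
        self.state = State.RUNNING
        self.log.append("resumed")

    def finish(self) -> None:
        self.state = State.FINISHED

    def view_log(self) -> list[str]:
        # Logs are only viewable while the task is running or paused.
        if self.state is State.FINISHED:
            raise PermissionError("log is unavailable after the task has ended")
        return list(self.log)


# Usage
task = CollectionTask("tender-notices")
task.pause()
task.resume()
seen = task.view_log()
task.finish()
try:
    task.view_log()
    blocked = False
except PermissionError:
    blocked = True
```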
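The split between ordinary-user and developer editing can be sketched as a field whitelist. The field names `max_records` and `schedule_time` are assumptions standing in for "the number of records and the task time"; the patent does not list the editable parameters.

```python
# Fields an ordinary user may change; everything else needs developer rights.
# These field names are hypothetical examples.
USER_EDITABLE = {"max_records", "schedule_time"}


def edit_task(config: dict, changes: dict, is_developer: bool) -> dict:
    """Return an updated copy of a task configuration, rejecting changes to
    fields that the caller's role may not edit."""
    if not is_developer:
        forbidden = set(changes) - USER_EDITABLE
        if forbidden:
            raise PermissionError(f"developer rights required for: {sorted(forbidden)}")
    return {**config, **changes}


# Usage
cfg = {"site": "demo", "script": "collect_demo.py", "max_records": 100, "schedule_time": "02:00"}
updated = edit_task(cfg, {"max_records": 500}, is_developer=False)
try:
    edit_task(cfg, {"script": "other.py"}, is_developer=False)
    user_allowed = True
except PermissionError:
    user_allowed = False
dev = edit_task(cfg, {"script": "other.py"}, is_developer=True)
```

Returning a copy leaves the stored configuration untouched until the caller decides to save the edited version.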
Finally, it should be noted that the above embodiments merely illustrate the technical solution of the present invention and do not limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some or all of their technical features, and that such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention; all of them shall be covered by the scope of the claims and the description of the present invention.

Claims (8)

  1. A big-data-based construction industry information acquisition system, characterized by comprising a data acquisition module, a website scheduler module and a data processing module, the data acquisition module and the website scheduler module being independent of each other, and the website scheduler module being provided with configuration interfaces connected to different websites; the data acquisition module uses message-queue persistence to request the corresponding data from different websites through the interfaces of the website scheduler module and then stores the requested data in a database; the website scheduler module configures the parameters of the different websites and sends them to the data acquisition module, which starts the corresponding collection tasks for the different websites; the data processing module receives the raw website data transmitted by the data acquisition module, parses the data and writes it into the database.
  2. The big-data-based construction industry information acquisition system according to claim 1, characterized in that the data processing module is configured with data parsing units and data warehouse units matched to the different websites.
  3. The big-data-based construction industry information acquisition system according to claim 2, characterized in that the data acquisition module is provided with a data acquisition log, the data acquisition log recording abnormal-node data of collection-script requests or transmissions that failed when a network exception occurred.
  4. The big-data-based construction industry information acquisition system according to claim 3, characterized in that, after the network recovers, the data acquisition module automatically re-requests and/or re-sends the abnormal-node data in the data acquisition log.
  5. The big-data-based construction industry information acquisition system according to claim 4, characterized by further comprising a user management module for managing user accounts and assigning user permissions.
  6. A big-data-based construction industry information acquisition method, characterized by comprising the following steps:
    S1: a developer logs in to the system and configures tasks differently according to the type of data on each website to be collected;
    S2: the data acquisition module is started according to the task, a process is created with a process pipe command, and the information collected by the process is written to a custom text file, completing collection;
    S3: the data processing module parses the data returned by collection and stores it in the database.
  7. The big-data-based construction industry information acquisition method according to claim 6, characterized in that the task configuration in S1 is specifically: for websites that must be collected by date, a date parameter is configured and the collection script is started, which calls the script of the corresponding data acquisition module to carry out the collection task; for websites collected by query, query parameters are configured and middleware is called to provide message persistence so that multiple processes can collect data.
  8. The big-data-based construction industry information acquisition method according to claim 7, characterized in that the parsing in S3 is specifically: for data collected by date, after collection is complete, the company-related data of interest is parsed and stored in the database, and data that fails to parse is flagged with a corresponding prompt in the data display module; for data collected by query, after collection is complete, the personnel, qualification, performance-record and credit-rating data of the corresponding company are parsed, duplicate and invalid data is deeply cleaned, changes to associated personnel information are applied as updates, additions and deletions, and the update status is recorded and stored in the database.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710760105.0A CN107577748A (en) 2017-08-30 2017-08-30 Building trade information acquisition system and its method based on big data

Publications (1)

Publication Number Publication Date
CN107577748A 2018-01-12

Family

ID=61030777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710760105.0A Pending CN107577748A (en) 2017-08-30 2017-08-30 Building trade information acquisition system and its method based on big data

Country Status (1)

Country Link
CN (1) CN107577748A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109217474A (en) * 2018-10-17 2019-01-15 安徽立卓智能电网科技有限公司 A kind of new-energy grid-connected data acquisition time method based on radio channel transmission
CN110555661A (en) * 2018-05-31 2019-12-10 西安海平方网络科技有限公司 information display method, device and equipment of building element and readable storage medium
CN112104656A (en) * 2020-09-16 2020-12-18 杭州安恒信息安全技术有限公司 Network threat data acquisition method, device, equipment and medium
CN112801820A (en) * 2021-02-05 2021-05-14 郝大伟 Big data acquisition method for building construction enterprises
CN113297448A (en) * 2021-05-13 2021-08-24 中国电波传播研究所(中国电子科技集团公司第二十二研究所) Open-source electric wave environment data acquisition method based on web crawler and computer readable storage medium
CN116910108A (en) * 2023-09-13 2023-10-20 彩讯科技股份有限公司 Method, device, equipment and computer readable storage medium for processing end-side data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050165731A1 (en) * 2002-08-20 2005-07-28 Tokyo Electron Limited Method for processing data based on the data context
CN104484424A (en) * 2014-12-19 2015-04-01 浪潮通用软件有限公司 Establishing method for resource price information base of construction enterprise based on internet
CN105468664A (en) * 2015-05-12 2016-04-06 北京众标网络科技有限公司 Information acquisition method and apparatus
CN106096056A (en) * 2016-06-30 2016-11-09 西南石油大学 A kind of based on distributed public sentiment data real-time collecting method and system

Similar Documents

Publication Publication Date Title
CN107577748A (en) Building trade information acquisition system and its method based on big data
CN108446972A (en) Bank's Supervision of credit method, apparatus and fund position manage system
Chung et al. Dealing with non-functional requirements: three experimental studies of a process-oriented approach
CN112364094A (en) Visual modeling method, device and medium for data warehouse
CN111917887A (en) System for realizing data governance under big data environment
CN108470228A (en) Financial data auditing method and audit system
CN103677973A (en) Distributed multi-task scheduling management system
CN109409633A (en) Business monitoring and Warning System
CN114925045B (en) PaaS platform for big data integration and management
CN106156115A (en) A kind of resource regulating method and device
CN101582090A (en) Distributed processing method and system based on WEB analysis
CN103870919A (en) Precision marketing system
CN109345131A (en) A kind of enterprise management condition monitoring method and system
CN107506194A (en) Application version, which retracts, determines method and device
CN106355489A (en) Data center system and data processing method for management
CN107392736A (en) A kind of data processing method, device and equipment
CN113793110A (en) Industrial equipment data acquisition and analysis method based on cloud computing and cloud service platform
Hu Information lifecycle modeling framework for construction project lifecycle management
CN115719207A (en) Super-automation platform system
CN106993032A (en) The embedded accurate communication cloud service platform applied based on mobile Internet
CN109118151A (en) A kind of work order transaction methods and work order transacter
CN115168297A (en) Bypassing log auditing method and device
CN112734363A (en) Subsidy declaration and auditing method and system based on data full-link supervision
CN107644347A (en) The acquisition method of MMO game operation datas
Luping et al. An intelligent power user data analysis platform based on Spark

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180112