CN110766555A - Information acquisition system - Google Patents



Publication number
CN110766555A
Authority
CN
China
Prior art keywords
data
information
information acquisition
exchange
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911037497.3A
Other languages
Chinese (zh)
Inventor
孟蕾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Financial Assets Exchange Co., Ltd.
Original Assignee
Beijing Financial Assets Exchange Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Financial Assets Exchange Co., Ltd.
Priority to CN201911037497.3A
Publication of CN110766555A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

The application discloses an information acquisition system comprising a data acquisition module, a data processing module, and a data sorting module. The data acquisition module acquires web pages related to the listed items of the exchanges to be monitored. The data processing module extracts information related to the listed items from the web pages and cleans the extracted information. The data sorting module compiles statistics on the cleaned information and outputs a consolidated statistical result. In this way, the information monitoring of each exchange is completed intelligently, labor cost is saved, and work quality and data accuracy are ensured.

Description

Information acquisition system
Technical Field
The application relates to the technical field of information services, and in particular to an information acquisition system for the listed items of local exchanges.
Background
Collecting information on the business activities of other entities in the same industry helps an entity understand industry dynamics in depth and optimize its own business activities. For example, Beijing Financial Assets Exchange Co., Ltd., which is a trading platform designated by the China interbank market dealers association and a financial state-owned asset trading platform designated by the Ministry of Finance, needs to monitor information on the financial state-owned asset items of the various local exchanges.
At present, however, this information monitoring is done manually. As the monitored content covers more points and a wider scope, and as the information is updated more frequently, labor cost rises rapidly. Meanwhile, work quality and data accuracy are difficult to guarantee because of human factors.
Therefore, an information acquisition system dedicated to the listed items of the local exchanges is provided.
Disclosure of Invention
The present application is proposed to solve the above technical problems. Embodiments of the application provide an information acquisition system for the listed items of local exchanges, which can intelligently complete the information monitoring of each exchange, saving labor cost and ensuring work quality and data accuracy.
According to an aspect of the present application, there is provided an information acquisition system including:
a data acquisition module for acquiring web pages related to the listed items of an exchange to be monitored;
a data processing module for extracting information related to the listed items from the web pages and cleaning the extracted information; and
a data sorting module for compiling statistics on the cleaned information related to the listed items and outputting a consolidated statistical result.
In the information acquisition system according to the present application, the data acquisition module acquires the web pages related to the listed items of the exchange to be monitored based on web crawling technology.
In the information acquisition system according to the present application, the consolidated statistical result includes the number of listing exchanges, the listed transaction amount of each exchange, and the number of Order No. 54 items of each exchange.
In the information acquisition system according to the present application, the consolidated statistical result is output in tabular form.
In the information acquisition system according to the present application, the information acquisition system further includes a data query module for receiving a query request and, in response to receiving the query request, outputting a matching query result.
In the information acquisition system according to the present application, each listed item of the exchange is a financial state-owned asset item.
According to another aspect of the present application, there is provided an information acquisition system, including:
a user layer, on which two user roles, ordinary user and system administrator, are provided, with different access rights and functions configured for the ordinary user and the system administrator;
a presentation layer for displaying data related to the listed items of the exchange to be monitored and/or the consolidated statistical result;
an application layer for searching data related to the listed items of the exchange to be monitored; browsing and managing data related to the listed items of the exchange to be monitored; and performing system management of the information acquisition system;
an analysis layer for cleaning, identifying, and/or analyzing data related to the listed items of the exchange to be monitored;
an acquisition layer for acquiring data related to the listed items of the exchange to be monitored; and
an infrastructure layer for deploying an operating system, a database system, an application server, a distributed cache system, a file server, and a full-text retrieval system.
In the information acquisition system according to the present application, the functions configured for the ordinary user include: data query, data statistics, and report output; the functions configured for the system administrator include: user management, exchange site management, and system data management.
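The role configuration described above can be illustrated with a minimal sketch. The function names are taken from the two lists in the preceding paragraph; the identifiers (`Role`, `ROLE_FUNCTIONS`, `is_allowed`, `require`) are hypothetical, and the sketch assumes that the two roles do not share functions, which the application does not state either way.

```python
from enum import Enum


class Role(Enum):
    ORDINARY_USER = "ordinary_user"
    SYSTEM_ADMINISTRATOR = "system_administrator"


# Functions configured per role, as listed in the description.
# Assumption: the two sets are disjoint (the source does not say).
ROLE_FUNCTIONS = {
    Role.ORDINARY_USER: frozenset(
        {"data_query", "data_statistics", "report_output"}
    ),
    Role.SYSTEM_ADMINISTRATOR: frozenset(
        {"user_management", "exchange_site_management", "system_data_management"}
    ),
}


def is_allowed(role, function):
    """Return True if the given role is configured with the function."""
    return function in ROLE_FUNCTIONS[role]


def require(role, function):
    """Raise PermissionError when the role lacks the requested function."""
    if not is_allowed(role, function):
        raise PermissionError(f"{role.value} may not use {function}")
```

A caller at the application layer could invoke `require(role, "report_output")` before generating a report, so that the access check stays in one place.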
The information acquisition system can intelligently complete information monitoring work of each exchange, saves labor cost and ensures working quality and data accuracy.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 illustrates a block diagram of an information acquisition system according to an embodiment of the present application.
FIG. 2 illustrates a workflow diagram of an information acquisition system according to an embodiment of the present application.
FIG. 3 illustrates a logical architecture diagram of an information acquisition system according to an embodiment of the present application.
FIG. 4 illustrates another block diagram of an information acquisition system according to an embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.
Exemplary Information Acquisition System
FIG. 1 illustrates an information acquisition system according to an embodiment of the present application. In particular, in the embodiment of the present application, the information acquisition system is dedicated to the listed items (particularly financial state-owned asset items) of the local exchanges. Through the system, the information monitoring of each exchange can be completed intelligently, which saves labor cost and ensures work quality and data accuracy.
As shown in FIG. 1, the information acquisition system 100 according to the embodiment of the present application includes: a data acquisition module 110, a data processing module 120, a data sorting module 130, and a data query module 140. The data acquisition module 110 is configured to acquire web pages related to the listed items of an exchange to be monitored. The data processing module 120 is configured to extract information related to the listed items from the web pages and to clean the extracted information. The data sorting module 130 is configured to compile statistics on the cleaned information related to the listed items and to output a consolidated statistical result. The data query module 140 is configured to receive a query request and, in response to receiving the query request, output a matching query result.
Specifically, in the embodiment of the present application, the data acquisition module 110 acquires the web pages related to the listed items of the exchange to be monitored based on web crawling technology. It will be appreciated by those skilled in the art that a web crawler (also known as a web spider or web robot, and within the FOAF community more often called a web chaser) is a program or script that automatically fetches information from the World Wide Web according to certain rules; other, less common names are ant, automatic indexer, emulator, and worm. According to system structure and implementation technology, web crawlers can be roughly classified into the following types: general purpose web crawlers, focused web crawlers, incremental web crawlers, and deep web crawlers. In specific applications, several of these crawler technologies can also be combined. It should be understood by those of ordinary skill in the art that the present application does not limit the choice of web crawler technology in its embodiments.
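As an illustration only, a focused crawler of the kind mentioned above might be sketched as follows. The application does not disclose its crawler implementation; the names `LinkParser`, `fetch_html`, and `crawl`, the breadth-first strategy, and the rule of staying on the hosts of the seed URLs are all assumptions made for this sketch. The fetch function is passed in as a parameter so that no particular exchange website is assumed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Default fetcher: download one page (assumes UTF-8 text)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed_urls, fetch=fetch_html, max_pages=50):
    """Breadth-first crawl restricted to the hosts of the seed URLs."""
    allowed_hosts = {urlparse(u).netloc for u in seed_urls}
    queue = deque(seed_urls)
    seen = set(seed_urls)
    pages = {}  # url -> raw HTML (the "raw data" of the description)
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that cannot be retrieved
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)
            if urlparse(link).netloc in allowed_hosts and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Restricting the crawl to the seed hosts is one simple way to keep it "focused" on the monitored exchanges; an incremental crawler would additionally remember which pages were already collected in earlier runs.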
It should be appreciated that a list of the exchanges to be monitored is provided before web pages are crawled via web crawler technology. In particular, in the present embodiment, each exchange is a local exchange, including but not limited to: Tianjin Financial Assets Exchange (abbreviated as the "Tianjin" exchange), Shanghai United Property Exchange (abbreviated as the "Shanghai" united exchange), Chongqing Financial Assets Exchange (abbreviated as the "Chongjin" exchange), etc. In this way, the web pages (as raw data) related to the listed items of the exchanges to be monitored can be obtained through web crawler technology.
Further, the data processing module 120 is configured to extract information related to the listed items from the web pages and to clean the extracted information. That is, the information related to the listed items (the target information) is extracted from the raw data, and the extracted information is then cleaned. In a specific implementation, the data cleaning process includes, but is not limited to: missing value processing, feature variable transformation, feature selection, dimension transformation, normalization/sparsification, and the like. After data cleaning is performed, the cleaned data may be loaded into a database (e.g., an SQL database).
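The extract, clean, and load steps described above can be sketched as follows. This is only an illustration under stated assumptions: the page layout (an HTML table with three cells per row: item name, amount, category), the `listings` table schema, and the names `TableRowParser`, `extract_rows`, `clean_rows`, and `load_rows` are hypothetical and not taken from the application. The cleaning shown covers missing value handling, numeric conversion, and de-duplication; SQLite stands in for the SQL database.

```python
import sqlite3
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._in_cell = False
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self._in_cell = False
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


def extract_rows(html):
    """Extract the target information (table cells) from raw HTML."""
    parser = TableRowParser()
    parser.feed(html)
    return parser.rows


def clean_rows(exchange, rows):
    """Trim cells, drop incomplete or duplicate rows, parse the amount."""
    cleaned, seen = [], set()
    for row in rows:
        cells = [c.strip() for c in row]
        if len(cells) != 3 or not all(cells):
            continue  # missing value: skip the record
        name, amount_text, category = cells
        try:
            amount = float(amount_text.replace(",", ""))
        except ValueError:
            continue  # amount is not numeric
        if (exchange, name) in seen:
            continue  # duplicate listing
        seen.add((exchange, name))
        cleaned.append((exchange, name, amount, category))
    return cleaned


def load_rows(conn, records):
    """Load cleaned records into the hypothetical `listings` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings ("
        "exchange TEXT, item TEXT, amount REAL, category TEXT)"
    )
    conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?)", records)
    conn.commit()
```

Here incomplete records are simply dropped; a real deployment might instead impute or flag them, which is a design choice the application leaves open.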
Further, the data sorting module 130 is configured to compile statistics on the cleaned information related to the listed items and to output a consolidated statistical result. In particular, the consolidated statistical result includes, but is not limited to, the number of listing exchanges, the listed transaction amount of each exchange, and the number of Order No. 54 items of each exchange. It should be noted that, in a specific implementation, the data items in the statistical result may be added, deleted, modified, and sorted based on the actual information monitoring requirements, which is not limited by the embodiments of the present application.
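A minimal sketch of such consolidated statistics is given below. The record layout `(exchange, item, amount, is_order_54)`, the boolean flag marking Order No. 54 items, and the function names `consolidate` and `format_table` are assumptions made for illustration; the application does not specify how these values are stored or computed.

```python
from collections import defaultdict


def consolidate(records):
    """Aggregate records of the form (exchange, item, amount, is_order_54)."""
    amount = defaultdict(float)
    order_54 = defaultdict(int)
    for exchange, _item, value, is_order_54 in records:
        amount[exchange] += value
        order_54[exchange] += 1 if is_order_54 else 0
    return {
        # number of exchanges that have at least one listing
        "exchange_count": len(amount),
        # one row per exchange: (name, listed amount, Order No. 54 item count)
        "rows": [(ex, amount[ex], order_54[ex]) for ex in sorted(amount)],
    }


def format_table(stats):
    """Render the consolidated statistics as a plain-text table."""
    lines = [f"{'Exchange':<12}{'Listed amount':>16}{'Order 54 items':>16}"]
    for exchange, total, count in stats["rows"]:
        lines.append(f"{exchange:<12}{total:>16.2f}{count:>16d}")
    lines.append(f"Exchanges with listings: {stats['exchange_count']}")
    return "\n".join(lines)
```

Because the statistics are returned as plain data before formatting, data items can be added or removed without touching the tabular output routine.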
Moreover, the user can quickly look up the information of interest through the data query module 140. Specifically, the data query module 140 first receives a query request from the user and then, upon receiving the query request, outputs a matching query result.
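The request and response behavior of the query module could look like the following sketch, which assumes the cleaned data sits in a SQL table. The table name `listings`, its columns, and the filter parameters of the hypothetical `query_listings` function are illustrative assumptions; the application does not define the query interface.

```python
import sqlite3


def query_listings(conn, exchange=None, keyword=None, min_amount=None):
    """Return listings that match every filter supplied in the request."""
    clauses, params = [], []
    if exchange is not None:
        clauses.append("exchange = ?")
        params.append(exchange)
    if keyword is not None:
        clauses.append("item LIKE ?")
        params.append(f"%{keyword}%")
    if min_amount is not None:
        clauses.append("amount >= ?")
        params.append(min_amount)
    sql = "SELECT exchange, item, amount FROM listings"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY exchange, item"
    # Parameters are bound by the driver, never interpolated into the SQL.
    return conn.execute(sql, params).fetchall()
```

Only the fixed clause strings are concatenated into the statement, so user-supplied values cannot alter the query structure.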
FIG. 2 illustrates a workflow diagram of the information acquisition system according to an embodiment of the present application. As shown in FIG. 2, the data sources, namely the exchanges of the various localities, are first obtained. Data acquisition, modeling and cleaning, and data storage are then carried out based on web crawler technology. Finally, through data sorting and statistics, the consolidated statistical result (comprising the number of listing exchanges, the listed transaction amount of each exchange, and the number of Order No. 54 items of each exchange) is output at the application end.
FIG. 3 illustrates a schematic diagram of the logical architecture of the information acquisition system according to an embodiment of the present application. As shown in FIG. 3, in the embodiment of the present application, the information acquisition system includes: a user layer, a presentation layer, an application layer, an analysis layer, an acquisition layer, and an infrastructure layer. The infrastructure layer is used for deploying an operating system, a database system, an application server, a distributed cache system, a file server, and a full-text retrieval system. The acquisition layer is used for acquiring data related to the listed items of the exchanges to be monitored. The analysis layer is used for cleaning, identifying, and/or analyzing data related to the listed items of the exchanges to be monitored. The application layer is used for searching data related to the listed items of the exchanges to be monitored, browsing and managing that data, and performing system management of the information acquisition system. The presentation layer is used for displaying data related to the listed items of the exchanges to be monitored and/or the consolidated statistical result. Two user roles, ordinary user and system administrator, are provided on the user layer, and different access rights and functions are configured for the ordinary user and the system administrator.
That is, with respect to the data acquisition module 110, the data processing module 120, the data sorting module 130, and the data query module 140 in the information acquisition system 100 shown in FIG. 1, the data acquisition module 110 is mainly configured to operate at the acquisition layer, the data processing module 120 and the data sorting module 130 are mainly configured to operate at the analysis layer and the application layer, and the data query module 140 is mainly configured to operate at the presentation layer and the user layer.
It should be noted that the data acquisition module 110, the data processing module 120, the data sorting module 130, and the data query module 140 may also operate at other layers according to the functions they implement. For example, a module operates at the infrastructure layer when it needs to call hardware resources, and at the user layer and the presentation layer when it needs to receive a user instruction, execute the corresponding operation, and feed the result back to the user.
Therefore, those skilled in the art can understand that the logical architecture of the information acquisition system shown in FIG. 3 and the functional block diagram of the information acquisition system shown in FIG. 1 are consistent with each other, and together implement the functions of the information acquisition system according to the embodiments of the present application.
Based on the above six-layer overall architecture, the information acquisition system is mainly divided into two major parts. FIG. 4 illustrates another block diagram of the information acquisition system according to an embodiment of the present application. As shown in FIG. 4, the first part is the data acquisition module 110, which mainly adopts a J2EE application architecture. Each acquisition robot module internally collects its directed tasks in a multithreaded manner and externally accepts control from a scheduling server. Running at the back end, it automatically collects the transaction data on the exchange sites that the administrator manages and maintains. The second part is the system application end, which adopts a three-tier structure based on the B/S (browser/server) mode to ensure the openness, security, usability, and scalability of the system. It mainly comprises a usage module for ordinary users and a configuration management module for administrators, and different user roles log in with different access rights and functions.
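The acquisition robot described above, which works through its directed tasks on several threads while remaining under external control, can be sketched in a few lines. The application states that the module is built on a J2EE architecture; the Python version below is not that implementation, only an illustration of the control pattern. The class name `AcquisitionRobot`, the `collect` callable, and the use of a stop flag as the scheduler's control channel are assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AcquisitionRobot:
    """Runs directed collection tasks on worker threads.

    A scheduler (hypothetical here) controls the robot from outside by
    calling stop(), after which pending tasks are skipped.
    """

    def __init__(self, collect, max_workers=4):
        self._collect = collect          # callable: task -> result
        self._max_workers = max_workers
        self._stop = threading.Event()   # set by the scheduler to halt work

    def stop(self):
        """Called by the scheduler to stop further collection."""
        self._stop.set()

    def _run_one(self, task):
        if self._stop.is_set():
            return task, None            # skipped after a stop request
        return task, self._collect(task)

    def run(self, tasks):
        """Process all tasks concurrently and return {task: result}."""
        with ThreadPoolExecutor(max_workers=self._max_workers) as pool:
            results = list(pool.map(self._run_one, tasks))
        return {task: result for task, result in results if result is not None}
```

In this sketch a task could be, for example, the URL of one exchange's listing page and `collect` the function that downloads and parses it.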
In conclusion, the information acquisition system according to the embodiments of the present application has been described. In view of how quickly market data changes, the system uses web crawler technology to monitor the listing dynamics of the local exchanges in a timely and efficient manner, so as to obtain more relevant basic information for reference and promote the development of similar project work. In addition, the information acquisition system compiles statistics on the monitored data and, by using database technology, Redis cache technology, and retrieval technology, provides functions for quickly querying, counting, and outputting the relevant exchange data, so as to achieve the aim of monitoring market dynamics.
It should be noted that, in other examples of the present application, adapted function modules may further be developed for the information acquisition system based on actual user requirements, which is not limited by the present application. Furthermore, in the embodiment of the present application, the information acquisition system further includes other necessary functional elements (not described in the present application), which mainly ensure that the information acquisition system can be implemented normally, as those skilled in the art will appreciate.
Here, it will be understood by those skilled in the art that the specific functions and operations of the respective units and modules in the above information acquisition system have been described in detail above, and thus a repeated description thereof is omitted.
As described above, the information acquisition system according to the embodiment of the present application may be implemented in various terminal devices, such as a large-screen smart device, or a computer independent of a large-screen smart device. In one example, the information acquisition system according to the embodiment of the present application may be integrated into a terminal device as a software module and/or a hardware module. For example, the information acquisition system may be a software module in the operating system of the terminal device, or may be an application developed for the terminal device; of course, the information acquisition system may also be one of the many hardware modules of the terminal device.
Alternatively, in another example, the information acquisition system and the terminal device may be separate devices, and the information acquisition system may be connected to the terminal device through a wired and/or wireless network and transmit interactive information according to an agreed data format.
The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, or configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, or configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the phrase "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (8)

1. An information acquisition system, comprising:
a data acquisition module for acquiring web pages related to the listed items of an exchange to be monitored;
a data processing module for extracting information related to the listed items from the web pages and cleaning the extracted information; and
a data sorting module for compiling statistics on the cleaned information related to the listed items and outputting a consolidated statistical result.
2. The information acquisition system of claim 1, wherein the data acquisition module acquires the web pages related to the listed items of the exchange to be monitored based on web crawling technology.
3. The information acquisition system of claim 1, wherein the consolidated statistical result comprises the listed amount of each exchange and the number of Order No. 54 items of each exchange.
4. The information acquisition system according to claim 3, wherein the consolidated statistical result is output in tabular form.
5. The information acquisition system of claim 1, further comprising a data query module for receiving a query request and, in response to receiving the query request, outputting a matching query result.
6. The information acquisition system according to claim 1, wherein each listed item of the exchange is a financial state-owned asset item.
7. An information acquisition system, comprising:
a user layer, on which two user roles, ordinary user and system administrator, are provided, with different access rights and functions configured for the ordinary user and the system administrator;
a presentation layer for displaying data related to the listed items of the exchange to be monitored and/or a consolidated statistical result;
an application layer for searching data related to the listed items of the exchange to be monitored; browsing and managing data related to the listed items of the exchange to be monitored; and performing system management of the information acquisition system;
an analysis layer for cleaning, identifying, and/or analyzing data related to the listed items of the exchange to be monitored;
an acquisition layer for acquiring data related to the listed items of the exchange to be monitored; and
an infrastructure layer for deploying an operating system, a database system, an application server, a distributed cache system, a file server, and a full-text retrieval system.
8. The information acquisition system according to claim 7, wherein
the functions configured for the ordinary user include: data query, data statistics, and report output; and
the functions configured for the system administrator include: user management, exchange site management, and system data management.
CN201911037497.3A 2019-10-29 2019-10-29 Information acquisition system Pending CN110766555A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911037497.3A CN110766555A (en) 2019-10-29 2019-10-29 Information acquisition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911037497.3A CN110766555A (en) 2019-10-29 2019-10-29 Information acquisition system

Publications (1)

Publication Number Publication Date
CN110766555A true CN110766555A (en) 2020-02-07

Family

ID=69334755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911037497.3A Pending CN110766555A (en) 2019-10-29 2019-10-29 Information acquisition system

Country Status (1)

Country Link
CN (1) CN110766555A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191898A (en) * 2021-05-07 2021-07-30 北京金融资产交易所有限公司 Information management system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649498A (en) * 2016-10-10 2017-05-10 合肥红珊瑚软件服务有限公司 Network public opinion analysis system based on crawler and text clustering analysis
CN108563679A (en) * 2018-03-06 2018-09-21 广西友信矿业有限公司 Quarrying Information Acquisition System based on information collection and method

Similar Documents

Publication Publication Date Title
CN107861859B (en) Log management method and system based on micro-service architecture
US9256686B2 (en) Using a bloom filter in a web analytics application
US20060129609A1 (en) Database synchronization using change log
US20100318492A1 (en) Data analysis system and method
US8463811B2 (en) Automated correlation discovery for semi-structured processes
CN104750469A (en) Source code statistical analysis method and source code statistical analysis system
CN101625738A (en) Method and device for generating context-aware universal workflow application
CN110347688B (en) Method, device and equipment for fusing characteristics of multi-element information and storage medium
Al-Janabi A proposed framework for analyzing crime data set using decision tree and simple k-means mining algorithms
CN109740129B (en) Report generation method, device and equipment based on blockchain and readable storage medium
CN104246787A (en) Parameter adjustment for pattern discovery
CN106682206A (en) Method and system for big data processing
KR101341948B1 (en) Management system and method for knowledge information of industrial technology
CN110766555A (en) Information acquisition system
US20180046669A1 (en) Eliminating many-to-many joins between database tables
CN103488693A (en) Data processing device and data processing method
CN114416489A (en) System running state monitoring method and device, computer equipment and storage medium
CN109542986B (en) Element normalization method, device, equipment and storage medium of network data
CN108021696B (en) Data association analysis method and system
US8850313B2 (en) Systems and methods for increasing relevancy of search results in intra web domain and cross web domain search and filter operations
CN112015623A (en) Method, device and equipment for processing report data and readable storage medium
JP2009122995A (en) Management system and management method of related process record
Soibelman et al. Data fusion and modeling for construction management knowledge discovery
Gupta et al. Provenance in context of Hadoop as a Service (HaaS)-State of the Art and Research Directions
CN104951467A (en) Statistical method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200207