CN114020833A - Financial information synchronous classification system and establishment method

Publication number
CN114020833A
Authority
CN
China
Prior art keywords
information
list
data
processing
platform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111219358.XA
Other languages
Chinese (zh)
Inventor
张琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Dongfang Fortune Financial Data Service Co ltd
Original Assignee
Shanghai Dongfang Fortune Financial Data Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Dongfang Fortune Financial Data Service Co ltd
Priority to CN202111219358.XA
Publication of CN114020833A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a financial information synchronous classification system and a method for establishing it, which ensure the timeliness, consistency, stability and scalability of financial information when massive volumes of it are processed. The system is synchronized fully automatically from acquisition to display, with no manual intervention, so overall processing is fast and the timeliness of financial information is improved. Because the classification lists are separated from the summary information, requests that involve deep paging are answered faster than in other systems. The related financial information is abstracted and decomposed in a uniform way and handled on a single processing platform, which simplifies processing and ensures consistency. The technologies used in the synchronization layer have corresponding fault-tolerance schemes and perform peak shaving on the source data, which improves overall stability. The technologies used by the system support cluster mode, so the system is easy to scale out and to deploy across data centers, which ensures scalability.

Description

Financial information synchronous classification system and establishment method
Technical Field
The invention relates to data pipeline technology, and in particular to a financial information synchronous classification system and a method for establishing it.
Background
At present, the volume of financial information is large and grows year by year. Processing it synchronously is difficult: there are many sources, a great deal of business logic, and strict real-time requirements.
Existing technical solutions mainly take the following forms:
1. A content management system (CMS) is built with conventional CMS software; the source financial information data are stored in the CMS, which then exposes an external interface. With this approach, scaling the CMS after it has been built runs into bottlenecks, and its stability cannot be well guaranteed under high concurrency.
2. Source data are inserted into a MySQL database by an ETL tool; the database is queried on each request and the data are then served externally. With this approach, once the data are stored in MySQL, large texts and files such as body text and attachments are difficult to handle, and the first load of a list is slow under high concurrency.
3. Source data are inserted into Elasticsearch by an ETL tool, and the search engine is queried directly on each request. With this approach, large lists lead to deep paging, and paging becomes slow. Although Elasticsearch (ES) supports cursor operations, cursors cannot meet the requirement when the lists are both large and numerous and each needs a cursor.
In summary, current management of financial information data has the following problems. Scalability is poor: multi-site expansion and cross-data-center deployment cause problems. Deep-paging performance is inadequate: most information products slow down when jumping between pages of a list whose length is on the order of ten million entries. Consistency is poor: many information products split information types such as information, announcements and research reports across several systems, which makes them hard to keep consistent and inconvenient to maintain.
Disclosure of Invention
To address the problems in content management of massive financial information, a financial information synchronous classification system and a method for establishing it are provided, which solve the synchronization and classification of massive financial information.
The technical solution of the invention is as follows. A financial information synchronous classification system comprises an acquisition configuration platform, a streaming processing platform, a list processing platform and a financial information display platform. The acquisition configuration platform acquires financial information data periodically, sorts the data into different classification topic queues according to acquisition source, and sends them to the streaming processing platform. The streaming processing platform abstracts each acquired item into summary information, body text and attachments, index information and list information, and sends these to the financial information display platform. List information that the streaming processing platform cannot classify is processed by the list processing platform and then sent to the financial information display platform. The financial information display platform retrieves and displays financial information data according to query requirements.
The method for establishing the financial information synchronous classification system comprises the following steps:
1) Establishing the acquisition configuration platform: an acquisition task is configured, and the platform generates a corresponding acquisition program that runs periodically. The periodically acquired financial information data are inserted into a Kafka message queue, in which different classification topic queues are established for different acquisition sources, so that the topic queues distinguish the data sources.
2) Establishing the streaming processing platform: the classification topic queues are subscribed to in the streaming processing platform; after subscription, individual items are fetched from the corresponding message queue and business logic processing is performed. The business logic processing abstracts and decomposes each acquired item into four kinds of data, namely summary information, body text and attachments, index information and list information, and then processes each kind separately.
3) Establishing the list processing platform: different types of information lists are generated by searching the information data that have entered the search engine, and the resulting lists are inserted into a K-V list library.
4) Establishing the financial information display platform: the processed financial information data are retrieved according to the specific business query logic and displayed at the front end.
Further, the summary information in step 2) refers to the small text fields of an item; the index information refers to the fields of an item used for Chinese keyword retrieval; the list information refers to storing the primary keys of items in a list according to a set rule.
Further, the summary information processing in step 2) is as follows: in the streaming processing platform, the summary information is stored in a K-V summary library by primary key, with primary key + attribute as the key and the specific summary value as the value.
The body text and attachment processing: body text and attachments, which are larger in volume, are inserted by the streaming processing platform into a big data storage engine.
The index information processing: in the streaming processing platform, multi-source data are merged, so that the required index information is combined into a single row according to the information primary key id.
The list information processing: the list information is first divided into two cases according to whether it can be classified directly in the streaming processing platform:
Data that can be classified directly on the streaming platform: if the category can be derived from the summary fields in the list information, the primary key is stored directly in the corresponding list in the K-V list library.
Data that cannot be classified on the streaming platform: if the received list information cannot be classified directly by the streaming processing platform, it is handed to the list processing platform for processing.
Further, the multi-source data merging in the index information processing of step 2) is performed as follows: if the source data distinguish between slave-table and master-table information, the slave-table information is processed first and stored in the summary K-V library under the information primary key id. When the master-table information is processed, the required summary fields are fetched from the slave table's entries in the summary K-V library by primary key id, the required index information is merged into one row, and the complete record is then stored in the search engine.
Further, the list processing platform in step 3) operates as follows: the conditions that a list must match are configured; a query task is generated in the list configuration platform; the task reads the configured conditions and performs an incremental query; and the incremental query results are imported into the K-V list library for list storage.
Further, in step 4) information is displayed according to query requirements in the following three modes:
Acquiring an information list: primary key information is first obtained from the K-V list library, and the required related data attributes are determined according to the set rule of the information list; the obtained primary keys are combined with the required attribute fields to form summary keys, with which the summary information is fetched from the K-V summary library; once all required fields have been obtained, the query result is assembled and returned to the front end for display.
Information search: the query conditions are first submitted to the search engine, which returns primary key list information; once the primary keys are obtained, the summary fields are fetched in the same way as when acquiring an information list.
Acquiring body text or attachments: when the body text and attachments of an item are requested, they are fetched directly from the big data storage engine by primary key and returned to the front end for display.
The beneficial effects of the invention are as follows: the financial information synchronous classification system and the method for establishing it ensure the timeliness, consistency, stability and scalability of financial information when massive volumes of it (news, announcements, research reports, laws and regulations, event reminders and the like) are processed.
The system is synchronized fully automatically from acquisition to display, with no manual intervention, so overall processing is fast and the timeliness of financial information is improved. Because the classification lists are separated from the summary information, requests that involve deep paging are answered much faster than in other systems. The related financial information is abstracted in a uniform way (summary, list, index, and body text with attachments) and handled on a single processing platform, which simplifies processing and ensures consistency. The technologies used in the synchronization layer have corresponding fault-tolerance schemes and perform peak shaving on the source data, which improves overall stability. The technologies used by the system support cluster mode, so the system is easy to scale out and to deploy across data centers, which ensures scalability.
Drawings
FIG. 1 is a schematic diagram illustrating the operation of the financial information synchronous classification system according to the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
As shown in FIG. 1, the financial information synchronous classification system of the present invention is constructed as follows:
1. An acquisition task is configured in the acquisition configuration platform, and the platform generates a corresponding acquisition program that runs periodically.
2. The periodically acquired financial information data are inserted into a Kafka message queue, and different classification topic queues are established in the message queue for different acquisition sources; that is, data from the same acquisition source go into the same queue, and the topic queues serve to distinguish data sources. Classification of the financial information data itself is handled later, in the streaming processing platform. The message queue decouples the acquisition tasks from the business logic processing: any type of acquisition source or task can be added on the acquisition side, and the endpoint at which acquisition hands off data is always the message queue, which makes the system easy to extend and maintain.
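The per-source topic routing in step 2 can be sketched as follows. This is a minimal illustration only: `TopicBroker` is an in-memory stand-in for the Kafka message queue, and the `info.<source>` topic naming rule and the function names are assumptions, not taken from the patent.

```python
from collections import defaultdict, deque


class TopicBroker:
    """In-memory stand-in for a message queue with named topics (not Kafka itself)."""

    def __init__(self):
        self._topics = defaultdict(deque)

    def publish(self, topic, message):
        # Append the message to the end of the topic's queue.
        self._topics[topic].append(message)

    def poll(self, topic):
        # Return the oldest message on the topic, or None if it is empty.
        queue = self._topics[topic]
        return queue.popleft() if queue else None


def topic_for(source):
    # Hypothetical naming rule: one topic per acquisition source.
    return f"info.{source}"


def collect(broker, source, items):
    # The acquisition side only knows about the queue, never about consumers.
    for item in items:
        broker.publish(topic_for(source), item)
```

Because the acquisition side only ever writes to a topic, adding a new source amounts to adding a new topic name; the downstream processing code does not change.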
3. The consumer of the message queue is the Storm stream processing platform. The classification topic queues are subscribed to in Storm, after which individual items can be fetched from the corresponding queue for business logic processing. When a task fails, the stream processing platform reassigns it to another worker node, which ensures the stability and timeliness of business logic processing.
4. The key points of information processing on the Storm streaming processing platform are as follows:
Business processing is carried out in the streaming processing platform according to the specific business logic of each information type. Each acquired item is abstracted and decomposed into four kinds of data: summary information, body text and attachments, index information and list information, defined as follows:
Summary information: the small text fields of an item, such as title, time, author and organization.
Body text and attachments: these may be far larger than the summary, for example the body text of an item or reference material attached to it.
Index information: the fields of a given type of information data that must support Chinese keyword retrieval. These may include large text fields, such as the body text and content extracted from attachments. For such large text, only the content needs to be indexed in the search engine; the original text need not be retained there.
List information: every item has primary key information, such as an id or other fields that uniquely identify it. An information list stores the primary keys of items in a list according to a set rule, for example ordered by time or by popularity.
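The four-way decomposition described above can be sketched as follows. The field names (`title`, `time`, `author`, `organization`, `text`, `attachments`) and the choice of `(time, id)` as the list entry are illustrative assumptions; the patent names the four categories but does not fix a schema.

```python
from dataclasses import dataclass

# Assumed set of small text fields that make up the summary.
SUMMARY_FIELDS = ("title", "time", "author", "organization")


@dataclass
class Decomposed:
    summary: dict    # small text fields
    body: dict       # body text and attachments (potentially large)
    index: dict      # fields to be indexed for keyword retrieval
    list_key: tuple  # (ordering value, primary key) for list storage


def decompose(item):
    """Split one acquired item into the four kinds of data."""
    pk = item["id"]
    summary = {f: item[f] for f in SUMMARY_FIELDS if f in item}
    body = {"text": item.get("text", ""), "attachments": item.get("attachments", [])}
    index = {"id": pk, "title": item.get("title", ""), "text": item.get("text", "")}
    list_key = (item.get("time"), pk)
    return Decomposed(summary, body, index, list_key)
```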
4.1. Summary information processing: in the streaming processing platform, the summary information is stored in a K-V summary library by primary key, with primary key + attribute as the key and the specific summary value as the value. A Redis cluster is preferred for this library.
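The "primary key + attribute" keying scheme of 4.1 can be sketched as follows. A plain `dict` stands in for the K-V summary library, and the `:` separator in the key is an assumption; the patent does not specify how the two parts are joined.

```python
def summary_key(primary_key, attribute):
    # Assumed key layout: "<primary key>:<attribute>".
    return f"{primary_key}:{attribute}"


def store_summary(kv, primary_key, summary):
    # One K-V entry per summary attribute.
    for attribute, value in summary.items():
        kv[summary_key(primary_key, attribute)] = value


def load_summary(kv, primary_key, attributes):
    # Fetch only the attributes a given view needs; missing ones come back as None.
    return {a: kv.get(summary_key(primary_key, a)) for a in attributes}
```

Storing each attribute under its own key lets a list view fetch, say, only titles and times without reading the whole summary.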
4.2. Body text and attachment processing: body text and attachments, which are larger in volume, are inserted by the streaming processing platform into a big data storage engine, preferably HBase.
4.3. Index information processing: in the streaming platform, the fields required for the index information are integrated. The purpose of this data integration is to merge multi-source data, combining the required index information into a single row according to the information primary key id.
Specifically, if the source data distinguish between slave-table and master-table information, the slave-table information is processed first and stored in the summary K-V library under the information primary key id. When the master-table information is processed, the required summary fields are fetched from the slave table's entries in the summary K-V library by primary key id, the required index information is merged into one row, and the complete record is then stored in the search engine.
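The slave-first merge in 4.3 can be sketched as follows. Dictionaries stand in for the summary K-V library and the search engine, and the rule that master-table fields win on a name clash is an assumption the patent does not state.

```python
def process_slave_row(summary_kv, row):
    """Process a slave-table row first: park its fields under the primary key id."""
    fields = {k: v for k, v in row.items() if k != "id"}
    summary_kv.setdefault(row["id"], {}).update(fields)


def process_master_row(summary_kv, search_index, row):
    """Process a master-table row: pull the parked slave fields, merge into one row, index it."""
    # Assumption: on a field-name clash the master-table value takes precedence.
    merged = {**summary_kv.get(row["id"], {}), **row}
    search_index[row["id"]] = merged
    return merged
```

Handling the slave table first means the master row can be indexed in a single write, instead of indexing a partial row and updating it later.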
4.4. List information processing:
The list information is first divided into two cases according to whether it can be classified directly in the streaming processing platform:
A. Can be classified directly on the streaming platform:
If the category can be derived from the summary fields in the list information, the primary key is stored directly in the corresponding list in the K-V list library.
B. Cannot be classified directly on the streaming platform:
If the received list information cannot be classified directly by the streaming processing platform, it is handed to the list processing platform, which processes it as described in step 5.
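The two branches of 4.4 can be sketched as follows. The rule in `derive_category` (read a `type` summary field) is purely hypothetical; the patent only says that the category must be derivable from summary fields for case A to apply.

```python
def derive_category(summary):
    """Hypothetical rule: the category is the 'type' summary field, if present."""
    return summary.get("type")


def route(primary_key, summary, list_kv, deferred):
    """Case A: store the key in its list directly. Case B: defer to the list processing platform."""
    category = derive_category(summary)
    if category is None:
        # Case B: summary fields are not enough; leave it for the list processing platform.
        deferred.append(primary_key)
        return None
    # Case A: the category follows from the summary fields alone.
    list_kv.setdefault(category, []).append(primary_key)
    return category
```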
5. List processing platform:
The list processing platform generates the different types of information lists, mainly by searching the information data that have entered the search engine and inserting the resulting lists into the K-V list library. The generation process is as follows:
5.1. The conditions that a list must match are configured in the list classification configuration platform.
5.2. A query task is generated in the list configuration platform; the task reads the configured conditions and performs an incremental query.
5.3. The incremental query results are imported into the K-V list library for list storage.
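Steps 5.1 to 5.3 can be sketched as follows. The use of a per-list time watermark to make the query incremental, and the newest-first ordering of the stored list, are assumptions; the patent says only that the query is incremental. The search engine is modelled as a list of documents with `id` and `time` fields.

```python
def incremental_update(list_name, condition, search_index, list_kv, watermarks):
    """Run one incremental query for a configured list and import the hits."""
    last = watermarks.get(list_name, float("-inf"))
    # 5.2: only documents newer than the watermark that match the configured condition.
    hits = [doc for doc in search_index if doc["time"] > last and condition(doc)]
    hits.sort(key=lambda doc: doc["time"])
    # 5.3: prepend the new primary keys so the stored list stays newest-first.
    new_keys = [doc["id"] for doc in reversed(hits)]
    list_kv[list_name] = new_keys + list_kv.get(list_name, [])
    if hits:
        watermarks[list_name] = hits[-1]["time"]
    return len(hits)
```

A real deployment would also need to handle documents that share a timestamp with the watermark or arrive out of order; this sketch ignores both.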
6. Financial information display platform: once the summary information, body text and attachments, index information and list information have been processed, the specific business query logic can be run. Information is displayed according to query requirements in the following three modes:
6.1. Acquiring an information list:
Primary key information is first obtained from the K-V list library. Because the keys are stored in reverse chronological order, paged queries, as well as incremental or decremental queries by time, can be performed. The obtained primary keys are combined with the required attribute fields to form summary keys, with which the summary information is fetched from the K-V summary library. Once all required fields have been obtained, the query result is assembled and returned to the front end for display.
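The two-step lookup in 6.1 can be sketched as follows. Dictionaries stand in for the K-V list library and K-V summary library, the `<primary key>:<attribute>` key layout is an assumption, and pages are zero-based for simplicity.

```python
def fetch_page(list_kv, summary_kv, list_name, page, page_size, attributes):
    """Fetch one page of a list: primary keys first, then their summary fields."""
    keys = list_kv.get(list_name, [])  # assumed to be stored newest-first
    start = page * page_size
    page_keys = keys[start:start + page_size]
    # Combine each primary key with each required attribute to form the summary key.
    return [
        {"id": pk, **{a: summary_kv.get(f"{pk}:{a}") for a in attributes}}
        for pk in page_keys
    ]
```

Because the list holds only primary keys in display order, reaching a deep page is a positional slice followed by a bounded number of key lookups, rather than a search-engine query that must skip all earlier results.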
6.2. Information search:
The query conditions are first submitted to the search engine, which returns primary key list information. Once the primary keys are obtained, the summary fields are fetched in the same way as in 6.1.
6.3. Acquiring body text or attachments:
When the body text and attachments of an item are requested, they can be fetched directly from the big data storage engine by primary key and returned to the front end for display.
The embodiments above describe only several implementations of the present invention. Although the description is specific and detailed, it is not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention. The protection scope of this patent is therefore defined by the appended claims.

Claims (7)

1. A financial information synchronous classification system, characterized by comprising an acquisition configuration platform, a streaming processing platform, a list processing platform and a financial information display platform; the acquisition configuration platform acquires financial information data periodically, sorts the data into different classification topic queues according to acquisition source, and sends them to the streaming processing platform; the streaming processing platform abstracts each acquired item into summary information, body text and attachments, index information and list information, and sends these to the financial information display platform; list information that the streaming processing platform cannot classify is processed by the list processing platform and then sent to the financial information display platform; and the financial information display platform retrieves and displays financial information data according to query requirements.
2. A method for establishing the financial information synchronous classification system according to claim 1, characterized by comprising the following steps:
1) establishing the acquisition configuration platform: configuring an acquisition task, and generating in the platform a corresponding acquisition program that runs periodically; inserting the periodically acquired financial information data into a Kafka message queue, establishing different classification topic queues in the message queue for different acquisition sources, and distinguishing the data sources by means of the classification topic queues;
2) establishing the streaming processing platform: subscribing to the classification topic queues in the streaming processing platform, fetching individual items from the corresponding message queue after subscription, and performing business logic processing, wherein the business logic processing abstracts and decomposes each acquired item into four kinds of data, namely summary information, body text and attachments, index information and list information, and then processes each kind separately;
3) establishing the list processing platform: generating different types of information lists by searching the information data that have entered the search engine, and inserting the resulting lists into a K-V list library;
4) establishing the financial information display platform: retrieving the processed financial information data according to the specific business query logic and displaying them at the front end.
3. The method for establishing a financial information synchronous classification system according to claim 2, wherein the summary information in step 2) refers to the small text fields of an item; the index information refers to the fields of an item used for Chinese keyword retrieval; and the list information refers to storing the primary keys of items in a list according to a set rule.
4. The method for establishing a financial information synchronous classification system according to claim 3, wherein the summary information in step 2) is processed as follows: in the streaming processing platform, the summary information is stored in a K-V summary library by primary key, with primary key + attribute as the key and the specific summary value as the value;
the body text and attachments are processed as follows: body text and attachments, which are larger in volume, are inserted by the streaming processing platform into a big data storage engine;
the index information is processed as follows: in the streaming processing platform, multi-source data are merged, so that the required index information is combined into a single row according to the information primary key id;
and the list information is processed as follows: the list information is first divided into two cases according to whether it can be classified directly in the streaming processing platform:
data that can be classified directly on the streaming platform: if the category can be derived from the summary fields in the list information, the primary key is stored directly in the corresponding list in the K-V list library;
data that cannot be classified on the streaming platform: if the received list information cannot be classified directly by the streaming processing platform, it is handed to the list processing platform for processing.
5. The method for establishing a financial information synchronous classification system according to claim 4, wherein the multi-source data merging in the index information processing of step 2) is performed as follows: if the source data distinguish between slave-table and master-table information, the slave-table information is processed first and stored in the summary K-V library under the information primary key id; when the master-table information is processed, the required summary fields are fetched from the slave table's entries in the summary K-V library by primary key id, the required index information is merged into one row, and the complete record is then stored in the search engine.
6. The method according to claim 4 or 5, wherein the list processing platform in step 3) operates as follows: the conditions that a list must match are configured; a query task is generated in the list configuration platform; the task reads the configured conditions and performs an incremental query; and the incremental query results are imported into the K-V list library for list storage.
7. The method for establishing a financial information synchronous classification system according to any one of claims 4, 5 and 6, wherein the display of information according to the query requirement in step 4) is performed in one of the following three modes:
acquiring an information list: first, the primary key information is obtained from the K-V list library, and the required related data attributes are looked up according to the rules set for the information list; the obtained primary key information and the required related data attribute fields are combined into a summary primary key, with which the information summary is obtained from the K-V summary library; once all required fields have been obtained, the query results are assembled and returned to the front end for display;
information search: first, the query conditions are submitted to the search engine, which returns the primary key list; the subsequent summary fields are then obtained in the same way as in acquiring an information list;
acquiring text or attachments: when the text and attachments related to the information are requested, they are obtained directly from the big data storage engine according to the primary key and returned to the front end for display.
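The three display modes can be illustrated with the following sketch, in which dictionaries stand in for the K-V list library, the K-V summary library, the search engine, and the big data storage engine. The composite key format `"<primary key>:<attribute>"` and all function and field names are assumptions for illustration; the claims do not specify them.

```python
def summaries_for(pks, attr, kv_summary):
    """Build a summary key from each primary key plus an attribute field,
    then fetch the matching summaries from the K-V summary library."""
    keys = [f"{pk}:{attr}" for pk in pks]
    return [kv_summary[k] for k in keys if k in kv_summary]


def get_list(list_name, attr, kv_list, kv_summary):
    """Mode 1: primary keys come from the K-V list library."""
    return summaries_for(kv_list.get(list_name, []), attr, kv_summary)


def search(predicate, attr, search_index, kv_summary):
    """Mode 2: primary keys come from the search engine, then the same lookup."""
    pks = [pk for pk, doc in search_index.items() if predicate(doc)]
    return summaries_for(pks, attr, kv_summary)


def get_body(pk, blob_store):
    """Mode 3: text or attachment is fetched directly by primary key."""
    return blob_store.get(pk)


kv_list = {"list:bank": ["n1", "n3"]}
kv_summary = {"n1:brief": {"title": "A"}, "n3:brief": {"title": "C"}}
search_index = {
    "n1": {"industry": "bank"},
    "n2": {"industry": "steel"},
    "n3": {"industry": "bank"},
}
blob_store = {"n1": "full text of A"}

listed = get_list("list:bank", "brief", kv_list, kv_summary)
found = search(lambda d: d["industry"] == "steel", "brief", search_index, kv_summary)
body = get_body("n1", blob_store)
```

Modes 1 and 2 differ only in where the primary keys come from, so they share the summary lookup; mode 3 bypasses the summary library entirely.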
CN202111219358.XA 2021-10-20 2021-10-20 Financial information synchronous classification system and establishment method Pending CN114020833A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111219358.XA CN114020833A (en) 2021-10-20 2021-10-20 Financial information synchronous classification system and establishment method

Publications (1)

Publication Number Publication Date
CN114020833A true CN114020833A (en) 2022-02-08

Family

ID=80056765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111219358.XA Pending CN114020833A (en) 2021-10-20 2021-10-20 Financial information synchronous classification system and establishment method

Country Status (1)

Country Link
CN (1) CN114020833A (en)

Similar Documents

Publication Publication Date Title
US9165085B2 (en) System and method for publishing aggregated content on mobile devices
CN103279513B (en) The method of generation content tab is, provide the method and device of multimedia content information
JP5314504B2 (en) SEARCH DEVICE, SEARCH PROGRAM, AND SEARCH METHOD
US20130124548A1 (en) System and Method for Presenting A Plurality of Email Threads for Review
CN103678494A (en) Method and device for client side and server side data synchronization
US20150106335A1 (en) Hierarchical data archiving
CN112307037A (en) Data synchronization method and device
JP2004102803A (en) Bulletin board system and method for displaying information
CN114254016A (en) Data synchronization method, device and equipment based on elastic search and storage medium
CN110162522A (en) A kind of distributed data search system and method
CN110245134B (en) Increment synchronization method applied to search service
CN111143382A (en) Data processing method, system and computer readable storage medium
CN110807038A (en) CMDB information full-text retrieval method based on elastic search
CN106161193B (en) Mail processing method, device and system
CN109739854A (en) A kind of date storage method and device
US11106739B2 (en) Document structures for searching within and across messages
CN114020833A (en) Financial information synchronous classification system and establishment method
CN112860680B (en) Data processing method and system, and data query method and system
CN107291938A (en) Order Query System and method
CN107992568B (en) Searching method, device and system
JP3588507B2 (en) Information filtering device
CN114911872A (en) Intranet and extranet data synchronization method, device and system, extranet server and storage medium
Bordino et al. Advancing NLP via a distributed-messaging approach
CN114691700A (en) Kafaka cluster-based intelligent park retrieval method
KR100426995B1 (en) Method and system for indexing document

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination