CN111737268B - Data processing method based on document database - Google Patents

Publication number
CN111737268B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010822510.2A
Other languages
Chinese (zh)
Other versions
CN111737268A (en)
Inventor
谢智
谢乾
王吉
龚彬
周国栋
邓锌强
吴大超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JIANGSU ZHUOYI INFORMATION TECHNOLOGY Co.,Ltd.
KUNSHAN BYOSOFT ELECTRONIC TECHNOLOGY Co.,Ltd.
NANJING BYOSOFT Co.,Ltd.
SHANGHAI BAIZHIAO INFORMATION TECHNOLOGY Co.,Ltd.
Original Assignee
Jiangsu Zhuoyi Information Technology Co ltd
Kunshan Byosoft Electronic Technology Co ltd
Shanghai Baizhiao Information Technology Co ltd
Nanjing Byosoft Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Zhuoyi Information Technology Co ltd, Kunshan Byosoft Electronic Technology Co ltd, Shanghai Baizhiao Information Technology Co ltd, Nanjing Byosoft Co ltd
Priority to CN202010822510.2A
Publication of CN111737268A
Application granted
Publication of CN111737268B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Abstract

The invention discloses a data processing method based on a document database, belonging to the field of data analysis and processing. The method comprises: monitoring data changes in a document set of the database; collecting the incremental data as the input source of a data stream; and generating new data, as the output source of the data stream, through an automated processing flow composed of processing nodes. The output source of a data stream is either bound directly to another database and stored in a specified database table, or used as the input source of a new data stream for further processing. By combining multiple data streams, the invention can meet the data processing requirements of complex services. Compared with prior-art methods, it has a wider range of application, can be applied to various document databases, and places no special requirements on the operating environment.

Description

Data processing method based on document database
Technical Field
The invention belongs to the technical field of data analysis and processing, and particularly relates to a data processing method based on a document database.
Background
Currently, mainstream databases fall into two categories, relational databases and document databases, each with a different emphasis. A relational database has a stable table structure, focuses on maintaining the relationships between tables, and is well suited to analysis and statistics; a document database has a flexible structure and is suited to dynamic storage scenarios. In many cases the two need to be combined: data in the document database must be converted into the relational database, the two sides must be kept synchronized, and business logic must be applied during the conversion. In the prior art, only MongoDB (the company) has implemented something comparable, based on its own cloud service, with some basic data analysis and conversion capabilities; however, that function depends on the cloud environment, is limited to MongoDB databases, and is therefore not widely applicable.
Disclosure of Invention
The technical problem solved by the invention is as follows: to provide a data processing method based on a document database which, by monitoring data changes in the document database, stores result data in a relational database through a series of automated processing flows.
Technical solution: to solve the above technical problem, the invention adopts the following technical solution:
A data processing method based on a document database monitors data changes in a document set of the database, collects incremental data as the input source of a data stream, and generates new data as the output source of the data stream through an automated processing flow composed of processing nodes; the output source of the data stream is either bound directly to a database and stored in a specified database table, or used as the input source of a new data stream for further processing. The method mainly comprises the following steps:
S1: monitoring a document set of a document database, collecting the changed data, and marking the data according to its change mode;
S2: using the data collected in step S1 as the input source of a data stream;
S3: setting an automated processing flow for the data stream created in step S2, the automated processing flow being a chained flow composed of stream processing nodes and metadata processing nodes;
S4: in the unified data stream processing center, initializing according to the processing flow set in step S3, executing the logic of each processing node on the stream and the metadata, and generating an output source;
S5: binding the output source generated in step S4 to a database and storing the processing result, or using it as an input source for other data streams.
Preferably, in step S1, a listener establishes a connection with the document database and accesses the document set by timed polling; whether a change occurred within the polling interval is determined from the creation time, modification time and deletion time of each document, and the changed data is collected.
Preferably, in step S2, the input source of the data stream stores the changed data obtained by monitoring in the form of a queue, and each item in the queue carries an identification flag, set according to its change type, that distinguishes whether the data was added, modified or deleted.
Preferably, in step S3, the stream processing nodes include a stream merging node, a stream association node and a stream filtering node.
Preferably, the stream merging node specifies the data stream of a new document set and, after mapping its fields one to one with the metadata fields of the original data stream, unifies the data formats of the two data streams and merges them into a new data stream; the data volume of the merged data stream is the sum of the two data streams.
Preferably, the stream association node specifies a new document set, establishes a one-to-one relationship between metadata by associating with one or more fields in the original data stream, and then uses the fields of the new document set as an extension of the original data stream, so that the original data stream carries more fields.
Preferably, the stream filtering node evaluates a condition on one or more fields of the data in the stream and filters out the data that does not satisfy it, reducing the data volume.
Preferably, the metadata processing node defines its internal logic operations in a programming language; the logic operations may be a series of mathematical calculations and general-purpose functions. Once compiled, the logic is initialized at the data stream processing center, and the logic operations of the processing node are executed on the metadata of the received data stream, thereby changing the metadata.
Beneficial effects: compared with the prior art, the invention has the following advantages:
The document database-based data processing method combines multiple data streams and can meet the data processing requirements of complex services. Compared with prior-art methods, it has a wider range of application, can be applied to various document databases, and places no special requirements on the operating environment.
Drawings
FIG. 1 is a flowchart of the execution of the document database-based data processing method of the present invention;
FIG. 2 is a schematic diagram of the listener of the document database-based data processing method of the present invention;
FIG. 3 is a schematic diagram of the stream merging node of the document database-based data processing method of the present invention;
FIG. 4 is a schematic diagram of the stream association node of the document database-based data processing method of the present invention;
FIG. 5 is a schematic diagram of the metadata processing node of the document database-based data processing method of the present invention;
FIG. 6 is a schematic diagram of the output source of the document database-based data processing method of the present invention.
Detailed Description
The present invention will be further illustrated by the following specific examples, which are carried out on the premise of the technical scheme of the present invention, and it should be understood that these examples are only for illustrating the present invention and are not intended to limit the scope of the present invention.
As shown in FIG. 1, the execution flowchart of the present invention, a data processing method based on a document database monitors data changes in a document set of the database, collects incremental data as the input source of a data stream, and generates new data as the output source of the data stream through an automated processing flow composed of processing nodes; the output source of the data stream is either bound directly to a database and stored in a specified database table, or used as the input source of a new data stream for further processing. The method comprises the following steps:
S1: starting a listener, monitoring a document set in the document database, and collecting and marking the data when it changes;
The change modes of the data are addition, modification and deletion. Monitoring of the document set relies on recording the time at which each document is added, modified or deleted, and the listener obtains the data that changed within each interval by polling. The listener shown in FIG. 2 is an application that establishes a connection with the document database, creates a timed task for each document set to poll it, and records the time of each access; it determines whether a change occurred within the polling interval from the creation time, modification time and deletion time of each document; it marks the changed data (added: A, modified: M, deleted: D), stores the collected data in a queue, and then sends the queue to the data stream processing center as the input source of a data stream.
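The change detection performed by the listener can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `Doc` record, its timestamp field names, and the precedence of deletion over creation over modification are all assumptions made for the example, and the connection to a real document database and the timed task itself are omitted.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Doc:
    """Hypothetical document carrying the three timestamps the listener inspects."""
    doc_id: str
    created_at: datetime
    modified_at: Optional[datetime] = None
    deleted_at: Optional[datetime] = None


def classify_change(doc: Doc, last_poll: datetime, now: datetime) -> Optional[str]:
    """Return 'A', 'M' or 'D' if the document changed in (last_poll, now], else None."""
    def in_window(ts: Optional[datetime]) -> bool:
        return ts is not None and last_poll < ts <= now

    # Assumed precedence: deletion first, then creation, then modification.
    if in_window(doc.deleted_at):
        return "D"
    if in_window(doc.created_at):
        return "A"
    if in_window(doc.modified_at):
        return "M"
    return None


def poll_once(docs, last_poll, now):
    """One polling pass: collect (mark, doc) pairs for documents changed since last_poll."""
    changes = []
    for doc in docs:
        mark = classify_change(doc, last_poll, now)
        if mark is not None:
            changes.append((mark, doc))
    return changes
```

A real listener would call something like `poll_once` from a timer, passing the time recorded at the previous access as `last_poll`.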
S2: storing the data collected in step S1 in a queue that serves as the input source of a data stream; a data stream consists of an input source, an automated processing flow and an output source;
The input source of the data stream stores the changed data obtained by monitoring in the form of a queue, and each item in the queue carries an identification flag, set according to its change type, that distinguishes whether the data was added, modified or deleted.
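One possible shape for such an input source is sketched below. The class and method names (`ChangeRecord`, `InputSource`, `push`, `drain`) are hypothetical and chosen only for illustration; the text specifies just a queue whose items carry a flag for added, modified or deleted.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Deque, Dict, List


@dataclass
class ChangeRecord:
    flag: str              # 'A' = added, 'M' = modified, 'D' = deleted
    data: Dict[str, Any]   # the document fields carried through the stream


@dataclass
class InputSource:
    queue: Deque[ChangeRecord] = field(default_factory=deque)

    def push(self, flag: str, data: Dict[str, Any]) -> None:
        """Append one changed document together with its identification flag."""
        if flag not in ("A", "M", "D"):
            raise ValueError(f"unknown change flag: {flag!r}")
        self.queue.append(ChangeRecord(flag, data))

    def drain(self) -> List[ChangeRecord]:
        """Remove and return all queued records in arrival order."""
        items = list(self.queue)
        self.queue.clear()
        return items
```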
S3: setting an automated processing flow for the data stream created in step S2, the processing flow being a chained flow composed of two types of nodes: stream processing nodes and metadata processing nodes;
The input source contains a data queue; within the processing flow, the data passes through the logic of each node as a queue, and each node it passes through produces a new input source.
Stream processing nodes, which operate mainly on the stream itself, come in three types: the stream merging node, the stream association node and the stream filtering node;
The stream merging node shown in FIG. 3 specifies the data stream of a new document set and, after mapping its fields one to one with the metadata fields of the original data stream, unifies the data formats of the two data streams and merges them into a new data stream; the data volume of the merged data stream is the sum of the two data streams.
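The merge behaviour can be illustrated with a small sketch that treats each stream as a list of dictionaries. The function name `merge_streams` and the `field_map` parameter are assumptions for the example; fields of the second stream that have no mapping are simply dropped here, which is one possible reading of "unifying the data formats".

```python
def merge_streams(original, other, field_map):
    """Merge `other` into `original` after renaming its fields via `field_map`.

    `field_map` maps each field name used in `other` to the matching field
    name in `original`, so both streams end up with the same record format.
    """
    renamed = [
        {field_map[name]: value for name, value in record.items() if name in field_map}
        for record in other
    ]
    # The merged stream contains every record of both inputs.
    return list(original) + renamed
```

For instance, merging a stream with fields `id` and `amount` with another using `order_no` and `total` would use the mapping `{"order_no": "id", "total": "amount"}`, and the result has as many records as both inputs combined.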
The stream association node shown in FIG. 4 specifies a new document set, establishes a one-to-one relationship between metadata by associating with one or more fields in the original data stream, and then uses the fields of the new document set as an extension of the original data stream, so that the original data stream carries more fields.
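In effect the association node behaves like a lookup join that keeps the number of records unchanged. The sketch below is an assumption-laden illustration: `associate`, `keys` and `extra_fields` are hypothetical names, unmatched records receive `None` for the added fields, and if several documents share a key the last one wins, none of which is specified in the text.

```python
def associate(stream, lookup_docs, keys, extra_fields):
    """Extend each record in `stream` with `extra_fields` from the matching document.

    Records and documents are matched on the tuple of values of `keys`.
    """
    index = {}
    for doc in lookup_docs:
        # One-to-one relationship assumed: a later duplicate key overwrites an earlier one.
        index[tuple(doc[k] for k in keys)] = doc

    extended_stream = []
    for record in stream:
        match = index.get(tuple(record.get(k) for k in keys))
        extended = dict(record)
        for name in extra_fields:
            extended[name] = match.get(name) if match else None
        extended_stream.append(extended)
    return extended_stream
```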
The stream filtering node evaluates a condition on one or more fields of the data in the stream and filters out the data that does not satisfy it, reducing the data volume.
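A filtering node reduces to a predicate over record fields. In this sketch the function name `filter_stream` and the example fields are hypothetical.

```python
def filter_stream(stream, predicate):
    """Keep only the records for which `predicate(record)` is true."""
    return [record for record in stream if predicate(record)]


def is_large_paid_order(record):
    """Example condition on two fields: status must be 'paid' and amount at least 100."""
    return record.get("status") == "paid" and record.get("amount", 0) >= 100
```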
As shown in FIG. 5, the metadata processing node defines its internal logic operations in a programming language; the logic operations consist of a series of mathematical calculations and general-purpose functions. The data stream processing center initializes the compiled logic and executes the logic operations of the processing node on the metadata of the received data stream, thereby changing the metadata.
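The compile-once, apply-per-record idea can be sketched with Python's built-in `compile` and `eval`. This is only an illustration: the patent does not say which language or expression format the node uses, `MetadataNode` and its parameters are hypothetical, and `eval` with emptied builtins is not a security sandbox, so the expressions are assumed to come from trusted configuration.

```python
class MetadataNode:
    """Compute one field of each record from a compiled expression."""

    def __init__(self, target_field, expression, functions=None):
        self.target_field = target_field
        # General-purpose functions the expression may call, e.g. {"max": max}.
        self.functions = dict(functions or {})
        # Compile once at initialization; the code object is reused for every record.
        self.code = compile(expression, "<metadata-node>", "eval")

    def apply(self, record):
        """Return a copy of `record` with the target field set to the expression's value."""
        scope = {**self.functions, **record}
        updated = dict(record)
        updated[self.target_field] = eval(self.code, {"__builtins__": {}}, scope)
        return updated
```

For example, a node built as `MetadataNode("total", "max(price * qty - discount, 0)", {"max": max})` adds a `total` field computed from three existing fields.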
S4: in the unified data stream processing center, executing the processing nodes in sequence according to the processing flow set in step S3, operating on the data, and finally generating an output source;
The data stream processing center is mainly responsible for receiving the data stream generated by the listener and initializing the processing flow configured for that data stream. It starts a timer to fetch the queue from the data stream's input source at intervals and executes the logic of each node in the processing flow in sequence until the result data is produced, forming the output source.
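A single pass of the processing center over a chained flow might look like the sketch below. It assumes each node is a callable that takes a batch of records and returns a new batch; `run_flow` is a hypothetical name, and the timer that would trigger it at intervals is left out.

```python
from collections import deque


def run_flow(input_queue, nodes):
    """Drain the input queue, run every node in order, and return the output queue."""
    batch = list(input_queue)
    input_queue.clear()
    for node in nodes:
        # The output of one node is the input of the next (chained flow).
        batch = node(batch)
    return deque(batch)
```

A flow with a filtering step followed by a metadata step would then be expressed as a list of two callables passed as `nodes`.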
S5: binding the output source generated in step S4 to another database and storing the processing result; alternatively, the output is turned into an input source as in step S2 so that other processing flows can be performed.
As shown in FIG. 6, the output source temporarily stores the results generated by the processing flow in its own queue, and an output mode can then be selected: the output source can be bound directly to a relational database, storing the generated data in the corresponding relational database table, or it can be converted into the input source of a new data stream so that other processing flows continue to execute.
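The two output modes can be sketched with Python's standard `sqlite3` module standing in for the relational database. Everything here is an assumption for illustration: the function names, the choice of SQLite, and the simplification that every result row is inserted (handling of modified or deleted records is not shown). The table and column names are interpolated into the SQL text, so they are assumed to come from trusted configuration.

```python
import sqlite3
from collections import deque


def store_output(conn, table, columns, output_queue):
    """Write every queued result into an existing relational table; return the row count."""
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    rows = [tuple(record.get(c) for c in columns) for record in output_queue]
    with conn:  # commit on success, roll back on error
        conn.executemany(sql, rows)
    output_queue.clear()
    return len(rows)


def as_input_source(output_queue):
    """Hand the queued results over as the input queue of a new data stream."""
    new_input = deque(output_queue)
    output_queue.clear()
    return new_input
```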
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art may make various modifications and refinements without departing from the principle of the present invention, and such modifications and refinements shall also be regarded as falling within the protection scope of the present invention.

Claims (6)

1. A data processing method based on a document database, characterized by: monitoring data changes in a document set of a database, collecting incremental data as an input source of a data stream, and generating new data as an output source of the data stream through an automated processing flow composed of processing nodes, wherein the automated processing flow is a chained flow composed of stream processing nodes and metadata processing nodes; initializing according to the automated processing flow, executing the logic of each processing node on the stream and the metadata, and generating the output source; wherein the stream processing nodes comprise a stream merging node, a stream association node and a stream filtering node; the stream merging node specifies a data stream of a new document set and, after mapping its fields one to one with the metadata fields of the original data stream, unifies the data formats of the two data streams and merges them into a new data stream, the data volume of the merged data stream being the sum of the two data streams; the stream association node specifies a new document set, establishes a one-to-one relationship between metadata by associating with one or more fields in the original data stream, and then takes the fields of the new document set as an extension of the original data stream, so that the original data stream carries more fields; and the output source of the data stream is either bound directly to a database and stored in a specified database table, or used as an input source of a new data stream for further processing.
2. A document database-based data processing method according to claim 1, comprising the steps of:
S1: monitoring a document set of a document database, collecting changed data, and marking the data according to a change mode;
S2: using the data collected in step S1 as an input source of a data stream;
S3: an automated process flow is set for the data stream created in step S2;
S4: initializing, by the unified data stream processing center, according to the processing flow set in step S3, executing the logic corresponding to the processing node on the stream and the metadata, and generating an output source;
S5: the output source generated in step S4 is bound to the database, and the processing result is stored; or as an input source for other data streams.
3. A document database-based data processing method according to claim 2, characterized in that: in step S1, a listener establishes a connection with the document database and accesses the document set by timed polling; whether a change occurred within the polling interval is determined from the creation time, modification time and deletion time of each document, and the changed data is collected.
4. A document database-based data processing method according to claim 2, characterized in that: in step S2, the input source of the data stream stores the changed data obtained by monitoring in the form of a queue, and each item in the queue carries an identification flag, set according to its change type, that distinguishes whether the data was added, modified or deleted.
5. A document database-based data processing method according to claim 1, characterized in that: the stream filtering node evaluates a condition on one or more fields of the data in the stream and filters out the data that does not satisfy it, reducing the data volume.
6. A document database-based data processing method according to claim 1, characterized in that: the metadata processing node defines its internal logic operations in a programming language, the logic operations consisting of a series of mathematical calculations and general-purpose functions; after compiling, the logic is initialized in the data stream processing center, and the logic operations of the processing node are executed on the metadata of the received data stream, thereby changing the metadata.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010822510.2A CN111737268B (en) 2020-08-17 2020-08-17 Data processing method based on document database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010822510.2A CN111737268B (en) 2020-08-17 2020-08-17 Data processing method based on document database

Publications (2)

Publication Number Publication Date
CN111737268A CN111737268A (en) 2020-10-02
CN111737268B CN111737268B (en) 2021-01-01

Family

ID=72658491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010822510.2A Active CN111737268B (en) 2020-08-17 2020-08-17 Data processing method based on document database

Country Status (1)

Country Link
CN (1) CN111737268B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103955517B (en) * 2014-05-05 2017-05-03 中国工商银行股份有限公司 Method and system for converting data in documental database to relational database
EP3411807A1 (en) * 2016-02-02 2018-12-12 Activewrite, Inc. Document collaboration and consolidation tools and methods of use
CN109669965A (en) * 2018-11-13 2019-04-23 广州欧赛斯信息科技有限公司 A kind of acquisition analysis system that supporting unstructured data and method

Also Published As

Publication number Publication date
CN111737268A (en) 2020-10-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 "change of name, title or address"

Address after: No. 298, Xingye Road, Yixing new street, Wuxi City, Jiangsu Province, 214205

Patentee after: JIANGSU ZHUOYI INFORMATION TECHNOLOGY Co.,Ltd.

Patentee after: NANJING BYOSOFT Co.,Ltd.

Patentee after: KUNSHAN BYOSOFT ELECTRONIC TECHNOLOGY Co.,Ltd.

Patentee after: SHANGHAI BAIZHIAO INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 210061 11 / F, block a, Chuangzhi building, 17 Xinghuo Road, Jiangbei new district, Nanjing City, Jiangsu Province

Patentee before: NANJING BYOSOFT Co.,Ltd.

Patentee before: JIANGSU ZHUOYI INFORMATION TECHNOLOGY Co.,Ltd.

Patentee before: KUNSHAN BYOSOFT ELECTRONIC TECHNOLOGY Co.,Ltd.

Patentee before: SHANGHAI BAIZHIAO INFORMATION TECHNOLOGY Co.,Ltd.