CN111737268B - Data processing method based on document database

Info

Publication number: CN111737268B
Application number: CN202010822510.2A
Authority: CN
Inventors: 谢智; 谢乾; 王吉; 龚彬; 周国栋; 邓锌强; 吴大超
Original assignee: Jiangsu Zhuoyi Information Technology Co ltd; Kunshan Byosoft Electronic Technology Co ltd; Shanghai Baizhiao Information Technology Co ltd; Nanjing Byosoft Co ltd
Current assignee: JIANGSU ZHUOYI INFORMATION TECHNOLOGY Co.,Ltd.; KUNSHAN BYOSOFT ELECTRONIC TECHNOLOGY Co.,Ltd.; NANJING BYOSOFT Co.,Ltd.; SHANGHAI BAIZHIAO INFORMATION TECHNOLOGY Co.,Ltd.
Priority date: 2020-08-17
Filing date: 2020-08-17
Publication date: 2021-01-01
Anticipated expiration: 2040-08-17
Also published as: CN111737268A

Abstract

The invention discloses a data processing method based on a document database, belonging to the field of data analysis and processing. The method comprises the following steps: monitoring data changes of a document set in a database; collecting the incremental data as the input source of a data stream; and generating new data as the output source of the data stream through an automated processing flow formed by processing nodes. The output source of a data stream is either bound directly to another database and stored in a specified database table, or used as the input source of a new data stream to be processed again. By combining multiple data streams, the invention can meet the data processing requirements of complex services. Compared with prior-art methods, the method has a wider application range, can be applied to various document databases, and places no special requirements on the operating environment.

Description

Data processing method based on document database

Technical Field

The invention belongs to the technical field of data analysis and processing, and particularly relates to a data processing method based on a document database.

Background

Currently, mainstream databases fall into two categories, relational databases and document databases, each with a different emphasis. Relational database tables have a stable structure, emphasize maintaining the relationships between tables, and are well suited to analysis and statistics; the structure of a document database is flexible and is suited to dynamic storage scenarios. In many cases the two need to be combined: data in the document database must be converted into a relational database, the two sides must be kept synchronized, and business logic must be applied during the conversion. In the prior art, only MongoDB has implemented this kind of processing, on top of its own cloud service, with some basic data analysis and conversion capability; however, that function depends on the cloud environment and is limited to MongoDB databases, so its applicability is limited.

Disclosure of Invention

Technical problem solved by the invention: to provide a data processing method based on a document database which monitors data changes in the document database and, through a series of automated processing flows, stores the result data in a relational database.

Technical solution: to solve the above technical problem, the invention adopts the following technical solution:

A data processing method based on a document database monitors data changes of a document set in the database, collects the incremental data as the input source of a data stream, and generates new data as the output source of the data stream through an automated processing flow formed by processing nodes. The output source of the data stream is either bound directly to a database and stored in a specified database table, or used as the input source of a new data stream to be processed again. The method mainly comprises the following steps:

S1: monitoring a document set of a document database, collecting changed data, and marking the data according to its change mode;

S2: using the data collected in step S1 as the input source of a data stream;

S3: setting an automated processing flow for the data stream created in step S2, the automated processing flow being a chained flow composed of stream processing nodes and metadata processing nodes;

S4: initializing, by the unified data stream processing center, according to the processing flow set in step S3, executing the logic corresponding to the processing nodes on the stream and the metadata, and generating an output source;

S5: binding the output source generated in step S4 to a database and storing the processing result; or using it as the input source of other data streams.

Preferably, in step S1, a listener establishes a connection with the document database and accesses the document set by timed polling; whether a change occurred within the polling interval is determined from the creation time, modification time, and deletion time of each document, and the changed data is collected.

Preferably, in step S2, the input source of the data stream stores the changed data obtained by monitoring in a queue, and an identification bit is added to each item in the queue according to its change type, to distinguish whether the data was added, modified, or deleted.

Preferably, in step S3, the stream processing nodes include a stream merging node, a stream association node, and a stream filtering node.

Preferably, the stream merging node designates the data stream of a new document set, maps its fields one to one onto the metadata fields of the original data stream, unifies the data formats of the two data streams, and merges them into a new data stream whose data volume is the sum of the two data streams.

Preferably, the stream association node designates a new document set, establishes a one-to-one metadata relationship by associating it with one or more fields of the original data stream, and then uses the fields of the new document set as an extension of the original data stream, so that the original data stream carries more fields.

Preferably, the stream filtering node evaluates a condition on one or more fields of the data in the stream and filters out the data that does not satisfy it, reducing the data volume.

Preferably, the metadata processing node defines its internal logic operations in a programming language; the logic operations may be a series of mathematical calculations and general-purpose functions. After compilation, the logic is initialized at the data stream processing center and executed on the metadata of the received data stream, thereby changing the metadata.

Advantageous effects: compared with the prior art, the invention has the following advantages:

The data processing method based on a document database combines multiple data streams and can meet the data processing requirements of complex services. Compared with prior-art methods, the method has a wider application range, can be applied to various document databases, and places no special requirements on the operating environment.

Drawings

FIG. 1 is a flowchart of the execution of the document database-based data processing method of the present invention;

FIG. 2 is a schematic diagram of the listener of the document database-based data processing method of the present invention;

FIG. 3 is a schematic diagram of the stream merging node of the document database-based data processing method of the present invention;

FIG. 4 is a schematic diagram of the stream association node of the document database-based data processing method of the present invention;

FIG. 5 is a schematic diagram of the metadata processing node of the document database-based data processing method of the present invention;

FIG. 6 is a schematic diagram of the output source of the document database-based data processing method of the present invention.

Detailed Description

The present invention will be further illustrated by the following specific examples, which are carried out on the premise of the technical scheme of the present invention, and it should be understood that these examples are only for illustrating the present invention and are not intended to limit the scope of the present invention.

FIG. 1 shows the execution flow of the present invention. A data processing method based on a document database monitors data changes of a document set in the database, collects the incremental data as the input source of a data stream, and generates new data as the output source of the data stream through an automated processing flow formed by processing nodes. The output source of the data stream is either bound directly to a database and stored in a specified database table, or used as the input source of a new data stream to be processed again. The method comprises the following steps:

S1: starting a listener, monitoring a document set in the document database, and collecting and marking data when it changes;

The change modes of the data are adding, modifying, and deleting. Monitoring of the document set relies on recording the add, modify, and delete times of each document, and the listener obtains the data that changed within the interval by polling. The listener shown in FIG. 2 is an application that establishes a connection with the document database, creates a timed task for each document set to poll it, and records the time of each access. It determines whether a change occurred within the polling interval from the creation time, modification time, and deletion time of each document, marks the changed data (A for added, M for modified, D for deleted), stores the collected data in a queue, and then sends it to the data stream processing center as the input source of a data stream.
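One polling round of the listener can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Doc` class, its timestamp field names, and the rule that deletion takes precedence over creation and modification are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    """Hypothetical document carrying the three timestamps the listener inspects."""
    doc_id: str
    created_at: float
    modified_at: Optional[float] = None
    deleted_at: Optional[float] = None


def classify_change(doc: Doc, last_poll: float, now: float) -> Optional[str]:
    """Return 'A', 'M' or 'D' if the document changed in (last_poll, now], else None."""
    def in_window(t: Optional[float]) -> bool:
        return t is not None and last_poll < t <= now

    # Precedence (deleted, then added, then modified) is an assumption of this sketch.
    if in_window(doc.deleted_at):
        return "D"
    if in_window(doc.created_at):
        return "A"
    if in_window(doc.modified_at):
        return "M"
    return None


def poll_once(docs, last_poll: float, now: float):
    """One polling round: collect (mark, doc) pairs for documents changed since last_poll."""
    changes = []
    for doc in docs:
        mark = classify_change(doc, last_poll, now)
        if mark is not None:
            changes.append((mark, doc))
    return changes
```

In a running listener, a timed task would call `poll_once` for each document set, passing the recorded time of the previous access as `last_poll`.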

S2: storing the data collected in step S1 in a queue as the input source of a data stream; a data stream consists of an input source, an automated processing flow, and an output source;

The input source of the data stream stores the changed data obtained by monitoring in a queue, and an identification bit is added to each item in the queue according to its change type, to distinguish whether the data was added, modified, or deleted.
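A queue-backed input source with a per-record identification bit might look like the sketch below. The names `ChangeType`, `ChangeRecord`, and `InputSource` are hypothetical, and an in-memory `deque` stands in for whatever queue the real system uses.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class ChangeType(Enum):
    """Identification bit distinguishing the kind of change."""
    ADDED = "A"
    MODIFIED = "M"
    DELETED = "D"


@dataclass
class ChangeRecord:
    flag: ChangeType  # the identification bit
    data: dict        # the changed document's fields


class InputSource:
    """FIFO queue of flagged change records (in-memory stand-in)."""

    def __init__(self):
        self._queue = deque()

    def put(self, flag: ChangeType, data: dict) -> None:
        self._queue.append(ChangeRecord(flag, data))

    def drain(self):
        """Remove and return all queued records in arrival order."""
        records = list(self._queue)
        self._queue.clear()
        return records
```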

S3: setting an automated processing flow for the data stream created in step S2, the processing flow being a chained flow composed of two types of nodes: stream processing nodes and metadata processing nodes;

The input source contains a data queue. Within the processing flow, the data passes through the logic of each node in the form of a queue, and each node it passes through generates a new input source.

The stream processing nodes, which mainly operate on the stream itself, are of three types: the stream merging node, the stream association node, and the stream filtering node;

The stream merging node shown in FIG. 3 designates the data stream of a new document set, maps its fields one to one onto the metadata fields of the original data stream, unifies the data formats of the two data streams, and merges them into a new data stream; the data volume of the merged data stream is the sum of the two data streams.
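As a rough sketch of the merging step, the function below renames the second stream's fields through a one-to-one field map and appends the result to the original stream. The function name, the dict-based record format, and the decision to drop unmapped fields are assumptions made for illustration.

```python
def merge_streams(original, other, field_map):
    """Merge two streams of dict records into one.

    field_map maps each field name in `other` to the corresponding field
    name in `original`. Fields of `other` that are not in the map are
    dropped (an assumption of this sketch).
    """
    renamed = [
        {field_map[key]: value for key, value in record.items() if key in field_map}
        for record in other
    ]
    # The merged stream holds every record of both inputs,
    # so its length is the sum of the two.
    return list(original) + renamed
```

For example, merging one record `{"id": 1, "name": "a"}` with one record `{"_id": 2, "title": "b"}` under the map `{"_id": "id", "title": "name"}` yields two records with the same `id` and `name` fields.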

The stream association node shown in FIG. 4 designates a new document set, establishes a one-to-one metadata relationship by associating it with one or more fields of the original data stream, and then uses the fields of the new document set as an extension of the original data stream, so that the original data stream carries more fields.
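The association step behaves much like a one-to-one lookup join. The sketch below assumes dict records; the function name and the choice to fill missing matches with `None` are illustrative assumptions rather than behavior stated in the text.

```python
def associate(stream, lookup_docs, keys, extra_fields):
    """Extend each stream record with fields from a matching lookup document.

    keys: field names shared by both sides and used for matching.
    extra_fields: fields copied from the lookup document into the record.
    """
    # Index the new document set by its key fields (one document per key).
    index = {tuple(doc[k] for k in keys): doc for doc in lookup_docs}
    result = []
    for record in stream:
        match = index.get(tuple(record[k] for k in keys))
        extended = dict(record)  # copy, leaving the input record untouched
        for name in extra_fields:
            # Unmatched records get None (an assumption of this sketch).
            extended[name] = match.get(name) if match is not None else None
        result.append(extended)
    return result
```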

The stream filtering node evaluates a condition on one or more fields of the data in the stream and filters out the data that does not satisfy it, reducing the data volume.
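A filtering node reduces to keeping the records for which a field-level condition holds. In this minimal sketch the condition is passed as a Python callable, which is an assumption about how conditions are expressed.

```python
def filter_stream(stream, predicate):
    """Keep only the records for which predicate(record) is true."""
    return [record for record in stream if predicate(record)]


def paid_and_positive(record):
    """Example condition over two fields (field names are hypothetical)."""
    return record["status"] == "paid" and record["amount"] > 0
```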

As shown in FIG. 5, the metadata processing node defines its internal logic operations in a programming language; the logic operations consist of a series of mathematical calculations and general-purpose functions. After compilation, the data stream processing center initializes the logic and executes it on the metadata of the received data stream, thereby changing the metadata.
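One way to picture a metadata processing node is as a wrapper around a user-supplied transform that is applied to every record. The `MetadataNode` class and the `add_total` transform below are hypothetical; the text does not say how the logic is compiled or loaded, so a plain Python callable stands in for it.

```python
class MetadataNode:
    """Applies user-defined logic to each record of a stream."""

    def __init__(self, transform):
        # `transform` takes a record dict and returns the changed record.
        self._transform = transform

    def process(self, stream):
        # Each record is copied first so the incoming stream is not mutated.
        return [self._transform(dict(record)) for record in stream]


def add_total(record):
    """Example logic: one calculation plus one general-purpose function."""
    record["total"] = record["price"] * record["qty"]
    record["name"] = record["name"].upper()
    return record
```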

S4: in the unified data stream processing center, executing the processing nodes in sequence according to the processing flow set in step S3, operating on the data, and finally generating an output source;

The data stream processing center is mainly responsible for receiving the data streams generated by the listener and initializing the processing flow set for each data stream. It starts a timer to fetch the queue from the data stream's input source at intervals, and executes the logic of each node in the processing flow in sequence until the result data is produced, forming the output source.
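The processing center's per-interval work can be sketched as draining the input queue and passing the batch through each node in order. The `ProcessingCenter` class is hypothetical, nodes are modelled as callables from a list of records to a list of records, and the timer itself is left out so the example stays deterministic.

```python
from collections import deque


class ProcessingCenter:
    """Runs a chained processing flow over batches taken from an input queue."""

    def __init__(self, nodes):
        self.nodes = list(nodes)  # executed in order
        self.output = deque()     # stands in for the output source's queue

    def run_once(self, input_queue: deque):
        """Drain the input queue, run every node in sequence, store the result."""
        batch = list(input_queue)
        input_queue.clear()
        for node in self.nodes:
            # The output of each node becomes the input of the next.
            batch = node(batch)
        self.output.extend(batch)
        return batch
```

A timer, or a simple loop that sleeps for the chosen interval, would call `run_once` repeatedly; the interval length is a deployment choice that the text does not specify.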

S5: binding the output source generated in step S4 to another database and storing the processing result; or using it to generate an input source as in step S2, so that other processing flows can be carried out.

As shown in FIG. 6, the output source temporarily stores the results generated by the processing flow in its own queue and allows an output mode to be selected: the output source can either be bound directly to a relational database, storing the generated data in the corresponding relational database table, or be converted into the input source of a new data stream so that other processing flows continue to execute.
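The two output modes can be illustrated with the sketch below, which uses Python's built-in `sqlite3` module as a stand-in relational database. The `OutputSource` class and its method names are hypothetical, and the table and column names are assumed to come from trusted configuration because they are interpolated into the SQL text.

```python
import sqlite3
from collections import deque


class OutputSource:
    """Holds result records and either stores them in a table or forwards them."""

    def __init__(self):
        self.queue = deque()

    def put_all(self, records) -> None:
        self.queue.extend(records)

    def flush_to_table(self, conn: sqlite3.Connection, table: str, columns) -> int:
        """Mode 1: write all queued records to a relational table."""
        placeholders = ", ".join("?" for _ in columns)
        # table and columns must be trusted names; values are bound as parameters.
        sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
        rows = [tuple(record[c] for c in columns) for record in self.queue]
        conn.executemany(sql, rows)
        conn.commit()
        self.queue.clear()
        return len(rows)

    def as_input_queue(self) -> deque:
        """Mode 2: hand the queued records over as the input of a new data stream."""
        forwarded = deque(self.queue)
        self.queue.clear()
        return forwarded
```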

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements shall also fall within the protection scope of the present invention.

Claims

1. A data processing method based on a document database, characterized by: monitoring data changes of a document set in a database, collecting the incremental data as the input source of a data stream, and generating new data as the output source of the data stream through an automated processing flow formed by processing nodes; wherein the automated processing flow is a chained flow composed of stream processing nodes and metadata processing nodes, initialization is performed according to the automated processing flow, and the logic corresponding to the processing nodes is executed on the stream and the metadata to generate the output source; the stream processing nodes comprise a stream merging node, a stream association node, and a stream filtering node; the stream merging node designates the data stream of a new document set, maps its fields one to one onto the metadata fields of the original data stream, and unifies the data formats of the two data streams to merge them into a new data stream, the data volume of the merged data stream being the sum of the two data streams; the stream association node designates a new document set, establishes a one-to-one metadata relationship by associating it with one or more fields of the original data stream, and then uses the fields of the new document set as an extension of the original data stream, so that the original data stream carries more fields; and the output source of the data stream is either bound directly to a database and stored in a specified database table, or used as the input source of a new data stream to be processed again.

2. A document database-based data processing method according to claim 1, comprising the steps of:

S1: monitoring a document set of a document database, collecting changed data, and marking the data according to its change mode;

S2: using the data collected in step S1 as the input source of a data stream;

S3: setting an automated processing flow for the data stream created in step S2;

S4: initializing, by the unified data stream processing center, according to the processing flow set in step S3, executing the logic corresponding to the processing nodes on the stream and the metadata, and generating an output source;

S5: binding the output source generated in step S4 to a database and storing the processing result; or using it as the input source of other data streams.

3. A document database-based data processing method according to claim 2, characterized in that: in step S1, a listener establishes a connection with the document database and accesses the document set by timed polling; whether a change occurred within the polling interval is determined from the creation time, modification time, and deletion time of each document, and the changed data is collected.

4. A document database-based data processing method according to claim 2, characterized in that: in step S2, the input source of the data stream stores the changed data obtained by monitoring in a queue, and an identification bit is added to each item in the queue according to its change type, to distinguish whether the data was added, modified, or deleted.

5. A document database-based data processing method according to claim 1, characterized in that: the stream filtering node evaluates a condition on one or more fields of the data in the stream and filters out the data that does not satisfy it, reducing the data volume.

6. A document database-based data processing method according to claim 1, characterized in that: the metadata processing node defines its internal logic operations in a programming language, the logic operations consisting of a series of mathematical calculations and general-purpose functions; after compilation, the logic operations of the processing node are initialized in the data stream processing center and executed on the metadata of the received data stream, thereby changing the metadata.