CN110851473A - Data processing method, device and system - Google Patents

Data processing method, device and system

Info

Publication number
CN110851473A
Authority
CN
China
Prior art keywords
data
deployment
data source
module
control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810825305.4A
Other languages
Chinese (zh)
Inventor
王爱东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201810825305.4A
Publication of CN110851473A
Legal status: Withdrawn


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/544 Buffers; Shared memory; Pipes
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/548 Queue

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose a data processing method, device, and system. The data processing method comprises: a monitoring acquisition module collects data from a data source according to pre-configured resource configuration information of the data source and pushes the collected data to a streaming big data calculation module; a service module defines a deployment and control task; and the streaming big data calculation module processes the data according to the deployment and control task and a pre-configured data structure of the data source's metadata to obtain a deployment and control result, and pushes that result to a specified channel of an in-memory database. Based on the resource configuration information of the data source, the monitoring acquisition module can connect to the data source quickly, and the deployment and control task together with the streaming big data calculation module enables efficient processing and analysis of the data.

Description

Data processing method, device and system
Technical Field
The present invention relates to, but is not limited to, big data computing and analysis technologies, and in particular to a data processing method, apparatus, and system.
Background
The rapid development of emerging information technologies and application models such as cloud computing, the Internet of Things, mobile internet, and social media has driven rapid growth in global data volume and brought society into the big data era. As services continue to develop, the requirements on data analysis keep rising.
At present, big data computing can be divided into two modes: batch computing and streaming computing.
Research on big data batch computing is relatively mature. Efficient and stable batch computing systems, represented by Google's MapReduce programming model and the open-source Hadoop computing system, have been built and have achieved notable results in both theory and practice.
Early research on streaming computing focused mostly on streaming data in a database environment, where data volumes were small and data objects relatively uniform. Streaming big data today is real-time, volatile, bursty, out of order, and unbounded, which places many new and higher demands on processing systems. For example, with the rapid development of the Internet of Things and internet technology, the public security field generates massive streaming data that is both volatile and real-time. Volatility means that the data is not stored for long and is cleared periodically; real-time means that the data loses value as time passes. Conventional computing architectures struggle to support the fast connection to data sources, fast acquisition, efficient processing, and analysis that this type of data requires.
Disclosure of Invention
The embodiments of the invention provide a data processing method, device, and system that can access a data source rapidly and process and analyze its data efficiently.
An embodiment of the invention provides a data processing method, comprising the following steps:
the monitoring acquisition module collects data from the data source according to pre-configured resource configuration information of the data source and pushes the collected data to the streaming big data calculation module;
the service module defines a deployment and control task; and
the streaming big data calculation module processes the data according to the deployment and control task and a pre-configured data structure of the data source's metadata to obtain a deployment and control result, and pushes the deployment and control result to a specified channel of an in-memory database.
In another embodiment of the present invention, before the data is collected from the data source, the method further comprises:
the service module configures the resource configuration information of the data source, where the resource configuration information includes a resource interface and a resource path;
and before the streaming big data calculation module processes the data according to the deployment and control task and the pre-configured data structure of the data source's metadata to obtain the deployment and control result, the method further comprises:
the service module configures the data structure of the data source's metadata.
In another embodiment of the present invention, the method further comprises:
the service module subscribes to the specified channel of the in-memory database and receives the subscription messages pushed on that channel, thereby obtaining the data in the specified channel.
In an embodiment of the present invention, the deployment and control task includes: basic information of the deployment and control task, deployment and control object information, and deployment and control dimensions, where the deployment and control dimensions include a deployment and control algorithm;
the step in which the streaming big data calculation module processes the data according to the deployment and control task and the pre-configured data structure of the data source's metadata to obtain the deployment and control result comprises:
the streaming big data calculation module parses the data according to the data structure of the metadata, matches the parsed data using the deployment and control algorithm according to the basic information of the deployment and control task, and outputs the successfully matched data as the deployment and control result.
In the embodiment of the present invention, the deployment and control algorithm includes at least one of the following, both provided by the algorithm library: a general algorithm and an extended algorithm;
where the extended algorithm includes at least one of: a custom code algorithm, a functional dependency algorithm, and a regular expression rule algorithm.
In another embodiment of the present invention, before the streaming big data calculation module processes the data according to the deployment and control task to obtain the deployment and control result, the method further comprises:
the streaming big data calculation module splits (shunts) the deployment and control tasks by data source.
An embodiment of the invention provides a data processing device, comprising at least one of the following modules:
a monitoring acquisition module, configured to collect data from the data source according to pre-configured resource configuration information of the data source and push the collected data to the streaming big data calculation module;
a service module, configured to define a deployment and control task;
a streaming big data calculation module, configured to cache the data, process the data according to the deployment and control task and a pre-configured data structure of the data source's metadata to obtain a deployment and control result, and push the deployment and control result to a specified channel of an in-memory database; and
a storage module, configured to store the deployment and control result in the specified channel of the in-memory database.
An embodiment of the present invention provides a data processing apparatus, including a processor and a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are executed by the processor, at least one step of any one of the data processing methods is implemented.
An embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements at least one step of any one of the data processing methods described above.
An embodiment of the present invention provides a data processing system, comprising:
a monitoring acquisition module, configured to collect data from the data source according to pre-configured resource configuration information of the data source and push the collected data to the streaming big data calculation module;
a service module, configured to define a deployment and control task;
a streaming big data calculation module, configured to cache the data, process the data according to the deployment and control task and a pre-configured data structure of the data source's metadata to obtain a deployment and control result, and push the deployment and control result to a specified channel of an in-memory database; and
a storage module, configured to store the deployment and control result in the specified channel of the in-memory database.
In the embodiments of the invention, the monitoring acquisition module collects data from the data source according to pre-configured resource configuration information of the data source and pushes the collected data to the streaming big data calculation module; the service module defines a deployment and control task; and the streaming big data calculation module processes the data according to the deployment and control task and a pre-configured data structure of the data source's metadata to obtain a deployment and control result, and pushes that result to a specified channel of an in-memory database. Based on the resource configuration information of the data source, the monitoring acquisition module connects to the data source quickly, and the deployment and control task together with the streaming big data calculation module enables efficient processing and analysis of the data.
Additional features and advantages of embodiments of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of embodiments of the invention. The objectives and other advantages of the embodiments of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings provide a further understanding of the embodiments of the invention and constitute a part of this specification. Together with the embodiments, they serve to explain the principles of the embodiments of the invention and do not limit them.
FIG. 1 is a flow chart of a data processing method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a data processing system according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a detailed structure of a data processing system according to an embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments of the present invention may be arbitrarily combined with each other without conflict.
The steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions. Also, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order from the one given here.
Referring to fig. 1, an embodiment of the present invention provides a data processing method, including:
step 100, the monitoring acquisition module acquires data from the data source according to the pre-configured resource configuration information of the data source, and pushes the acquired data to the streaming big data calculation module.
In another embodiment of the present invention, before collecting data from the data source, the method further comprises:
the service module configures the resource configuration information of the data source; wherein the resource configuration information includes: resource interfaces and resource paths.
For example, the resource interface of the data source includes, but is not limited to, at least one of:
a database (DB), the File Transfer Protocol (FTP), and the Hadoop Distributed File System (HDFS).
Table 1 is an example of the resource configuration information of a data source according to an embodiment of the present invention. As shown in Table 1, the monitoring acquisition module can access a data source rapidly through its resource interface and can quickly locate the data to be collected through its resource path.
TABLE 1
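Table 1 is not reproduced here. As a minimal sketch, assuming each entry pairs a data source with an interface type, an interface definition, and a resource path, the configuration can be turned into a plan with one collector per data source, matching the one-process-per-source arrangement described next. The field names and collector kinds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceConfig:
    source_name: str     # hypothetical data source name
    interface_type: str  # resource interface kind, e.g. "DB", "FTP", "HDFS"
    interface: str       # connection definition, e.g. "ftp://ip:21"
    path: str            # resource path locating the data to collect


# Hypothetical collector kinds; a real system would map these to actual connectors.
COLLECTORS = {"DB": "db_poller", "FTP": "ftp_file_watcher", "HDFS": "hdfs_dir_watcher"}


def collector_for(cfg: ResourceConfig) -> str:
    """Choose a collector kind from the configured resource interface."""
    kind = COLLECTORS.get(cfg.interface_type.upper())
    if kind is None:
        raise ValueError(f"unsupported resource interface: {cfg.interface_type}")
    return kind


def plan_collection(configs: list[ResourceConfig]) -> dict[str, tuple[str, str, str]]:
    """One acquisition process per data source: source -> (collector, interface, path)."""
    plan: dict[str, tuple[str, str, str]] = {}
    for cfg in configs:
        if cfg.source_name in plan:
            raise ValueError(f"duplicate data source: {cfg.source_name}")
        # The interface says how to connect; the path says where the data lives.
        plan[cfg.source_name] = (collector_for(cfg), cfg.interface, cfg.path)
    return plan
```

For example, `plan_collection([ResourceConfig("calls", "FTP", "ftp://ip:21", "/calls/")])` yields `{"calls": ("ftp_file_watcher", "ftp://ip:21", "/calls/")}`; the path `/calls/` is invented for illustration.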
In the embodiment of the present invention, the monitoring acquisition module acquires data from data sources through monitoring acquisition processes, where each data source corresponds to one monitoring acquisition process, as shown in fig. 3.
The monitoring acquisition module provided by the embodiment of the invention has high reliability (transactional data transmission is carried out to ensure data reliability) and high availability.
Wherein, high reliability means:
(1) Data is transferred within independent transactions: the source data is marked as completed only after the data has been successfully committed;
(2) Distributed deployment: when one node fails, data can be transmitted to other nodes without loss;
(3) Multiple data reliability options are provided (from strongest to weakest): persisting data to disk and waiting for an acknowledgement (Ack) from the receiving end; writing data locally when the receiving end is unavailable and resuming transmission after it recovers; and sending data to the receiving end without any Quality of Service (QoS) guarantee.
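The three reliability options can be modeled as follows. This is a minimal in-process sketch, not the module's actual implementation: `deliver` is a hypothetical callback that returns True when the receiving end acknowledges a record, the in-memory `spool` list stands in for the local disk buffer, and `completed` records which source data has been marked as done.

```python
import enum
from typing import Callable


class Reliability(enum.Enum):
    DISK_WITH_ACK = 1      # persist first, mark done only after the receiver acknowledges
    STORE_AND_FORWARD = 2  # spool locally while the receiver is down, resend on recovery
    BEST_EFFORT = 3        # send once, no QoS guarantee


class Sender:
    def __init__(self, deliver: Callable[[bytes], bool], mode: Reliability):
        self.deliver = deliver            # returns True if the receiver acknowledged the record
        self.mode = mode
        self.spool: list[bytes] = []      # stands in for the local disk buffer
        self.completed: list[bytes] = []  # source records marked as completed

    def flush(self) -> None:
        """Resend spooled records in order, stopping at the first unacknowledged one."""
        while self.spool:
            if not self.deliver(self.spool[0]):
                return
            self.completed.append(self.spool.pop(0))

    def send(self, record: bytes) -> None:
        if self.mode is Reliability.DISK_WITH_ACK:
            self.spool.append(record)  # persist before sending
            self.flush()
        elif self.mode is Reliability.STORE_AND_FORWARD:
            self.flush()  # drain records spooled during an outage, preserving order
            if self.spool or not self.deliver(record):
                self.spool.append(record)  # receiver still down: keep locally
            else:
                self.completed.append(record)
        else:
            self.deliver(record)  # best effort: the outcome is ignored
            self.completed.append(record)
```

In this sketch a record only reaches `completed` after an acknowledgement in the first two modes, which mirrors the rule that source data is marked as completed only after a successful commit.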
high availability means that:
(1) Configuration is simple: only the data access mode, the pipeline transmission mode, and the data output mode (fixed to Kafka in this example) need to be configured;
(2) Access to and output from many types of data sources are supported;
(3) The underlying architecture is managed uniformly by the upper-layer architecture, which simplifies system monitoring and maintenance; deployment is distributed, with ZooKeeper used for management and load balancing.
In the embodiment of the invention, the streaming big data calculation module comprises a high-speed message queue and a calculation engine module. The monitoring acquisition module pushes the collected data to the high-speed message queue of the streaming big data calculation module.
The high-speed message queue can be implemented with various high-speed message queue components. For example, a Kafka component may be used (other high-speed message queue components may also be used). As a distributed intermediate system for data stream processing, the high-speed message queue can provide a throughput of several hundred megabits per second, which meets the processing efficiency requirements of big data, and it plays an indispensable role in balancing data acquisition against computation.
The Kafka message system consists of publishers (producers), brokers, and subscribers (consumers), which reside on different nodes and exchange data through messages. A publisher pushes messages to a topic, and subscribers, organized in groups, subscribe to and pull the messages they are interested in. In this embodiment, the monitoring acquisition module can be regarded as the publisher, the high-speed message queue as the broker, and the computing engine (Spark Streaming in this embodiment) as the subscriber. The monitoring acquisition module pushes the collected data into the high-speed message queue; message streams from different types of data sources can correspond to different topic definitions, and the processing logic for each topic is relatively independent.
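The publish/subscribe arrangement can be illustrated with a toy in-memory broker. It is a simplified stand-in, not the Kafka client API: there is one topic per data source, and each consumer group keeps its own read offset per topic, so every group sees every message once. The topic names are taken from the example later in this description; the class and method names are hypothetical.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a topic-based message queue (not the Kafka client API)."""

    def __init__(self) -> None:
        self.topics: dict[str, list[str]] = defaultdict(list)
        # Each consumer group tracks its own next offset per topic.
        self.offsets: dict[tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, message: str) -> None:
        """Publisher side: append a message to the end of a topic."""
        self.topics[topic].append(message)

    def poll(self, group: str, topic: str, max_messages: int = 100) -> list[str]:
        """Subscriber side: pull the next unread messages for this group."""
        start = self.offsets[(group, topic)]
        batch = self.topics[topic][start:start + max_messages]
        self.offsets[(group, topic)] = start + len(batch)
        return batch
```

Because offsets are kept per group, a second group (for example an audit consumer) can read the same topic independently of the computing engine.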
Step 101, a service module defines a deployment and control task.
In the embodiment of the invention, the deployment and control task includes: basic information of the deployment and control task, deployment and control object information, and deployment and control dimensions, where the deployment and control dimensions include a deployment and control algorithm.
The basic information of the deployment and control task includes the start time of the task. Optionally, it further includes the end time of the task or other basic information related to the task.
The deployment and control object information identifies the tracked object, such as a person's name, and can be customized.
Optionally, the deployment and control dimensions further include at least one of: a data source and a field.
The deployment and control algorithm includes at least one of the following, both provided by the algorithm library: a general algorithm and an extended algorithm.
In common public security scenarios, the most frequently used deployment and control algorithm analyzes and screens trajectory data: it checks whether the deployment and control object appears in the trajectory data and extracts key fields for early-warning display.
For example, the string global matching algorithm performs string matching between the values defined in the deployment and control dimension of a given object and the data stream; when the data stream contains a value defined in the dimension, the matching data is captured and output as a deployment and control result.
As another example, the specified field matching algorithm performs string matching between the values defined in the deployment and control dimension of a given object and one specified field of the data stream; when that field contains a value defined in the dimension, the matching data is captured and output as a deployment and control result.
When the specified field matching algorithm is used, the deployment and control dimension must include the field to be matched.
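The two general algorithms can be sketched as follows, assuming each data stream record has already been parsed into a field-name-to-string mapping. The record shape, function names, and sample values are hypothetical.

```python
from typing import Optional


def _hit(text: str, values: list[str]) -> bool:
    # Empty strings are ignored: "" would otherwise match every text.
    return any(v and v in text for v in values)


def global_match(record: dict[str, str], values: list[str]) -> bool:
    """String global matching: hit if any field of the record contains a controlled value."""
    return any(_hit(text, values) for text in record.values())


def field_match(record: dict[str, str], field: Optional[str], values: list[str]) -> bool:
    """Specified field matching: only the field named in the dimension is compared."""
    if field is None:
        raise ValueError("specified field matching needs a field in the deployment and control dimension")
    return _hit(record.get(field, ""), values)
```

The difference is only the search scope: `global_match` scans every field, so it can also hit a value that happens to appear in an unrelated field, whereas `field_match` trades that recall for precision.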
When the general algorithms cannot meet the requirements of a practical application, the algorithm library can be extended as needed. The extended algorithms include at least one of: a custom code algorithm, a functional dependency algorithm, and a regular expression rule algorithm.
The custom code algorithm is defined by uploading a jar package that implements the custom algorithm;
the functional dependency algorithm is defined by the user entering two associated fields and the association between them;
the regular expression rule algorithm is defined, for example, by the user entering a regular expression.
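A minimal sketch of an extensible algorithm library, assuming algorithms are registered as factories that return a per-record predicate. The registry, the names `regex_rule` and `functional_dependency`, and the choice to express the association between two fields as a Python predicate are illustrative assumptions; in the described system a custom code algorithm arrives as an uploaded jar package instead.

```python
import re
from typing import Callable

Record = dict[str, str]
Matcher = Callable[[Record], bool]

# Hypothetical in-process registry; registering a factory here plays the role
# that uploading a jar package plays for custom code algorithms.
ALGORITHMS: dict[str, Callable[..., Matcher]] = {}


def register(name: str) -> Callable[[Callable[..., Matcher]], Callable[..., Matcher]]:
    def wrap(factory: Callable[..., Matcher]) -> Callable[..., Matcher]:
        ALGORITHMS[name] = factory
        return factory
    return wrap


@register("regex_rule")
def regex_rule(field: str, pattern: str) -> Matcher:
    """Regular expression rule: hit when the field matches the user's pattern."""
    compiled = re.compile(pattern)
    return lambda rec: compiled.search(rec.get(field, "")) is not None


@register("functional_dependency")
def functional_dependency(field_a: str, field_b: str,
                          relation: Callable[[str, str], bool]) -> Matcher:
    """Functional dependency: hit when two associated fields satisfy the user's relation."""
    return lambda rec: field_a in rec and field_b in rec and relation(rec[field_a], rec[field_b])
```

For instance, `ALGORITHMS["functional_dependency"]("id_no", "birth", lambda a, b: a[6:14] == b)` builds a hypothetical rule requiring that characters 7 to 14 of an identification number equal a separate birth-date field.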
By allowing the deployment and control algorithms to be extended, the embodiment of the invention improves the extensibility of the system and meets the requirements of specific scenarios.
Step 102: the streaming big data calculation module processes the data according to the deployment and control task and a pre-configured data structure of the data source's metadata to obtain a deployment and control result, and pushes the deployment and control result to a specified channel of the in-memory database.
In the embodiment of the invention, the calculation engine module of the streaming big data calculation module parses the data in the high-speed message queue according to the pre-configured data structure of the data source's metadata, and processes the parsed data according to the deployment and control task to obtain the deployment and control result.
In another embodiment of the present invention, before the calculation engine module parses the data in the high-speed message queue according to the pre-configured data structure of the data source's metadata, the method further comprises: the service module configures the data structure of the data source's metadata.
In the embodiment of the present invention, when the data in the data source is stored as tables, the data structure of the metadata includes: field content, field order, and output fields. Table 2 is an example of the data structure of a data source's metadata. Based on this data structure, the monitoring acquisition module can quickly collect the data under the resource path.
TABLE 2
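Since Table 2 is not reproduced, the following sketch only assumes what the text states: the metadata lists the field content in field order plus the output fields. The delimited-line record format, the `Metadata` type, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    fields: tuple[str, ...]         # field content, listed in field order
    output_fields: tuple[str, ...]  # fields emitted with a deployment and control result
    delimiter: str = ","            # assumed record format: one delimited line per record


def parse(line: str, meta: Metadata) -> dict[str, str]:
    """Turn one raw record into a field -> value mapping using the metadata."""
    values = line.rstrip("\n").split(meta.delimiter)
    if len(values) != len(meta.fields):
        raise ValueError(f"expected {len(meta.fields)} fields, got {len(values)}")
    return dict(zip(meta.fields, values))


def project(record: dict[str, str], meta: Metadata) -> dict[str, str]:
    """Keep only the configured output fields, in output order."""
    return {name: record[name] for name in meta.output_fields}
```

Rejecting records whose field count differs from the metadata keeps a schema change at the source from silently shifting values into the wrong fields.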
In the embodiment of the invention, the streaming big data calculation module matches data using the deployment and control algorithm according to the basic information of the deployment and control task, and outputs the successfully matched data as the deployment and control result. Specifically, the calculation engine module of the streaming big data calculation module matches the data in the high-speed message queue in this way.
For example, when the basic information of the task includes only its start time, the calculation engine module matches data with the deployment and control algorithm from that start time onward and outputs the successfully matched data as the deployment and control result. When the basic information includes both a start time and an end time, the calculation engine module matches data only within the period between them.
When the deployment and control dimensions do not include a data source, the calculation engine module applies the deployment and control algorithm to data collected from all data sources; when they include a data source, it applies the algorithm only to data collected from that data source.
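The time-window and data-source rules above can be expressed as a small gate evaluated before the deployment and control algorithm runs. The `DeploymentTask` fields are a hypothetical rendering of the task's basic information and dimensions, and treating the window bounds as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DeploymentTask:
    target: str                     # deployment and control object, e.g. a person's name
    values: list[str]               # values to watch for (ID number, plate, phone, ...)
    start: datetime                 # start time (required basic information)
    end: Optional[datetime] = None  # optional end time; None means open-ended
    source: Optional[str] = None    # optional data source; None means all sources


def task_applies(task: DeploymentTask, source: str, ts: datetime) -> bool:
    """Decide whether a record from `source` at time `ts` falls under this task."""
    if ts < task.start:
        return False
    if task.end is not None and ts > task.end:
        return False
    return task.source is None or task.source == source
```

Only records that pass this gate would then be handed to the matching algorithm selected in the task's dimensions.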
The calculation engine module in this example may be implemented with various components, for example a Spark Streaming module (other components may also be used).
In the embodiment of the present invention, the data structure of the deployment and control result may adopt a data structure shown in table 3, and of course, other data structures may also be adopted, which is not limited in the embodiment of the present invention.
TABLE 3
In another embodiment of the present invention, before the streaming big data calculation module processes the data according to the deployment and control task to obtain the deployment and control result, the method further comprises:
the streaming big data calculation module splits (shunts) the deployment and control tasks by data source. Specifically, the calculation engine module of the streaming big data calculation module performs this split.
That is, the calculation engine module may perform data matching on multiple nodes, where each node matches the data collected from one data source and pushes its matching results to the specified channel of the in-memory database.
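The per-source split can be sketched as grouping tasks by data source, so that each node only evaluates the tasks relevant to its source. The dictionary-shaped tasks and the rule that a task without a source goes to every node (consistent with the matching rule in Step 102) are assumptions for illustration.

```python
from typing import Optional


def shunt(tasks: list[dict], sources: list[str]) -> dict[str, list[dict]]:
    """Group deployment and control tasks per data source (one node per source).

    A task whose "source" is None watches every source.
    """
    plan: dict[str, list[dict]] = {s: [] for s in sources}
    for task in tasks:
        source: Optional[str] = task.get("source")
        if source is None:
            for s in sources:
                plan[s].append(task)
        elif source in plan:
            plan[source].append(task)
        else:
            raise ValueError(f"task {task.get('id')} names unknown source {source}")
    return plan
```

Failing loudly on an unknown source, rather than dropping the task, avoids a deployment and control task that silently never matches anything.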
In another embodiment of the present invention, the method further comprises:
the service module subscribes to the specified channel of the in-memory database and receives the subscription messages pushed on that channel, thereby obtaining the data in the specified channel. Through this subscription, the service module can detect and display deployment and control results in real time.
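The subscribe-and-push behavior of the specified channel can be modeled with a toy channel registry. This only illustrates the semantics; the in-memory database is not named in this description, and the class and channel name are hypothetical (stores such as Redis provide comparable PUBLISH/SUBSCRIBE commands).

```python
from collections import defaultdict
from typing import Callable


class ToyChannels:
    """Toy publish/subscribe on named channels (stand-in for the in-memory database)."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self.subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        """Push a message to every subscriber of the channel; return how many received it."""
        receivers = self.subscribers.get(channel, [])
        for callback in receivers:
            callback(message)
        return len(receivers)
```

Here the calculation engine would call `publish` with each deployment and control result, and the service module's callback would update its real-time display.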
In another embodiment of the present invention, the service module periodically dumps the data in the in-memory database, for example into a relational database or a search engine, so that users can query and analyze historical results.
Based on the resource configuration information of the data source, the monitoring acquisition module of the embodiment of the invention connects to the data source quickly and can collect its data without knowing the data structure of its metadata; the deployment and control task together with the streaming big data calculation module enables efficient processing and analysis of the data.
Once the data sources have been configured and connected, massive streaming data is processed by the distributed monitoring acquisition module, which offers high reliability, high throughput, and high concurrency, and by the streaming big data computing framework. This greatly shortens the response time for streaming data with strict real-time requirements and improves the accuracy of data processing and analysis to some extent. The method provided by the embodiment of the invention is also broadly applicable.
Examples of the invention
In this example, the method comprises:
Step 1: a deployment and control task is set up for suspect A in a case; the daily trajectory information of the suspect (the deployment and control object) needs to be monitored.
The following information about the suspect is known: identification number, vehicle license plate number, mobile phone number, and so on.
Several new data sources need to be placed under deployment and control: internet café access records (HDFS), checkpoint vehicle passage records (Oracle), hotel check-in records (MySQL), and mobile phone call records (FTP).
Step 2: the service module configures the resource interfaces of the data sources, as shown in Table 4.
Data source type   Interface name   Interface definition
HDFS               hdfs_01          hdfs://ip:9000
Oracle             oracle_02        db://ip:1521/oracle32#oracle
MySQL              mysql_03         db://ip:3306/test#mysql
FTP                ftp_04           ftp://ip:21
TABLE 4
The detailed interface definitions differ by data source type and are not described further here.
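The interface definitions in Table 4 follow a URL-like layout. A minimal sketch, assuming the path segment names the database and the `#` fragment names the database type (both are interpretations of the table, not stated in the text), parses them with Python's standard `urllib.parse.urlsplit`:

```python
from urllib.parse import urlsplit


def parse_interface(definition: str) -> dict:
    """Split an interface definition such as 'db://ip:1521/oracle32#oracle' into parts."""
    parts = urlsplit(definition)
    if parts.scheme not in {"db", "ftp", "hdfs"}:
        raise ValueError(f"unknown interface scheme: {parts.scheme!r}")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,                          # None if no port is given
        "database": parts.path.lstrip("/") or None,  # assumed: database or schema name
        "db_type": parts.fragment or None,           # assumed: database product
    }
```

With this reading, `db://ip:3306/test#mysql` resolves to host `ip`, port 3306, database `test`, and database type `mysql`.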
Step 3: the service module configures the resource paths of the data sources, as shown in Table 5.
TABLE 5
Step 4: the service module configures the data structure of each data source's metadata, as shown in Table 6.
TABLE 6
Step 5: the service module manages the algorithm library.
The algorithm library contains general algorithms. When the general algorithms cannot meet the requirements of a practical application, the library can be extended as needed, and the extended algorithms include at least one of: a custom code algorithm, a functional dependency algorithm, and a regular expression rule algorithm.
Step 6: the service module defines a deployment and control task, as shown in Table 7.
TABLE 7
Depending on the algorithm, a specific data source (and a specific field) can be selected, or no data source can be selected, in which case all data sources are matched.
Step 7: the monitoring acquisition module collects data from each data source in real time according to its resource interface and resource path and pushes the data to Kafka. Kafka defines one topic per data source, which yields the following four topics:
Topic1:T_ZYK_RK_WBSWRY
Topic2:T_VEH_KK_PASSREC
Topic3:T_QB_LG_RY_CGUEST
Topic4:T_ZA_XTBA_RY_PH_THJL
Spark Streaming generates a corresponding background task for each deployment and control task defined by the service module. Spark Streaming consumes all topics from Kafka, splits the stream by topic, and parses the data in each topic according to the data structure of the metadata. The Spark Streaming task then calls the algorithm to perform the matching logic and pushes real-time deployment and control results according to the output definition of the data source;
the real-time deployment and control results are pushed to the in-memory database; the application process subscribes to a specified channel of the in-memory database and can detect and display the results in real time;
the application process periodically dumps the in-memory database data to a relational database or a search engine (only the most recent data, up to a preset amount, is kept in the in-memory database), so that historical deployment and control results can be queried or analyzed further (for example, generating a trajectory map of the suspect or performing investigative analysis, which is not described further here).
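The retention rule in this step (keep only the most recent results in memory, dump everything for history) can be sketched as below. The buffer size, the `archive` list standing in for the relational database or search engine, and the class name are hypothetical.

```python
from collections import deque


class ResultBuffer:
    """Keeps the latest N results 'in memory' and hands everything to an archive on dump."""

    def __init__(self, keep_latest: int):
        self.recent: deque = deque(maxlen=keep_latest)  # what stays in the in-memory database
        self.pending: list = []                         # results not yet dumped

    def add(self, result) -> None:
        self.recent.append(result)  # oldest entry is evicted once maxlen is reached
        self.pending.append(result)

    def dump(self, archive: list) -> int:
        """Periodic dump: move pending results to the archive; return how many moved."""
        moved = len(self.pending)
        archive.extend(self.pending)
        self.pending.clear()
        return moved
```

Tracking `pending` separately from `recent` means a result evicted from the in-memory window between two dumps is still archived.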
Another embodiment of the present invention provides a data processing apparatus, including at least one of the following modules:
the monitoring acquisition module 202 is used for acquiring data from a data source according to the pre-configured resource configuration information of the data source 201 and pushing the acquired data to the streaming big data calculation module;
the service module 204 is used for defining a deployment and control task;
the streaming big data calculation module 203 is configured to cache the data, process the data according to the deployment and control task and a data structure of metadata of a pre-configured data source 201 to obtain a deployment and control result, and push the deployment and control result to a specified channel of an in-memory database;
the storage module 205 is configured to store the deployment and control result to the specified channel of the in-memory database.
In another embodiment of the present invention, the service module 204 is further configured to:
configuring resource configuration information of the data source 201 and a data structure of metadata of the data source; wherein the resource configuration information includes: resource interfaces and resource paths.
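The configuration held by the service module can be sketched as one record per data source. `DataSourceConfig`, `validate`, and the example interface and path values are hypothetical; the text only states that the configuration comprises a resource interface, a resource path, and the metadata data structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataSourceConfig:
    """Hypothetical per-source configuration kept by the service module."""
    name: str                 # data source (and topic) name
    resource_interface: str   # how the source is reached, e.g. "jdbc" (assumed)
    resource_path: str        # where the data lives on that interface
    fields: List[str] = field(default_factory=list)  # metadata: record structure

    def validate(self) -> None:
        """Reject configurations that the collection step could not use."""
        if not self.resource_interface or not self.resource_path:
            raise ValueError(f"{self.name}: resource interface and path are required")
        if len(set(self.fields)) != len(self.fields):
            raise ValueError(f"{self.name}: duplicate field names in metadata")
```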
In another embodiment of the present invention, the service module 204 is further configured to:
subscribe to the designated channel of the in-memory database and receive the subscription messages pushed on that channel, thereby obtaining the data of that channel.
In another embodiment of the present invention, the deployment and control task comprises: basic information of the deployment and control task, deployment and control object information, and deployment and control dimensions, wherein the deployment and control dimensions include a deployment and control algorithm;
the streaming big data calculation module 203 is specifically configured to:
cache the data; parse the data according to the data structure of the metadata; match the parsed data using the deployment and control algorithm according to the basic information of the deployment and control task; output the successfully matched data as the deployment and control result; and push the result to a specified channel of the in-memory database.
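The three parts of a deployment and control task and the matching step can be sketched as follows, assuming the basic information names a task id and the field to check, the object information is a set of watched values, and the algorithm is a predicate. `ControlTask`, `run_task`, and `exact` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class ControlTask:
    """Hypothetical deployment and control task with the three listed parts."""
    basic_info: Dict[str, str]                  # e.g. task id and field to check
    objects: Set[str]                           # object info: the values under watch
    algorithm: Callable[[str, Set[str]], bool]  # dimension: the matching rule


def run_task(task: ControlTask, records: List[dict]) -> List[dict]:
    """Match parsed records against the task; matched records are the results."""
    fld = task.basic_info["field"]
    results = []
    for rec in records:
        if fld in rec and task.algorithm(rec[fld], task.objects):
            results.append({"task_id": task.basic_info["task_id"], **rec})
    return results


def exact(value: str, objects: Set[str]) -> bool:
    """Simplest assumed algorithm: the field value is one of the watched objects."""
    return value in objects
```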
In another embodiment of the present invention, the streaming big data calculation module 203 is further configured to:
split the deployment and control tasks into separate streams by data source.
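Splitting tasks by data source can be sketched as grouping task ids under the source they watch, with tasks that select no data source applied to every stream. `shunt_tasks`, `tasks_for`, and the `"*"` key are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

ALL = "*"  # hypothetical key for tasks that select no data source


def shunt_tasks(tasks: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Group task ids by the data source they watch (None means every source)."""
    routes: Dict[str, List[str]] = defaultdict(list)
    for task_id, source in tasks:
        routes[source if source is not None else ALL].append(task_id)
    return dict(routes)


def tasks_for(routes: Dict[str, List[str]], source: str) -> List[str]:
    """Tasks to run on a record from `source`: its own tasks plus the global ones."""
    return routes.get(source, []) + routes.get(ALL, [])
```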
Another embodiment of the present invention provides a data processing apparatus, including a processor and a computer-readable storage medium, wherein the computer-readable storage medium stores instructions, and when the instructions are executed by the processor, at least one step of any one of the data processing methods is implemented.
Another embodiment of the invention proposes a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out at least one step of any one of the data processing methods described above.
Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
Referring to fig. 2, another embodiment of the present invention provides a data processing system, including:
the monitoring acquisition module 202 is used for acquiring data from a data source according to the pre-configured resource configuration information of the data source 201 and pushing the acquired data to the streaming big data calculation module;
the service module 204 is used for defining a deployment and control task;
the streaming big data calculation module 203 is configured to cache the data, process the data according to the deployment and control task and a data structure of metadata of a pre-configured data source to obtain a deployment and control result, and push the deployment and control result to a specified channel of an in-memory database;
the storage module 205 is configured to store the deployment and control result to the specified channel of the in-memory database.
The monitoring acquisition module 202, the service module 204, the streaming big data calculation module 203, and the storage module 205 in the embodiment of the present invention may be arranged in one node, may be arranged in different nodes, may be implemented by a plurality of nodes, or may be implemented by a cluster.
In another embodiment of the invention, the streaming big data calculation module 203 comprises a high-speed message queue and a calculation engine module;
a high-speed message queue for caching the data;
a calculation engine module, configured to parse the data according to the data structure of the metadata, process the parsed data according to the deployment and control task to obtain a deployment and control result, and push the result to a specified channel of the in-memory database.
In embodiments of the present invention, the high-speed message queue may be implemented on a single node and is used to cache the data from the data sources.
The calculation engine module may be implemented on a single node or on a cluster; when a cluster is used, each node processes the data of one data source.
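The node layout can be sketched as a mapping from data sources to nodes. `assign_nodes` is hypothetical; the one-source-per-node rule for the cluster case follows the text, while raising an error when the cluster has too few nodes is an assumed policy.

```python
from typing import Dict, List


def assign_nodes(sources: List[str], nodes: List[str]) -> Dict[str, str]:
    """Map each data source to the compute node that processes it.

    With a single node, every source goes to that node; with a cluster, each
    source gets its own node (extra nodes stay idle in this sketch).
    """
    if not nodes:
        raise ValueError("at least one compute node is required")
    if len(nodes) == 1:
        return {src: nodes[0] for src in sources}
    if len(nodes) < len(sources):
        raise ValueError("cluster mode needs one node per data source")
    return dict(zip(sources, nodes))
```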
In another embodiment of the present invention, the service module 204 is further configured to:
configuring resource configuration information of the data source 201 and a data structure of metadata of the data source; wherein the resource configuration information includes: resource interfaces and resource paths.
In another embodiment of the present invention, the service module 204 is further configured to:
subscribe to the designated channel of the in-memory database and receive the subscription messages pushed on that channel, thereby obtaining the data of that channel.
In another embodiment of the present invention, the deployment and control task comprises: basic information of the deployment and control task, deployment and control object information, and deployment and control dimensions, wherein the deployment and control dimensions include a deployment and control algorithm;
the streaming big data calculation module 203 is specifically configured to:
cache the data; parse the data according to the data structure of the metadata; match the parsed data using the deployment and control algorithm according to the basic information of the deployment and control task; output the successfully matched data as the deployment and control result; and push the result to a specified channel of the in-memory database.
In another embodiment of the present invention, the streaming big data calculation module 203 is further configured to:
split the deployment and control tasks into separate streams by data source.
That is, the calculation engine module is implemented on a plurality of nodes, each of which processes the data of one data source.
Although the embodiments of the present invention have been described above, the descriptions are only used for understanding the embodiments of the present invention, and are not intended to limit the embodiments of the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the embodiments of the invention as defined by the appended claims.

Claims (10)

1. A method of data processing, comprising:
the monitoring acquisition module acquires data from the data source according to the pre-configured resource configuration information of the data source and pushes the acquired data to the streaming big data calculation module;
the business module defines a deployment and control task;
and the streaming big data calculation module processes the data according to the deployment and control task and a data structure of metadata of a pre-configured data source to obtain a deployment and control result, and pushes the deployment and control result to a specified channel of an in-memory database.
2. The data processing method of claim 1, wherein prior to collecting data from the data source, the method further comprises:
the service module configures the resource configuration information of the data source; wherein the resource configuration information includes: a resource interface and a resource path;
before the streaming big data calculation module processes data according to the deployment and control task and the data structure of the metadata of the pre-configured data source to obtain the deployment and control result, the method further comprises the following steps:
the business module configures a data structure of metadata of the data source.
3. The data processing method of claim 1, further comprising:
the service module subscribes to the designated channel of the in-memory database and receives the subscription messages pushed on that channel, thereby obtaining the data of that channel.
4. The data processing method according to any one of claims 1 to 3, wherein the deployment and control task comprises: basic information of the deployment and control task, deployment and control object information, and deployment and control dimensions, wherein the deployment and control dimensions include a deployment and control algorithm;
the step in which the streaming big data calculation module processes the data according to the deployment and control task and the data structure of the metadata of the pre-configured data source to obtain a deployment and control result comprises:
the streaming big data calculation module parses the data according to the data structure of the metadata, matches the parsed data using the deployment and control algorithm according to the basic information of the deployment and control task, and outputs the successfully matched data as the deployment and control result.
5. The data processing method of claim 4, wherein the deployment and control algorithm comprises at least one of: a general algorithm provided by an algorithm library, and an extended algorithm;
wherein the extended algorithm comprises at least one of: a custom code algorithm, a function-dependency algorithm, and a regular-expression rule algorithm.
6. The data processing method according to any one of claims 1 to 3, wherein before the streaming big data calculation module processes data according to the deployment and control task to obtain a deployment and control result, the method further comprises:
the streaming big data calculation module splits the deployment and control tasks into separate streams by data source.
7. A data processing apparatus comprising at least one of the following modules:
the monitoring acquisition module is used for acquiring data from the data source according to the pre-configured resource configuration information of the data source and pushing the acquired data to the streaming big data calculation module;
the service module is used for defining a deployment and control task;
the streaming big data calculation module is used for caching the data, processing the data according to the deployment and control task and a data structure of metadata of a pre-configured data source to obtain a deployment and control result, and pushing the deployment and control result to a specified channel of an in-memory database;
and the storage module is used for storing the deployment and control result to a specified channel of the in-memory database.
8. A data processing apparatus comprising a processor and a computer readable storage medium having instructions stored thereon, wherein the instructions, when executed by the processor, carry out at least one step of a data processing method according to any one of claims 1 to 6.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out at least one step of a data processing method according to any one of claims 1 to 6.
10. A data processing system comprising:
the monitoring acquisition module is used for acquiring data from the data source according to the pre-configured resource configuration information of the data source and pushing the acquired data to the streaming big data calculation module;
the service module is used for defining a deployment and control task;
the streaming big data calculation module is used for caching the data, processing the data according to the deployment and control task and a data structure of metadata of a pre-configured data source to obtain a deployment and control result, and pushing the deployment and control result to a specified channel of an in-memory database;
and the storage module is used for storing the deployment and control result to a specified channel of the in-memory database.
CN201810825305.4A 2018-07-25 2018-07-25 Data processing method, device and system Withdrawn CN110851473A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810825305.4A CN110851473A (en) 2018-07-25 2018-07-25 Data processing method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810825305.4A CN110851473A (en) 2018-07-25 2018-07-25 Data processing method, device and system

Publications (1)

Publication Number Publication Date
CN110851473A (en) 2020-02-28

Family

ID=69594403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810825305.4A Withdrawn CN110851473A (en) 2018-07-25 2018-07-25 Data processing method, device and system

Country Status (1)

Country Link
CN (1) CN110851473A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112084224A (en) * 2020-09-03 2020-12-15 北京锐安科技有限公司 Data management method, system, device and medium
CN112506888A (en) * 2020-12-29 2021-03-16 浪潮云信息技术股份公司 Data processing method based on different data sources of HDFS (Hadoop distributed File System)
CN112612514A (en) * 2020-12-31 2021-04-06 青岛海尔科技有限公司 Program development method and device, storage medium and electronic device
CN113452667A (en) * 2021-03-05 2021-09-28 浙江华云信息科技有限公司 Edge Internet of things terminal access method suitable for multiple protocol types
CN113572854A (en) * 2021-08-10 2021-10-29 北京无线电测量研究所 Kafka component-based data transmission method and system
CN113873037A (en) * 2021-09-29 2021-12-31 四川长虹网络科技有限责任公司 Data pushing method and device, computer equipment and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN105741134A (en) * 2016-01-26 2016-07-06 北京百分点信息科技有限公司 Method and apparatus for applying cross-data-source marketing crowds to marketing
CN107294801A (en) * 2016-12-30 2017-10-24 江苏号百信息服务有限公司 Stream Processing method and system based on magnanimity real-time Internet DPI data
CN107908691A (en) * 2017-11-01 2018-04-13 南京欣网互联网络科技有限公司 A kind of big data via operation analytic system

Cited By (9)

Publication number Priority date Publication date Assignee Title
CN112084224A (en) * 2020-09-03 2020-12-15 北京锐安科技有限公司 Data management method, system, device and medium
CN112084224B (en) * 2020-09-03 2024-05-10 北京锐安科技有限公司 Data management method, system, equipment and medium
CN112506888A (en) * 2020-12-29 2021-03-16 浪潮云信息技术股份公司 Data processing method based on different data sources of HDFS (Hadoop distributed File System)
CN112612514A (en) * 2020-12-31 2021-04-06 青岛海尔科技有限公司 Program development method and device, storage medium and electronic device
CN112612514B (en) * 2020-12-31 2023-11-28 青岛海尔科技有限公司 Program development method and device, storage medium and electronic device
CN113452667A (en) * 2021-03-05 2021-09-28 浙江华云信息科技有限公司 Edge Internet of things terminal access method suitable for multiple protocol types
CN113572854A (en) * 2021-08-10 2021-10-29 北京无线电测量研究所 Kafka component-based data transmission method and system
CN113572854B (en) * 2021-08-10 2023-11-14 北京无线电测量研究所 Data transmission method and system based on Kafka component
CN113873037A (en) * 2021-09-29 2021-12-31 四川长虹网络科技有限责任公司 Data pushing method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110851473A (en) Data processing method, device and system
US11669499B2 (en) Management of journal entries associated with customizations of knowledge objects in a search head cluster
Nasridinov et al. A decision tree-based classification model for crime prediction
KR20210118452A (en) Real-time event detection for social data streams
JP2014112367A (en) System and method of reduction of irrelevant information during search
CN111309550A (en) Data acquisition method, system, equipment and storage medium of application program
CN110928851B (en) Method, device and equipment for processing log information and storage medium
CN111078765A (en) View base system based on Hadoop system architecture and construction method thereof
CN111258978A (en) Data storage method
US9792334B2 (en) Large-scale processing and querying for real-time surveillance
Li et al. City digital pulse: a cloud based heterogeneous data analysis platform
CN110543584B (en) Method, device, processing server and storage medium for establishing face index
CN111026709A (en) Data processing method and device based on cluster access
CN114358726A (en) Drug inhibition early warning research and judgment method and system based on combination of reporting clues and multiple data sources
CN114356692A (en) Visual processing method and device for application monitoring link and storage medium
WO2016201992A1 (en) Video storage and retrieval method for cloud storage server, and video cloud storage system
CN112463527A (en) Data processing method, device, equipment, system and storage medium
CN110909072B (en) Data table establishment method, device and equipment
CN113873025B (en) Data processing method and device, storage medium and electronic equipment
US10498799B2 (en) Intelligent routing of media items
Bailer et al. Learning selection of user generated event videos
CN112862598A (en) Channel information management method and device, electronic equipment and medium
Shuai et al. Memtv: a research on multi-level edge computing model for traffic video processing
CN118138806B (en) Video filtering method, apparatus, device, storage medium, and computer program product
CN117112815B (en) Personal attention video event retrieval method and system, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200228