CN113535783A - Streaming data processing method, system, computer device and readable storage medium - Google Patents

Streaming data processing method, system, computer device and readable storage medium

Info

Publication number
CN113535783A
Authority
CN
China
Prior art keywords
data
field
streaming
join
streaming data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110837361.1A
Other languages
Chinese (zh)
Inventor
汪月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Minglue Zhaohui Technology Co Ltd
Original Assignee
Beijing Minglue Zhaohui Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Minglue Zhaohui Technology Co Ltd filed Critical Beijing Minglue Zhaohui Technology Co Ltd
Priority to CN202110837361.1A priority Critical patent/CN113535783A/en
Publication of CN113535783A publication Critical patent/CN113535783A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/219 Managing data history or versioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a streaming data processing method, a system, a computer device and a readable storage medium. The streaming data processing method comprises: a streaming data access step of accessing streaming data in a stream processing platform based on a stream computing component; and a streaming data parsing step of configuring a data reading configuration file and parsing the streaming data according to the data reading configuration file to obtain target data, wherein the data reading configuration file comprises at least a field name, a field value, a parsing configuration and a Join configuration. The application thereby realizes flexible parsing of streaming data based on the data reading configuration file, and gives users a way to obtain the required target data accurately and flexibly by configuring that file.

Description

Streaming data processing method, system, computer device and readable storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a streaming data processing method, system, computer device, and computer-readable storage medium.
Background
Big data processing systems can be divided into batch big data systems and streaming big data systems. Batch big data is also called historical big data, and streaming big data is also called real-time big data. Streaming data is a continuous, unbounded, fast, time-varying sequence of data items (e.g., structured data or tuples; items such as pictures and documents may also constitute streaming data). As technology develops, the processing of streaming data becomes more and more important.
At present, mainstream streaming data processing technologies are represented by Spark Streaming, Storm and Flink. However, the data accessed by existing streaming data access systems is not the valuable data that is ultimately needed, and no effective solution has been proposed for accurately and flexibly obtaining the valuable data that is actually needed.
Disclosure of Invention
The embodiment of the application provides a streaming data processing method, a streaming data processing system, computer equipment and a computer readable storage medium, which realize flexible streaming data analysis based on a data reading configuration file and provide a solution for a user to accurately and flexibly obtain required target data by configuring the data reading configuration file.
In a first aspect, an embodiment of the present application provides a streaming data processing method, including:
a streaming data access step of accessing streaming data in a stream processing platform based on a stream computing component, where the stream computing component is specifically Spark Streaming; and
a streaming data parsing step of configuring a data reading configuration file and parsing the streaming data according to the data reading configuration file to obtain target data;
wherein the data reading configuration file includes at least: a field name, a field value, a parsing configuration, and a Join configuration.
Based on the above steps, the embodiment of the application allows the reading of the data stream to be configured according to user requirements, and parses the data based on the data reading configuration file. The embodiment also supports Join operations on streaming data and data governance, both driven by the data reading configuration file.
In some of these embodiments, the method further comprises:
a streaming data connection step of performing a data Join according to a main table of a target database and the Join configuration, and outputting the result.
Based on the above step, the embodiment of the application provides a new streaming data Join method driven by the Join configuration, so that fields are extracted on demand instead of crudely loading all the data to perform the Join, which improves the flexibility of the data Join.
In some embodiments, the parsing configuration includes one or any combination of a data field, a sink field, a secondary data field, a table name, a secondary sink field, a lower value replacement field, a constant assignment field, and a delete field, and
the streaming data parsing step further comprises:
a target data judgment step of combining the field name and the field value to find all data whose configuration matches the field name and the field value, thereby obtaining the target data; and
a target data parsing step of performing data parsing and governance on the target data according to the parsing configuration.
Based on the above steps, operations such as modifying, retaining or deleting original data fields are carried out during data parsing by means of the lower value replacement field, the constant assignment field, the delete field and the like, so the data does not need to be governed through SQL in subsequent processing. This further improves the flexibility of data parsing and reduces the data governance cost of subsequent processing.
In some embodiments, the Join configuration includes one or any combination of a Join field, a code table primary key field, a code table value field, and a rename field.
Based on this configuration, a Join is performed on the data and renaming of data fields is supported, achieving flexible data governance.
In a second aspect, an embodiment of the present application provides a streaming data processing system, including:
a streaming data access module, used for accessing streaming data in a stream processing platform based on a stream computing component, where the stream computing component is specifically Spark Streaming; and
a streaming data parsing module, used for configuring a data reading configuration file and parsing the streaming data according to the data reading configuration file to obtain target data;
wherein the data reading configuration file includes at least: a field name, a field value, a parsing configuration, and a Join configuration.
Based on the above modules, the embodiment of the application allows the reading of the data stream to be configured according to user requirements, and parses the data based on the data reading configuration file. The embodiment also supports Join operations on streaming data and data governance, both driven by the data reading configuration file.
In some of these embodiments, the system further comprises:
a streaming data connection module, used for performing a data Join according to a main table of a target database and the Join configuration, and outputting the result.
Based on the above module, the embodiment of the application provides a new streaming data Join method driven by the Join configuration, so that fields are extracted on demand instead of crudely loading all the data to perform the Join, which improves the flexibility of the data Join.
In some embodiments, the parsing configuration includes one or any combination of a data field, a sink field, a secondary data field, a table name, a secondary sink field, a lower value replacement field, a constant assignment field, and a delete field, and
the streaming data parsing module further comprises:
a target data judgment module, used for combining the field name and the field value to find all data whose configuration matches the field name and the field value, thereby obtaining the target data; and
a target data parsing module, used for performing data parsing and governance on the target data according to the parsing configuration.
Based on the above modules, operations such as modifying, retaining or deleting original data fields are carried out during data parsing by means of the lower value replacement field, the constant assignment field, the delete field and the like, so the data does not need to be governed through SQL in subsequent processing. This further improves the flexibility of data parsing and reduces the data governance cost of subsequent processing.
In some embodiments, the Join configuration includes one or any combination of a Join field, a code table primary key field, a code table value field, and a rename field.
Based on this configuration, a Join is performed on the data and renaming of data fields is supported, achieving flexible data governance.
In a third aspect, an embodiment of the present application provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the streaming data processing method according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the streaming data processing method according to the first aspect.
Compared with the related art, the streaming data processing method, system, computer device and computer-readable storage medium provided by the embodiments of the application relate to a data capability foundation and are applied in particular to data cleaning. Parsing of streaming data is realized through flexible configuration of the data reading configuration file; Join operations on streaming data are supported; and operations such as modifying, retaining, deleting and renaming original data fields are also supported. Data parsing and data governance are thereby achieved flexibly and conveniently, and the overall data processing cost is reduced.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
Fig. 1 is a flowchart of a streaming data processing method according to an embodiment of the present application;
Fig. 2 is a flowchart of a streaming data processing method according to a preferred embodiment of the present application;
Fig. 3 is a block diagram of a streaming data processing system according to an embodiment of the present application.
Wherein:
1. streaming data access module; 2. streaming data parsing module; 3. streaming data connection module; 201. target data judgment module; 202. target data parsing module.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. References to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof, as used in this application, are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. References to "connected," "coupled," and the like in this application are not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "And/or" describes an association relationship of associated objects, meaning that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.
Spark Streaming is a component provided by Spark for stream computation on real-time data. It provides an API for manipulating data streams that closely corresponds to the RDD API in Spark Core, and it supports the same level of fault tolerance, throughput and scalability as Spark Core.
Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the activity stream data of consumers on a website.
Join refers to an operation that connects two or more tables in a database.
At present, the data provided by most streaming data access systems is not the valuable data that is ultimately needed. The present application mainly addresses how to retain the valuable data in streaming data through simple configuration. In addition, because streaming data cannot be sorted and matched as a whole table, it cannot be joined directly in the way offline data can; the application therefore also addresses the need to join streaming data in order to obtain the required data.
The embodiment provides a streaming data processing method. Fig. 1 is a flowchart of a streaming data processing method according to an embodiment of the present application, and as shown in fig. 1, the flowchart includes the following steps:
a streaming data access step S1 of accessing streaming data in a stream processing platform based on a stream computing component, where the stream computing component is specifically Spark Streaming, and the stream processing platform is, by way of example, Kafka, but is not limited to Kafka and may be another stream processing platform;
a streaming data parsing step S2 of configuring a data reading configuration file and parsing the streaming data according to the data reading configuration file to obtain target data. Optionally, the data format of the data reading configuration file is JSON (JavaScript Object Notation). Notably, the data reading configuration file is case sensitive. The data reading configuration file includes at least: a field name, a field value, a parsing configuration, and a Join configuration.
Based on the above steps, the embodiment of the application allows the reading of the data stream to be configured according to user requirements, and parses the data based on the data reading configuration file. The embodiment also supports Join operations on streaming data and data governance, both driven by the data reading configuration file.
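As a minimal sketch of how such a data reading configuration file might be loaded and checked, consider the following Python fragment. The key names (`fieldName`, `fieldValue`, `parseConfig`, `joinConfig`), the nested keys in the sample, and the helper `load_read_config` are hypothetical; the application only states that the file is JSON, is case sensitive, and contains at least these four items.

```python
import json

# Hypothetical key names; the application does not publish the actual keys.
REQUIRED_KEYS = ("fieldName", "fieldValue", "parseConfig", "joinConfig")


def load_read_config(text):
    """Parse a JSON data reading configuration and verify the required items.

    Key lookup is exact, so "fieldname" does not satisfy "fieldName",
    which mirrors the case sensitivity noted in the description.
    """
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("missing configuration items: " + ", ".join(missing))
    return config


# Illustrative configuration with made-up values.
example = """
{
  "fieldName": "table",
  "fieldValue": "orders",
  "parseConfig": {"deleteFields": ["debug"]},
  "joinConfig": {"joinField": "city_code"}
}
"""
config = load_read_config(example)
```

Rejecting a misspelled or wrongly cased key at load time is one possible design choice; it surfaces configuration mistakes before any streaming data is read.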
In some of these embodiments, the method further comprises a streaming data connection step S3 of performing a data Join according to a main table of a target database and the Join configuration, and outputting the result. Optionally, the target database is a MySQL database, but is not limited to a MySQL database. It should be noted that the stream processing platform is configured with an output configuration file, which specifies the data tables to be output; a table that is not listed in the output configuration file is not output.
Based on the above step, the embodiment of the application provides a new streaming data Join method driven by the Join configuration, so that fields are extracted on demand instead of crudely loading all the data to perform the Join, which improves the flexibility of the data Join.
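The rule that only tables listed in the output configuration file are output can be illustrated with a short sketch. The function `select_output`, the table names and the in-memory dictionary standing in for the processed data are all assumptions made for illustration; the application does not disclose the format of the output configuration file.

```python
def select_output(batches, output_tables):
    """Keep only the tables named in the output configuration."""
    allowed = set(output_tables)
    return {table: rows for table, rows in batches.items() if table in allowed}


# Hypothetical processed data, keyed by table name.
batches = {
    "orders": [{"order_id": 1}],
    "audit_log": [{"event": "login"}],
}

# Hypothetical output configuration listing a single table.
published = select_output(batches, ["orders"])
```

Here `audit_log` is dropped because it does not appear in the list of tables to output.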
In some embodiments, the parsing configuration includes one or any combination of a data field, a sink field, a secondary data field, a table name, a secondary sink field, a lower value replacement field, a constant assignment field, and a delete field. Based on the parsing configuration, the streaming data parsing step S2 further includes:
a target data judgment step S201 of combining the field name and the field value to find all data whose configuration matches the field name and the field value, thereby obtaining the target data; and
a target data parsing step S202 of performing data parsing and governance on the target data according to the parsing configuration.
Based on the above steps, operations such as modifying, retaining or deleting original data fields are carried out during data parsing by means of the lower value replacement field, the constant assignment field, the delete field and the like, so the data does not need to be governed through SQL in subsequent processing. This further improves the flexibility of data parsing and reduces the data governance cost of subsequent processing.
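Steps S201 and S202 can be sketched roughly as follows. This is an illustration under stated assumptions, not the applicant's implementation: the configuration keys `replace`, `constants` and `delete` are hypothetical, and the value replacement is assumed to map old field values to new ones, which the application does not spell out.

```python
def is_target(record, field_name, field_value):
    """Step S201: keep a record only if the configured field has the configured value."""
    return record.get(field_name) == field_value


def govern(record, parse_config):
    """Step S202: apply replacement, constant assignment and deletion to a copy."""
    result = dict(record)
    # Value replacement (assumed semantics): map old values of a field to new ones.
    for field, mapping in parse_config.get("replace", {}).items():
        if field in result and result[field] in mapping:
            result[field] = mapping[result[field]]
    # Constant assignment: set a field to a fixed value.
    for field, constant in parse_config.get("constants", {}).items():
        result[field] = constant
    # Deletion: drop fields that are not wanted downstream.
    for field in parse_config.get("delete", []):
        result.pop(field, None)
    return result


# Hypothetical records and configuration.
records = [
    {"table": "orders", "status": "0", "debug": "x"},
    {"table": "users", "status": "1", "debug": "y"},
]
parse_config = {
    "replace": {"status": {"0": "new", "1": "paid"}},
    "constants": {"source": "kafka"},
    "delete": ["debug"],
}
targets = [govern(r, parse_config) for r in records if is_target(r, "table", "orders")]
```

Because the governance rules live in the configuration, changing which fields are replaced or dropped requires no change to the code and no follow-up SQL.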
In some embodiments, the Join configuration includes one or any combination of a Join field, a code table primary key field, a code table value field, a rename field.
Based on this configuration, a Join is performed on the data and renaming of data fields is supported, achieving flexible data governance.
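A Join driven by such a configuration might look like the sketch below, which treats the code table as a small lookup table held in memory. The key names (`joinField`, `codeKeyField`, `codeValueField`, `rename`) and the sample data are hypothetical, and leaving unmatched records with an empty value (left-join behaviour) is an assumption rather than something the application specifies.

```python
def build_index(code_table, join_config):
    """Index the code table by its primary key so each record needs one lookup."""
    key_field = join_config["codeKeyField"]
    value_field = join_config["codeValueField"]
    return {row[key_field]: row[value_field] for row in code_table}


def join_record(record, index, join_config):
    """Attach the matched code table value to a copy of one streaming record."""
    # The rename field, if present, decides the name of the joined column.
    output_name = join_config.get("rename") or join_config["codeValueField"]
    joined = dict(record)
    joined[output_name] = index.get(record.get(join_config["joinField"]))
    return joined


# Hypothetical code table and Join configuration.
code_table = [
    {"code": "010", "name": "Beijing"},
    {"code": "021", "name": "Shanghai"},
]
join_config = {
    "joinField": "city_code",
    "codeKeyField": "code",
    "codeValueField": "name",
    "rename": "city_name",
}
index = build_index(code_table, join_config)
joined = join_record({"order_id": 1, "city_code": "021"}, index, join_config)
```

Only the key and value columns named in the configuration are read from the code table, which reflects the idea of extracting fields on demand rather than loading every column for the Join.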
The embodiments of the present application are described and illustrated below by means of preferred embodiments.
Fig. 2 is a flow chart of a streaming data processing method according to a preferred embodiment of the present application. As shown in fig. 2, the streaming data processing method includes the steps of:
First, data in Kafka is accessed based on Spark Streaming. The data reading configuration file is then configured to obtain a JSON configuration file, and the data is parsed according to the configured JSON configuration file. In this embodiment, a Join is performed against the main table in MySQL, and the result is then output to Kafka. The above process relies on the configuration of the MySQL database, the output configuration of Kafka and the JSON configuration file. Note that a table that is not listed in the Kafka output configuration is not output, which improves the accuracy of the output target data. For ease of understanding, examples of these configurations are provided below.
(1) An example of a configuration file for a MySQL database is as follows:
[Figure BDA0003177684350000071: example MySQL configuration file; image not reproduced]
The configuration parameters are explained as follows:
Parameter            Description
url=XXXXXXXXXX       Connection URL of MySQL
user=aaa             Username of MySQL
password=123456      Password of MySQL
dbtable=aaa          MySQL database name
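Since the original figure is not reproduced here, the following sketch only assumes that the MySQL configuration is a plain `key=value` file containing the four parameters listed in the table above; that layout and the helper `parse_properties` are assumptions, and the values are the placeholders from the table.

```python
def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Placeholder values taken from the parameter table; not real credentials.
example = """
url=XXXXXXXXXX
user=aaa
password=123456
dbtable=aaa
"""
mysql_settings = parse_properties(example)
```

The resulting dictionary could then be handed to whatever database client the deployment uses.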
(2) An example of an output profile for Kafka is as follows:
[Figure BDA0003177684350000072: example Kafka output configuration file; image not reproduced]
(3) An example of the format and configuration of the JSON configuration file is as follows:
[Figures BDA0003177684350000073 and BDA0003177684350000081: example JSON configuration file; images not reproduced]
The configuration parameters are explained as follows:
[Figures BDA0003177684350000082 and BDA0003177684350000091: table of JSON configuration parameters; images not reproduced]
It should be noted that the steps illustrated in the above flows or in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in a different order.
This embodiment also provides a streaming data processing system, which is used to implement the foregoing embodiments and preferred embodiments; what has already been described is not repeated. As used hereinafter, the terms "module," "unit," "subunit," and the like may refer to a combination of software and/or hardware that implements a predetermined function. While the system described in the embodiments below is preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
Fig. 3 is a block diagram of a streaming data processing system according to an embodiment of the present application, and as shown in fig. 3, the system includes:
The streaming data access module 1 is used for accessing streaming data in a stream processing platform based on a stream computing component. The stream computing component is specifically Spark Streaming, and the stream processing platform may be Kafka, but is not limited to Kafka and may be another stream processing platform.
The streaming data parsing module 2 is used for configuring a data reading configuration file and parsing the streaming data according to the data reading configuration file to obtain target data. Optionally, the data format of the data reading configuration file is JSON. Notably, the data reading configuration file is case sensitive. The data reading configuration file includes at least: a field name, a field value, a parsing configuration, and a Join configuration. Specifically, the parsing configuration includes one or any combination of a data field, a sink field, a secondary data field, a table name, a secondary sink field, a lower value replacement field, a constant assignment field, and a delete field. On this basis, the streaming data parsing module 2 further includes: a target data judgment module 201, configured to combine the field name and the field value to find all data whose configuration matches the field name and the field value, thereby obtaining the target data; and a target data parsing module 202, configured to perform data parsing and governance on the target data according to the parsing configuration. Operations such as modifying, retaining or deleting original data fields are thus carried out during data parsing by means of the lower value replacement field, the constant assignment field, the delete field and the like, so the data does not need to be governed through SQL in subsequent processing; this further improves the flexibility of data parsing and reduces the data governance cost of subsequent processing. Specifically, the Join configuration includes one or any combination of a Join field, a code table primary key field, a code table value field, and a rename field. Based on the Join configuration, a Join is performed on the data and renaming of data fields is supported, achieving flexible data governance.
Based on the streaming data parsing module 2, the embodiment of the application allows the reading of the data stream to be configured according to user requirements, and parses the data based on the data reading configuration file. The embodiment also supports Join operations on streaming data and data governance, both driven by the data reading configuration file.
The streaming data connection module 3 is used for performing a data Join according to a main table of a target database and the Join configuration, and outputting the result. The target database is, by way of example, a MySQL database, but is not limited to a MySQL database. It should be noted that the stream processing platform is configured with an output configuration file, which specifies the data tables to be output; a table that is not listed in the output configuration file is not output. Based on this module, the embodiment of the application provides a new streaming data Join method driven by the Join configuration, so that fields are extracted on demand instead of crudely loading all the data to perform the Join, which improves the flexibility of the data Join.
The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
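As one way to picture modules 1, 2 and 3 as software program modules, the sketch below chains three Python generators so that records flow through one at a time, as they would in a stream. It is a stand-in only: a list of JSON strings replaces Kafka, a dictionary replaces the MySQL main table, and all function names, parameters and field names are hypothetical.

```python
import json


def access(messages):
    """Module 1 stand-in: yield decoded records one at a time."""
    for message in messages:
        yield json.loads(message)


def parse(records, field_name, field_value, delete_fields):
    """Module 2 stand-in: select target records (201) and drop unwanted fields (202)."""
    for record in records:
        if record.get(field_name) == field_value:
            yield {k: v for k, v in record.items() if k not in delete_fields}


def connect(records, main_table, join_field, value_name):
    """Module 3 stand-in: enrich each record from a keyed main table."""
    for record in records:
        yield {**record, value_name: main_table.get(record.get(join_field))}


# Hypothetical input messages and main table.
messages = [
    '{"table": "orders", "city_code": "010", "debug": 1}',
    '{"table": "users", "city_code": "021", "debug": 2}',
]
main_table = {"010": "Beijing", "021": "Shanghai"}

pipeline = connect(
    parse(access(messages), "table", "orders", {"debug"}),
    main_table,
    "city_code",
    "city_name",
)
output = list(pipeline)
```

Each stage consumes the previous one lazily, so no stage needs the whole data set in memory before the next stage starts.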
In addition, the streaming data processing method described in the embodiment of the present application in conjunction with fig. 1 may be implemented by a computer device. The computer device may include a processor and a memory storing computer program instructions.
In particular, the processor may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
The memory may include, among other things, mass storage for data or instructions. By way of example, and not limitation, the memory may include a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory may include removable or non-removable (or fixed) media, where appropriate. The memory may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory is a non-volatile memory. In particular embodiments, the memory includes Read-Only Memory (ROM) and Random Access Memory (RAM). The ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate. The RAM may be a Static Random-Access Memory (SRAM) or a Dynamic Random-Access Memory (DRAM), where the DRAM may be a Fast Page Mode Dynamic Random-Access Memory (FPMDRAM), an Extended Data Out Dynamic Random-Access Memory (EDODRAM), a Synchronous Dynamic Random-Access Memory (SDRAM), and the like.
The memory may be used to store or cache various data files for processing and/or communication use, as well as possibly computer program instructions for execution by the processor.
The processor reads and executes the computer program instructions stored in the memory to implement any one of the streaming data processing methods in the above embodiments.
In some of these embodiments, the computer device may also include a communication interface and a bus. The processor, the memory, and the communication interface are connected through the bus and communicate with one another over it.
The communication interface is used for communication among the modules, devices, units, and/or equipment in the embodiments of the present application. The communication interface may also carry out data communication with other components, such as external equipment, image/data acquisition equipment, a database, external storage, an image/data processing workstation, and the like.
A bus comprises hardware, software, or both that couple the components of the computer device to one another. Buses include, but are not limited to, at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front-Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. A bus may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the present application, any suitable bus or interconnect is contemplated by the application.
The computer device may execute the streaming data processing method in the embodiment of the present application based on the obtained streaming data, thereby implementing the streaming data processing method described in conjunction with fig. 1.
In addition, in combination with the streaming data processing method in the foregoing embodiments, the embodiments of the present application may provide a computer-readable storage medium for implementing the method. The computer-readable storage medium has computer program instructions stored thereon; when executed by a processor, the computer program instructions implement any of the streaming data processing methods of the above embodiments.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only several implementations of the present application. Although their description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (10)

1. A streaming data processing method, comprising:
a streaming data access step of accessing streaming data in a stream processing platform by means of a stream computing component;
a streaming data parsing step of configuring a data reading configuration file and parsing the streaming data according to the data reading configuration file to obtain target data;
wherein the data reading configuration file comprises at least: a field name, a field value, a parsing configuration, and a Join configuration.
2. The streaming data processing method of claim 1, further comprising:
a streaming data Join step of performing a data Join according to a main table of a target database and the Join configuration, and outputting the joined data.
3. The streaming data processing method of claim 2, wherein the parsing configuration comprises one or any combination of a data field, a sink field, a secondary data field, a table name, a secondary sink field, a lower value replacement field, a constant assignment field, and a delete field,
and the streaming data parsing step further comprises:
a target data judgment step of matching on the field name together with the field value and finding all data whose field name and field value are the same as those configured, to obtain the target data;
a target data parsing step of parsing and processing the target data according to the parsing configuration.
4. A streaming data processing method according to claim 2 or 3, wherein the Join configuration comprises one or any combination of a Join field, a code table primary key field, a code table value field, and a rename field.
5. A streaming data processing system, comprising:
a streaming data access module, configured to access streaming data in a stream processing platform by means of a stream computing component;
a streaming data parsing module, configured to configure a data reading configuration file and parse the streaming data according to the data reading configuration file to obtain target data;
wherein the data reading configuration file comprises at least: a field name, a field value, a parsing configuration, and a Join configuration.
6. The streaming data processing system of claim 5, further comprising:
a streaming data Join module, configured to perform a data Join according to a main table of a target database and the Join configuration, and to output the joined data.
7. The streaming data processing system of claim 6, wherein the parsing configuration comprises one or any combination of a data field, a sink field, a secondary data field, a table name, a secondary sink field, a lower value replacement field, a constant assignment field, and a delete field,
and the streaming data parsing module further comprises:
a target data judgment module, configured to match on the field name together with the field value and find all data whose field name and field value are the same as those configured, to obtain the target data;
a target data parsing module, configured to parse and process the target data according to the parsing configuration.
8. The streaming data processing system of claim 6 or 7, wherein the Join configuration comprises one or any combination of a Join field, a code table primary key field, a code table value field, and a rename field.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the streaming data processing method according to any of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a streaming data processing method according to any one of claims 1 to 4.
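
As an informal illustration only, and not part of the claims, the flow recited in claims 1 to 4 can be sketched in Python. Everything specific here is an assumption made for the example: the configuration key names, the JSON-lines record format, the helper function names, and the in-memory list that stands in for a table in the target database are all hypothetical, and only a few of the parsing and Join fields listed in claims 3 and 4 are modelled.

```python
import json

# Hypothetical data reading configuration; key names are illustrative only.
READ_CONFIG = {
    "field_name": "type",       # field used to recognise target data
    "field_value": "order",     # value that field must hold
    "parse": {
        "fields": {"oid": "order_id", "st": "status_code"},  # data field -> sink field
        "constants": {"source": "stream"},                   # constant assignment fields
        "delete": ["debug"],                                 # fields to delete
    },
    "join": {
        "join_field": "status_code",  # field of the parsed record to join on
        "code_key": "code",           # code table primary key field
        "code_value": "label",        # code table value field
        "rename": "status",           # name given to the joined value
    },
}


def select_targets(records, config):
    """Target data judgment: keep records whose configured field has the configured value."""
    name, value = config["field_name"], config["field_value"]
    return (r for r in records if r.get(name) == value)


def parse_record(record, parse_cfg):
    """Target data parsing: drop, rename, and add fields as the parsing configuration says."""
    out = {k: v for k, v in record.items() if k not in parse_cfg["delete"]}
    for src, sink in parse_cfg["fields"].items():
        if src in out:
            out[sink] = out.pop(src)
    out.update(parse_cfg["constants"])
    return out


def join_record(record, lookup, join_cfg):
    """Join step: attach the code table value under the renamed field (None if no match)."""
    joined = dict(record)
    joined[join_cfg["rename"]] = lookup.get(record.get(join_cfg["join_field"]))
    return joined


def process(lines, code_table, config=READ_CONFIG):
    """Run access -> judgment -> parsing -> Join over an iterable of JSON lines."""
    join_cfg = config["join"]
    # Build the lookup once; code_table stands in for a table in the target database.
    lookup = {row[join_cfg["code_key"]]: row[join_cfg["code_value"]] for row in code_table}
    records = (json.loads(line) for line in lines)
    for rec in select_targets(records, config):
        yield join_record(parse_record(rec, config["parse"]), lookup, join_cfg)
```

In this sketch, `process` could be fed any iterable of JSON strings, for example lines consumed from a message queue, together with a list of code table rows such as `[{"code": "A", "label": "active"}]`. Records whose `type` is not `order` are skipped, which mirrors the target data judgment step.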
CN202110837361.1A 2021-07-23 2021-07-23 Streaming data processing method, system, computer device and readable storage medium Pending CN113535783A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110837361.1A CN113535783A (en) 2021-07-23 2021-07-23 Streaming data processing method, system, computer device and readable storage medium

Publications (1)

Publication Number Publication Date
CN113535783A true CN113535783A (en) 2021-10-22

Family

ID=78088857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110837361.1A Pending CN113535783A (en) 2021-07-23 2021-07-23 Streaming data processing method, system, computer device and readable storage medium

Country Status (1)

Country Link
CN (1) CN113535783A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination