WO2020232880A1 - Data processing method and apparatus, storage medium and terminal device - Google Patents

Data processing method and apparatus, storage medium and terminal device

Info

Publication number
WO2020232880A1
WO2020232880A1 PCT/CN2019/103039 CN2019103039W WO2020232880A1 WO 2020232880 A1 WO2020232880 A1 WO 2020232880A1 CN 2019103039 W CN2019103039 W CN 2019103039W WO 2020232880 A1 WO2020232880 A1 WO 2020232880A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
regular expression
format
data packet
matching
Prior art date
Application number
PCT/CN2019/103039
Other languages
French (fr)
Chinese (zh)
Inventor
孙云雷
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2020232880A1 publication Critical patent/WO2020232880A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems

Definitions

  • This application belongs to the field of computer technology, and in particular relates to a data processing method, device, computer non-volatile readable storage medium, and terminal equipment.
  • The embodiments of the present application provide a data processing method, device, computer non-volatile readable storage medium, and terminal equipment to solve the problem that existing manual data processing consumes a great deal of time and labor and is very inefficient.
  • the first aspect of the embodiments of the present application provides a data processing method, which may include:
  • Each data record in the data packet is processed separately according to the target processing rule to obtain a processed data packet.
  • the second aspect of the embodiments of the present application provides a data processing device, which may include a module for implementing the steps of the above data processing method.
  • The third aspect of the embodiments of the present application provides a computer non-volatile readable storage medium that stores computer-readable instructions; when the computer-readable instructions are executed by a processor, the steps of the above data processing method are implemented.
  • The fourth aspect of the embodiments of the present application provides a terminal device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, the steps of the above data processing method are implemented.
  • the data format of the data packet is automatically determined by regular matching, and the data processing is further automatically performed according to the corresponding processing rules, that is, the complete process of data format matching and data processing is realized in a fully automated manner.
  • the entire process does not require any manual intervention, saving a lot of time and labor costs, and greatly improving the efficiency of data processing.
  • FIG. 1 is a flowchart of an embodiment of a data processing method in an embodiment of this application
  • Figure 2 is a schematic flow chart of format matching of data packets
  • FIG. 3 is a schematic diagram of setting multiple standby data processing terminals to perform parallel processing on data packets
  • Fig. 4 is a schematic flow chart of offloading processing of data packets
  • FIG. 5 is a structural diagram of an embodiment of a data processing device in an embodiment of the application.
  • Fig. 6 is a schematic block diagram of a terminal device in an embodiment of the application.
  • an embodiment of a data processing method in the embodiment of the present application may include:
  • Step S101 Receive a data packet collected and sent by a preset packet capture tool.
  • the packet capture tool is a tool for collecting data transmitted on the network.
  • the packet capture tool includes but is not limited to tools such as fiddler and wireshark.
  • the packet capture tool packs the collected data into several data packets, and sends the data packets to a preset data processing terminal.
  • the data processing terminal is the implementation subject of this embodiment.
  • each data packet includes more than one data record.
  • The number of data records contained in each data packet can be set according to the actual situation, for example 1000, 2000, 5000, or another value.
  • Each row is a data record.
  • The data records in each data packet are collected for the same business scenario.
  • All data records in the same data packet have the same data format.
  • Data records in different data packets may have the same data format or different data formats.
  • Step S102 Perform format matching on the target record according to a preset regular expression resource library, and determine the data format of the data packet.
  • the regular expression resource library includes more than one regular expression, and each regular expression corresponds to a data format.
  • A regular expression can be preset for each data format, and these regular expressions can be assembled into the regular expression resource library shown in the following table:
  • Data format    | Regular expression
    Data format a  | Regular expression 1
    Data format b  | Regular expression 2
    Data format c  | Regular expression 3
    ...            | ...
  • A regular expression (Regular Expression) is a computer science concept, usually used to retrieve and replace text that conforms to a certain pattern (rule). It combines predefined special characters into a "rule string" that expresses filtering logic for strings.
  • a regular expression is a text pattern that describes one or more strings to be matched when searching for text.
  • When the data processing terminal needs to match the format of a received data packet, it first selects one regular expression from the regular expression resource library and matches it against a data record in the data packet. If the match succeeds, the data format of the data packet is determined to be the data format that corresponds to that regular expression in the resource library.
  • Because all data records in the same data packet have the same data format, it is sufficient to select any one data record (the target record) from the data packet and perform format matching on it; format matching is not required for every data record in the data packet.
  • If the match fails, the next regular expression is selected to perform format matching on the data packet, and this process is repeated until format matching succeeds.
  • The content of the regular expression resource library can be adjusted according to the actual situation. For example, when data in a certain format no longer needs to be analyzed, the corresponding entry can be removed from the library; when data in a new format needs to be analyzed, a corresponding entry can be added; and the regular expression for a given data format can be modified as needed, so that the library stays applicable to the latest business scenarios.
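  • The sequential matching described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the format names, the two patterns, and the function name `detect_format` are assumptions, and the sample record is built from the `c0-e1` / `string:tsalesApplyCustContact` fragments quoted later in the text.

```python
import re

# Hypothetical resource library: format name -> compiled regular expression.
# Both patterns are illustrative assumptions, not taken from the patent.
REGEX_LIBRARY = {
    "format_a": re.compile(r"[\w-]+=[\w:]+"),
    "format_b": re.compile(r"\d{4}-\d{2}-\d{2},\w+"),
}


def detect_format(records, library=REGEX_LIBRARY):
    """Return the first format whose pattern matches the target record, else None."""
    if not records:
        return None
    # All records in a packet share one format, so one record is enough.
    target = records[0]
    # Entries are tried one by one, in the library's insertion order.
    for name, pattern in library.items():
        if pattern.fullmatch(target):
            return name
    return None


packet = ["c0-e1=string:tsalesApplyCustContact", "c0-e2=string:other"]
print(detect_format(packet))  # format_a
```

  • Keeping the library as a plain mapping means that adding, removing, or modifying an entry is a single dictionary operation.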
  • the data packets can be format matched according to the process shown in Figure 2:
  • step S1021 the matching success rate of each regular expression in the regular expression resource library is calculated according to the historical matching records in the preset statistical time period.
  • The statistical period can be set to 1 month, 2 months, 3 months, half a year, one year, or another value according to the actual situation. Because data that is too old has little reference value, a period of no more than one year is generally appropriate.
  • the matching success rate is positively correlated with the number of matching successes of the regular expression in the historical matching record, that is, the more matching successes, the higher the matching success rate, and the fewer the matching successes, the lower the matching success rate.
  • The historical matching record stores the regular expression used each time a data packet format was successfully matched. For example, suppose 50 data packets have been format matched in total: 30 were matched by regular expression 1, 14 by regular expression 2, and 6 by regular expression 3. Matching with regular expression 1 then has the highest success rate, regular expression 2 the second highest, and regular expression 3 the lowest, and the matching success rates of the three regular expressions can be ranked accordingly.
  • The statistical period can first be divided into T sub-periods, where T is a positive integer whose value can be set according to actual conditions, for example 5, 10, 20, or another value. Note that the larger the value of T, the greater the amount of calculation but the higher the accuracy; the smaller the value of T, the smaller the amount of calculation but the lower the accuracy. The two need to be balanced according to the actual situation.
  • $n$ is the sequence number of the regular expression, $1 \le n \le N$;
  • $N$ is the total number of regular expressions in the regular expression resource library;
  • $t$ is the sequence number of the sub-period in chronological order, $1 \le t \le T$; the earlier the sub-period, the smaller the value of $t$;
  • $MatSucNum_{n,t}$ is the number of successful matches of the $n$-th regular expression in the regular expression resource library in the $t$-th sub-period;
  • $Weight_t$ is the preset weight coefficient, with $Weight_t < Weight_{t+1}$, that is, the later the sub-period, the larger its weight coefficient;
  • $MatSucRatio_n$ is the matching success rate of the $n$-th regular expression in the regular expression resource library.
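  • The formula that combines these quantities is not reproduced in this text. The sketch below therefore assumes one plausible form: each expression's weighted success count, normalized by the weighted total over all expressions. That normalization and the function name `match_success_ratios` are assumptions made for illustration only.

```python
def match_success_ratios(mat_suc_num, weights):
    """Weighted matching success rate per regular expression.

    mat_suc_num[n][t] plays the role of MatSucNum_{n,t} and weights[t] the
    role of Weight_t. The normalization by the weighted total is an ASSUMED
    form; the source text does not give the formula.
    """
    if any(len(row) != len(weights) for row in mat_suc_num):
        raise ValueError("each row needs one count per sub-period")
    # Weighted success count of every expression over the T sub-periods.
    weighted = [sum(w * c for w, c in zip(weights, row)) for row in mat_suc_num]
    total = sum(weighted)
    if total == 0:
        return [0.0 for _ in weighted]
    return [score / total for score in weighted]


# Three expressions, T = 2 sub-periods, the later sub-period weighted higher.
counts = [[10, 20], [8, 6], [4, 2]]
ratios = match_success_ratios(counts, weights=[1, 2])
```

  • With these counts the first expression ranks highest, because it has the most successes and most of them fall in the later, more heavily weighted sub-period.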
  • Step S1022 from the regular expression resource library, select a regular expression with the highest matching success rate that has not been selected as a candidate expression.
  • Step S1023 Use the candidate expression to perform format matching on the target record.
  • If the target record is successfully matched by the candidate expression, the format matching is determined to be successful; otherwise, the format matching is determined to have failed.
  • Step S1024 Determine whether the format matching is successful.
  • If the format matching fails, return to step S1022 and its subsequent steps until the format matching succeeds; if the format matching succeeds, perform step S1025.
  • Step S1025 Determine the data format corresponding to the candidate expression as the data format of the data packet.
  • In this way, regular expressions are selected from the regular expression resource library in descending order of matching success rate, so the format matching process can be completed with as few matching attempts as possible, which increases the speed of data packet format matching.
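  • Steps S1022 to S1025 can be sketched as a loop over the library in descending order of success rate. The function name, the toy patterns, and the returned attempt counter are illustrative assumptions; the success rates reuse the 30/14/6 example above expressed as fractions of 50.

```python
import re


def match_by_success_rate(target_record, library, success_ratio):
    """Try expressions from highest to lowest success rate.

    library: {format name: compiled regex}; success_ratio: {format name: float}.
    Returns (format name or None, number of expressions tried).
    """
    # S1022: order candidates so the most successful expression comes first.
    ordered = sorted(library, key=lambda name: success_ratio.get(name, 0.0), reverse=True)
    for attempts, name in enumerate(ordered, start=1):
        # S1023/S1024: match the candidate expression against the target record.
        if library[name].fullmatch(target_record):
            # S1025: the candidate's format becomes the packet's format.
            return name, attempts
    return None, len(ordered)


# Toy library; patterns and names are assumptions for the example.
library = {
    "digits": re.compile(r"\d+"),
    "lower": re.compile(r"[a-z]+"),
    "upper": re.compile(r"[A-Z]+"),
}
ratio = {"digits": 0.60, "lower": 0.28, "upper": 0.12}
print(match_by_success_rate("abc", library, ratio))  # ('lower', 2)
```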
  • Step S103 Search for the target processing rule in a preset data processing rule library.
  • the target processing rule is a data processing rule corresponding to the data format of the data packet.
  • A data processing rule can be preset for each data format, and these rules can be assembled into the data processing rule library shown in the following table:
  • Data format    | Data processing rule
    Data format a  | Data processing rule 1
    Data format b  | Data processing rule 2
    Data format c  | Data processing rule 3
    ...            | ...
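  • A rule library of this shape can be modeled as a mapping from data format to a callable. The sketch below assumes that representation; the format names, the two placeholder rules, and `find_target_rule` are hypothetical.

```python
# Hypothetical rule library: data format -> function applied to one record.
# The two rules are placeholders chosen only to make the example runnable.
RULE_LIBRARY = {
    "format_a": str.strip,
    "format_b": str.upper,
}


def find_target_rule(data_format, rule_library=RULE_LIBRARY):
    """Step S103: look up the processing rule for the packet's data format."""
    try:
        return rule_library[data_format]
    except KeyError:
        raise LookupError(f"no processing rule registered for {data_format!r}") from None


rule = find_target_rule("format_b")
processed = [rule(record) for record in ["abc", "def"]]  # step S104, record by record
```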
  • Step S104 Process each data record in the data packet separately according to the target processing rule to obtain a processed data packet.
  • The corresponding data processing rule can be set as follows: divide each data record into two parts, the first being the data before the equal sign (c0-e1) and the second being the data after the equal sign (string:tsalesApplyCustContact); enclose each part in quotation marks and separate the two parts with a colon ("c0-e1": "string:tsalesApplyCustContact"); finally, separate the records with commas and enclose the whole set in braces to form a data packet in the following data format:
  • the data processing rule corresponding to the data format of the data packet to be processed can be set according to the specific scenario, which will not be repeated here.
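  • The example rule for equal-sign records can be sketched as below. The function name and the second sample record are assumptions; quoting is delegated to `json.dumps` so that any special characters in a part are escaped.

```python
import json


def apply_key_value_rule(records):
    """Convert 'key=value' records into one brace-enclosed string."""
    parts = []
    for record in records:
        # Split at the first equal sign: data before it, data after it.
        key, sep, value = record.partition("=")
        if not sep:
            raise ValueError(f"record has no equal sign: {record!r}")
        # Quote each part and join the two parts with a colon.
        parts.append(f"{json.dumps(key)}: {json.dumps(value)}")
    # Separate records with commas and wrap the whole set in braces.
    return "{" + ", ".join(parts) + "}"


records = ["c0-e1=string:tsalesApplyCustContact", "c0-e2=string:example"]
result = apply_key_value_rule(records)
```

  • For records with distinct keys the result is valid JSON, so a downstream analysis tool could load it with a standard JSON parser.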
  • multiple standby data processing terminals can also be set to process data packets in parallel.
  • The total number of data packets waiting to be processed in the data processing terminal may be counted first. If this total is less than or equal to a preset number threshold, the data packets are still processed according to the process shown in FIG. 1.
  • The number threshold can be set according to actual conditions, for example 100, 200, 500, or another value. If the total number of data packets waiting to be processed is greater than the number threshold, processing is performed according to the process shown in FIG. 4:
  • Step S401 Obtain the preset configuration files of each standby data processing terminal, and determine the data format corresponding to each standby data processing terminal according to the configuration file.
  • Each spare data processing terminal is dedicated to processing data packets of a certain data format, and this corresponding relationship will be stored in advance in the configuration files of each spare data processing terminal, and the data processing terminal can obtain these configuration files. Based on this, the data format corresponding to each standby data processing terminal is determined.
  • Step S402 Divide each standby data processing terminal into a corresponding data processing cluster.
  • all data processing terminals are preferably divided into two or more data processing clusters, wherein the data formats corresponding to the spare data processing terminals in the same data processing cluster are all consistent.
  • Step S403 Select a target cluster corresponding to the data packet.
  • the data format corresponding to each spare data processing terminal in the target cluster is consistent with the data format of the data packet.
  • Step S404 Send the data packet to the target cluster for processing.
  • Because the data format handled by each standby data processing terminal in the target cluster is the same as the data format of the data packet, the data packet can be processed more quickly.
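  • Steps S401 to S404, together with the threshold check that precedes them, can be sketched as follows. The configuration is modeled as a simple mapping from terminal to data format; that representation, the terminal identifiers, and both function names are assumptions. Falling back to local processing when no cluster exists for a format is also an assumption, since the text does not cover that case.

```python
from collections import defaultdict


def build_clusters(terminal_configs):
    """S401/S402: group standby terminals by the data format they handle.

    terminal_configs: {terminal id: data format}, standing in for the
    per-terminal configuration files described in the text.
    """
    clusters = defaultdict(list)
    for terminal_id, data_format in terminal_configs.items():
        clusters[data_format].append(terminal_id)
    return dict(clusters)


def route_packet(packet_format, pending_count, threshold, clusters):
    """S403/S404: return the target cluster, or [] to process locally."""
    if pending_count <= threshold:
        return []  # at or below the threshold: keep the FIG. 1 flow
    # Assumed fallback: no cluster for this format also means local processing.
    return clusters.get(packet_format, [])


configs = {"t1": "format_a", "t2": "format_b", "t3": "format_a"}
clusters = build_clusters(configs)
target = route_packet("format_a", pending_count=150, threshold=100, clusters=clusters)
```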
  • The data processing terminal may send a data packet query request to each standby data processing terminal in the target cluster, receive the number of to-be-processed data packets reported by each of them, select the standby data processing terminal with the fewest to-be-processed data packets as the preferred processing terminal, and allocate the data packet to the preferred processing terminal for processing.
  • The processing performed by the preferred processing terminal is similar to that in step S104; see the foregoing description for details, which are not repeated here.
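  • Selecting the preferred processing terminal amounts to choosing the least-loaded member of the target cluster. In the sketch below the query request is abstracted as a callable, `query_pending`, which is a hypothetical stand-in for the network round trip; ties are resolved in favor of the terminal listed first.

```python
def pick_preferred_terminal(cluster, query_pending):
    """Return the terminal in the cluster with the fewest pending packets.

    cluster: list of terminal ids; query_pending: callable that returns the
    pending packet count reported by one terminal (hypothetical interface).
    """
    if not cluster:
        raise ValueError("target cluster is empty")
    # Query every terminal once and keep the reported counts.
    counts = {terminal: query_pending(terminal) for terminal in cluster}
    # min() keeps the first terminal among equals, so ties follow list order.
    return min(cluster, key=counts.__getitem__)


pending = {"t1": 7, "t3": 2, "t4": 2}
preferred = pick_preferred_terminal(["t1", "t3", "t4"], pending.get)
```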
  • each data packet is distributed to the data processing cluster corresponding to its data format for processing according to the result of the format matching.
  • each standby data processing terminal in each data processing cluster will simultaneously process data packets in each data format in parallel, thereby improving overall data processing efficiency.
  • the embodiment of the application uses regular matching to automatically determine the data format of the data packet, and further automatically performs data processing according to the corresponding processing rules, that is, the data format matching and data processing are realized in a fully automated manner.
  • The whole process requires no manual intervention, which saves a great deal of time and labor and greatly improves the efficiency of data processing.
  • FIG. 5 shows a structural diagram of an embodiment of a data processing apparatus provided in an embodiment of the present application.
  • a data processing device may include:
  • the data packet receiving module 501 is configured to receive data packets collected and sent by a preset packet capture tool
  • the format matching module 502 is configured to perform format matching on the target record according to a preset regular expression resource library, and determine the data format of the data packet;
  • the processing rule search module 503 is used to search for the target processing rule in a preset data processing rule library
  • the data processing module 504 is configured to separately process each data record in the data packet according to the target processing rule to obtain a processed data packet.
  • the format matching module may include:
  • the matching success rate calculation unit is configured to calculate the matching success rate of each regular expression in the regular expression resource library according to historical matching records in a preset statistical period;
  • the candidate expression selection unit is used to select a regular expression with the highest matching success rate that has not been selected as a candidate expression from the regular expression resource library;
  • a format matching unit configured to use the candidate expression to perform format matching on the target record
  • the first processing unit is configured to, if the format matching fails, return to the step of selecting, from the regular expression resource library, the not-yet-selected regular expression with the highest matching success rate as the candidate expression, until the format matching succeeds;
  • the second processing unit is configured to determine the data format corresponding to the candidate expression successfully matched as the data format of the data packet if the format matching is successful.
  • the matching success rate calculation unit may include:
  • the sub-period division sub-unit is used to divide the statistical period into T sub-periods, where T is a positive integer;
  • the frequency counting subunit is used to separately count the number of successful matches of each regular expression in the regular expression resource library in each sub-period;
  • the matching success rate calculation subunit is used to calculate the matching success rate of each regular expression in the regular expression resource library.
  • the data processing device may further include:
  • Data packet number statistics module used to count the total number of data packets waiting to be processed
  • the configuration file obtaining module is configured to obtain the preset configuration files of each standby data processing terminal if the total number of data packets waiting to be processed is greater than the preset number threshold, and determine each standby data processing according to the configuration file The data format corresponding to the terminal;
  • the cluster division module is used to divide each standby data processing terminal into the corresponding data processing cluster
  • a cluster selection module for selecting a target cluster corresponding to the data packet
  • the data packet sending module is used to send the data packet to the target cluster for processing.
  • the data processing device may further include:
  • the number query module is configured to send a data packet query request to each backup data processing terminal in the target cluster, and respectively receive the number of data packets to be processed fed back by each backup data processing terminal in the target cluster;
  • a terminal selection module configured to select a backup data processing terminal with the smallest number of data packets to be processed from the target cluster as a preferred processing terminal
  • the data packet distribution module is used to distribute the data packet to the preferred processing terminal for processing.
  • FIG. 6 shows a schematic block diagram of a terminal device according to an embodiment of the present invention. For ease of description, only parts related to the embodiment of the present invention are shown.
  • the terminal device 6 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the terminal device 6 may include: a processor 60, a memory 61, and computer-readable instructions 62 stored in the memory 61 and executable on the processor 60, such as computer-readable instructions for executing the aforementioned data processing method.
  • When the processor 60 executes the computer-readable instructions 62, the steps in the foregoing embodiments of the data processing method are implemented, such as steps S101 to S104 shown in FIG. 1.
  • Alternatively, when the processor 60 executes the computer-readable instructions 62, the functions of the modules/units in the foregoing device embodiments, such as the functions of the modules 501 to 504 shown in FIG. 5, are realized.
  • The computer-readable instructions 62 may be divided into one or more modules/units, which are stored in the memory 61 and executed by the processor 60 to complete the present invention.
  • the one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 62 in the terminal device 6.
  • the processor 60 may be a central processing unit (Central Processing Unit, CPU), or other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 61 may be an internal storage unit of the terminal device 6, such as a hard disk or memory of the terminal device 6.
  • the memory 61 may also be an external storage device of the terminal device 6, for example, a plug-in hard disk equipped on the terminal device 6, a smart memory card (Smart Media Card, SMC), or a Secure Digital (SD) Card, Flash Card, etc. Further, the memory 61 may also include both an internal storage unit of the terminal device 6 and an external storage device.
  • the memory 61 is used to store the computer-readable instructions and other instructions and data required by the terminal device 6.
  • the memory 61 can also be used to temporarily store data that has been output or will be output.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may be implemented by instructing relevant hardware through computer-readable instructions.
  • The computer-readable instructions can be stored in a non-volatile computer-readable storage medium.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present application falls within the technical field of computers, and relates in particular to a data processing method and apparatus, a non-volatile computer-readable storage medium and a terminal device. The method comprises: receiving a data packet collected and sent by a preset packet capturing tool, wherein the data packet comprises one or more data records; carrying out format matching on a target record according to a preset regular expression resource library, and determining a data format of the data packet, wherein the regular expression resource library comprises one or more regular expressions, and each regular expression corresponds to a data format; searching a preset data processing rule base for a target processing rule, wherein the target processing rule is a data processing rule corresponding to the data format of the data packet; and respectively processing each data record in the data packet according to the target processing rule to obtain a processed data packet. No manual intervention is needed throughout the process, a large amount of time cost and labor cost is saved, and the efficiency is greatly improved.

Description

Data processing method, device, storage medium and terminal equipment
This application claims priority to Chinese patent application No. 201910423175.6, filed with the Chinese Patent Office on May 21, 2019 and entitled "Data processing method, device, computer-readable storage medium and terminal equipment", the entire contents of which are incorporated herein by reference.
Technical field
This application belongs to the field of computer technology, and in particular relates to a data processing method, device, computer non-volatile readable storage medium, and terminal equipment.
Background
With the growing adoption of big data technology, more and more scenarios require analysis and calculation over massive amounts of data. Before the data can be analyzed, it must first be preprocessed and converted into a data format that data analysis tools can readily analyze and calculate on. At present, this data processing is mainly done manually; when the volume of data is large, it consumes a great deal of time and labor and is very inefficient.
Technical problem
In view of this, the embodiments of the present application provide a data processing method, device, computer non-volatile readable storage medium, and terminal equipment to solve the problem that existing manual data processing consumes a great deal of time and labor and is very inefficient.
Technical solutions
The first aspect of the embodiments of the present application provides a data processing method, which may include:
receiving a data packet collected and sent by a preset packet capture tool;
performing format matching on the target record according to a preset regular expression resource library, and determining the data format of the data packet;
searching for the target processing rule in a preset data processing rule library;
processing each data record in the data packet separately according to the target processing rule to obtain a processed data packet.
The second aspect of the embodiments of the present application provides a data processing device, which may include modules for implementing the steps of the above data processing method.
The third aspect of the embodiments of the present application provides a computer non-volatile readable storage medium that stores computer-readable instructions; when the computer-readable instructions are executed by a processor, the steps of the above data processing method are implemented.
The fourth aspect of the embodiments of the present application provides a terminal device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, the steps of the above data processing method are implemented.
Beneficial effects
Through the embodiments of this application, the data format of a data packet is determined automatically by regular expression matching, and the data is then processed automatically according to the corresponding processing rule. The complete process of data format matching and data processing is thus fully automated and requires no manual intervention, which saves a great deal of time and labor and greatly improves the efficiency of data processing.
Description of the Drawings
FIG. 1 is a flowchart of an embodiment of a data processing method in the embodiments of the present application;
FIG. 2 is a schematic flowchart of format matching performed on a data packet;
FIG. 3 is a schematic diagram of multiple standby data processing terminals set up to process data packets in parallel;
FIG. 4 is a schematic flowchart of offloading (diverting) data packets;
FIG. 5 is a structural diagram of an embodiment of a data processing apparatus in the embodiments of the present application;
FIG. 6 is a schematic block diagram of a terminal device in the embodiments of the present application.
Embodiments of the Invention
Referring to FIG. 1, an embodiment of a data processing method in the embodiments of the present application may include:
Step S101: Receive a data packet collected and sent by a preset packet capture tool.
The packet capture tool is a tool that collects data transmitted over a network. In this embodiment, the packet capture tool includes but is not limited to tools such as Fiddler and Wireshark.
The packet capture tool packs the collected data into a number of data packets and sends them to a preset data processing terminal, which is the executing entity of this embodiment. Each data packet contains one or more data records. The number of data records held by each data packet can be set according to the actual situation, for example to 1000, 2000, 5000, or another value.
The data records in one such data packet are shown below as a concrete example:
c0-e1=string:tsalesApplyCustContact
c0-e4=string:tsalesApplyCust
c0-e6=string:true
c0-e7=string:upd
c0-e8=string:2011068
c0-e9=string:2
c0-e10=string:1
c0-e11=string:1
c0-e12=string:
c0-e13=string:9
c0-e14=string:tsalesApplyCust
c0-e15=string:1
Each line is one data record.
Note that all data records in a data packet are collected for the same business scenario, so the data records in one data packet share the same data format, whereas data records in different data packets may have the same data format or different data formats. Here, data format refers to the regular format characteristics exhibited by the data records. In the example above, the data format is: each data record begins with "c", followed by one or more decimal digits, followed by "-e", followed by one or more decimal digits, followed by "=string:", followed by a string of decimal digits or other characters (the string may be empty).
Step S102: Perform format matching on a target record according to a preset regular expression resource library, and determine the data format of the data packet.
The regular expression resource library contains one or more regular expressions, each of which corresponds to one data format. For each business scenario, a regular expression corresponding to its data format can be set in advance, and the regular expressions of the various data formats can be organized into a regular expression resource library as shown in the following table:
Data format | Regular expression
Data format a | Regular expression 1
Data format b | Regular expression 2
Data format c | Regular expression 3
... | ...
A regular expression is a concept from computer science that is typically used to search for and replace text conforming to a certain pattern (rule). Predefined special characters, and combinations of them, form a "rule string" that expresses filtering logic over strings. In other words, a regular expression is a text pattern that describes one or more strings to be matched when searching text.
When the data processing terminal needs to perform format matching on a received data packet, it first selects one regular expression from the regular expression resource library and matches it against a data record in the data packet. If the match succeeds, the data format of the data packet is determined to be the data format that corresponds to this regular expression in the regular expression resource library.
Because all data records in one data packet share the same data format, it is sufficient to pick any single data record from the data packet (the target record) and perform format matching on it. There is no need to perform format matching on every data record in the data packet.
If the format matching fails, the next regular expression is selected and format matching is performed again. This process is repeated until the format matching succeeds.
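The sequential matching described above can be sketched in Python as follows. This is an illustrative sketch rather than the implementation of the application: the library contents, the format names `format_a` and `format_b`, and the function name `detect_format` are hypothetical. Returning `None` when no expression matches is also an assumption, since the text only describes repeating until a match succeeds.

```python
import re
from typing import Dict, List, Optional

# Hypothetical regular expression resource library: data format name -> pattern.
REGEX_LIBRARY: Dict[str, str] = {
    "format_a": r"^c[0-9]+-e[0-9]+=string:",
    "format_b": r"^[0-9]{4}-[0-9]{2}-[0-9]{2},",
}


def detect_format(packet: List[str], library: Dict[str, str]) -> Optional[str]:
    """Return the first data format whose pattern matches one record of the packet."""
    if not packet:
        return None
    # Any record will do, because all records in a packet share one format.
    target_record = packet[0]
    for data_format, pattern in library.items():
        if re.search(pattern, target_record):
            return data_format
    return None  # assumption: report failure if nothing in the library matches
```

Only one record is inspected per packet, so the cost of format detection depends on the size of the library and not on the number of records in the packet.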
Note that the contents of the regular expression resource library can be adjusted according to the actual situation. For example, when data in a certain data format no longer needs to be analyzed, its entry can be deleted from the library; when data in a new data format needs to be analyzed, a corresponding entry can be added to the library; and the regular expression corresponding to a given data format can be modified as needed, so that the library remains applicable to the latest business scenarios.
Preferably, the fact that data packets of different data formats may occur with very different probabilities is taken into account. For example, data packets in one or a few data formats may make up the vast majority of all data packets, while data packets in other data formats are rare. To reduce the number of match attempts, format matching can be performed on the data packet according to the process shown in FIG. 2:
Step S1021: Calculate the matching success rate of each regular expression in the regular expression resource library according to the historical matching records within a preset statistical period.
The statistical period can be set according to the actual situation to 1 month, 2 months, 3 months, half a year, one year, or another value. Because data that is too old has little reference value, the period is generally best kept within one year.
The matching success rate is positively correlated with the number of successful matches of a regular expression in the historical matching records: the more successful matches, the higher the matching success rate, and the fewer successful matches, the lower the matching success rate. The historical matching records store the regular expression used each time the format of a data packet was matched successfully. For example, suppose format matching has been performed on 50 data packets in total, of which 30 were matched by regular expression 1, 14 by regular expression 2, and 6 by regular expression 3. Regular expression 1 then has the highest success rate, regular expression 2 the second highest, and regular expression 3 the lowest, and the three expressions are assigned the highest, second highest, and lowest matching success rates respectively.
For a more precise calculation, in one specific implementation of this embodiment, the statistical period is first divided into T sub-periods, where T is a positive integer whose value can be set according to the actual situation, for example to 5, 10, 20, or another value. Note that the larger T is, the greater the amount of computation but the higher the calculation accuracy; the smaller T is, the smaller the amount of computation but the lower the calculation accuracy. The two must be weighed against each other according to the actual situation.
Then, the number of successful matches of each regular expression in the regular expression resource library within each sub-period is counted, and the matching success rate of each regular expression in the library is calculated according to the following formula:
[Formula image PCTCN2019103039-appb-000001, defining MatSucRatio_n; not reproduced in this text]
Here, n is the index of a regular expression, 1 ≤ n ≤ N, and N is the total number of regular expressions in the regular expression resource library; t is the index of a sub-period in chronological order, 1 ≤ t ≤ T, so that earlier sub-periods have smaller values of t; MatSucNum_{n,t} is the number of successful matches of the n-th regular expression in the library within the t-th sub-period; Weight_t is a preset weight coefficient with Weight_t < Weight_{t+1}, that is, later sub-periods have larger weight coefficients; and MatSucRatio_n is the matching success rate of the n-th regular expression in the library. Later sub-periods are weighted more heavily because data closer to the current time has greater reference value, while older data has less. For example, data recorded this week clearly reflects users' current habits better than data from several months ago.
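The formula itself is an image that is not reproduced in this text, so the following Python sketch relies on an assumed form that is consistent with the definitions above: the weighted match count of expression n, summed over the T sub-periods, divided by the weighted match counts of all N expressions. The function name `match_success_ratios` and the normalization are assumptions, not the formula of the application.

```python
from typing import List


def match_success_ratios(counts: List[List[int]], weights: List[float]) -> List[float]:
    """Assumed weighted success rate per expression.

    counts[n][t] is the number of successful matches of expression n in
    sub-period t (sub-periods in chronological order); weights[t] is Weight_t.
    """
    if any(w1 >= w2 for w1, w2 in zip(weights, weights[1:])):
        raise ValueError("weights must be strictly increasing over time")
    if any(len(row) != len(weights) for row in counts):
        raise ValueError("every expression needs one count per sub-period")
    # Weighted number of successes for each expression.
    weighted = [sum(w * c for w, c in zip(weights, row)) for row in counts]
    total = sum(weighted)
    if total == 0:
        return [0.0] * len(counts)  # no matching history yet
    return [value / total for value in weighted]


# The 30 / 14 / 6 example from the text, split over T = 2 sub-periods.
ratios = match_success_ratios([[10, 20], [8, 6], [4, 2]], [1.0, 2.0])
```

Under this assumed form the rates sum to 1, and an expression whose successes are concentrated in recent sub-periods ranks higher than one with the same total count spread over older sub-periods.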
Step S1022: From the regular expression resource library, select the regular expression with the highest matching success rate among those not yet selected, and use it as the candidate expression.
Step S1023: Perform format matching on the target record using the candidate expression.
For example, suppose the candidate expression is "^c[0-9]{1,}-e[0-9]{1,}=string:", where ^ denotes the start of a line, [0-9] denotes any single digit from 0 to 9, and {1,} means that the preceding item is matched at least once. This regular expression matches data records that begin with "c", followed by one or more decimal digits, followed by "-e", followed by one or more decimal digits, followed by "=string:", followed by a string of decimal digits or other characters (the string may be empty). Taking the data packet given above as an example, any of its data records matches the candidate expression, so the format matching is determined to be successful; if the target record did not match, the format matching would be determined to have failed.
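As a quick check, the candidate expression can be tried against a few of the example records with Python's standard `re` module. The variable names and the non-matching sample line are illustrative only.

```python
import re

# The candidate expression quoted in the text.
CANDIDATE = re.compile(r"^c[0-9]{1,}-e[0-9]{1,}=string:")

records = [
    "c0-e1=string:tsalesApplyCustContact",
    "c0-e8=string:2011068",
    "c0-e12=string:",  # an empty value after the colon is allowed
]

# Every record taken from the example packet matches the expression.
all_match = all(CANDIDATE.match(record) is not None for record in records)

# A record in some other format (illustrative) does not match.
other_match = CANDIDATE.match("GET /index.html HTTP/1.1")
```

The expression anchors only the prefix of a record, so it places no constraint on the value that follows "=string:".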
Step S1024: Determine whether the format matching succeeded.
If the format matching failed, return to step S1022 and its subsequent steps until the format matching succeeds; if the format matching succeeded, perform step S1025.
Step S1025: Determine the data format corresponding to the candidate expression as the data format of the data packet.
During format matching, the regular expressions are selected from the regular expression resource library in descending order of matching success rate. In this way, the most likely formats are tried first, which keeps the expected number of match attempts low and speeds up format matching on data packets.
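Steps S1022 to S1025 can be sketched as a loop over the library sorted by success rate. This is a minimal sketch under stated assumptions: the function name `match_by_success_rate`, the dictionary layout, and the returned attempt count are hypothetical and serve only to show the effect of the ordering.

```python
import re
from typing import Dict, Optional, Tuple


def match_by_success_rate(
    target_record: str,
    library: Dict[str, str],         # data format -> regular expression
    success_rate: Dict[str, float],  # data format -> matching success rate
) -> Tuple[Optional[str], int]:
    """Try expressions from highest to lowest success rate.

    Returns the matched data format (or None) and the number of attempts made.
    """
    ordered = sorted(library, key=lambda fmt: success_rate.get(fmt, 0.0), reverse=True)
    for attempts, data_format in enumerate(ordered, start=1):
        if re.search(library[data_format], target_record):
            return data_format, attempts
    return None, len(ordered)
```

With a library in which one format dominates the traffic, most packets are identified on the first attempt, and only rare formats pay for the additional attempts.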
Step S103: Search a preset data processing rule library for the target processing rule.
The target processing rule is the data processing rule that corresponds to the data format of the data packet.
In this embodiment, different data processing rules are applied to data packets of different data formats, so as to produce a data format that is convenient for subsequent data analysis tools to analyze and compute on. Data processing rules corresponding to the various data formats can be set in advance and organized into a data processing rule library as shown in the following table:
Data format | Data processing rule
Data format a | Data processing rule 1
Data format b | Data processing rule 2
Data format c | Data processing rule 3
... | ...
Note that the contents of the data processing rule library can be adjusted according to the actual situation, including but not limited to adding, deleting, and modifying data processing rules.
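One simple way to model such a rule library is a mapping from data format to a processing function. The sketch below is only an illustration; the names `RULE_LIBRARY`, `register_rule`, `remove_rule`, and `find_target_rule` are hypothetical, as is the choice to raise an error when no rule exists.

```python
from typing import Callable, Dict, List

# A rule takes the records of one packet and returns the processed records.
Rule = Callable[[List[str]], List[str]]

# Hypothetical rule library: data format name -> processing function.
RULE_LIBRARY: Dict[str, Rule] = {}


def register_rule(data_format: str, rule: Rule) -> None:
    """Add a rule, or replace the existing rule for this data format."""
    RULE_LIBRARY[data_format] = rule


def remove_rule(data_format: str) -> None:
    """Delete the rule for a data format that no longer needs processing."""
    RULE_LIBRARY.pop(data_format, None)


def find_target_rule(data_format: str) -> Rule:
    """Look up the rule that corresponds to the packet's data format."""
    try:
        return RULE_LIBRARY[data_format]
    except KeyError:
        raise LookupError(f"no processing rule for data format {data_format!r}") from None
```

Keeping the rules in a mapping means that adding, deleting, or modifying a rule does not require touching the matching or dispatch logic.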
Step S104: Process each data record in the data packet according to the target processing rule to obtain a processed data packet.
Take a data packet in the following data format as an example:
c0-e1=string:tsalesApplyCustContact
c0-e4=string:tsalesApplyCust
c0-e6=string:true
c0-e7=string:upd
c0-e8=string:2011068
c0-e9=string:2
c0-e10=string:1
c0-e11=string:1
c0-e12=string:
c0-e13=string:9
c0-e14=string:tsalesApplyCust
c0-e15=string:1
The corresponding data processing rule can be set as follows: split each data record into two parts, the first part being the data before the equals sign (c0-e1) and the second part being the data after the equals sign (string:tsalesApplyCustContact); enclose each part in quotation marks and separate the two parts with a colon ("c0-e1":"string:tsalesApplyCustContact"); finally, separate the data records with commas and enclose the whole in braces, producing a data packet in the following data format:
{"c0-e1":"string:tsalesApplyCustContact","c0-e4":"string:tsalesApplyCust","c0-e6":"string:true","c0-e7":"string:upd","c0-e8":"string:2011068","c0-e9":"string:2","c0-e10":"string:1","c0-e11":"string:1","c0-e12":"string:","c0-e13":"string:9","c0-e14":"string:tsalesApplyCust","c0-e15":"string:1"}
Note that the above is only one example of a data processing rule. In actual use, a data processing rule corresponding to the data format of the data packet to be processed can be set according to the specific scenario, and details are not repeated here.
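The example rule described above can be sketched in Python as follows. The function name `records_to_json` is hypothetical. The sketch uses `json.dumps` from the standard library in place of manual quoting, which is a substitution made here so that special characters in keys and values are escaped correctly.

```python
import json
from typing import Dict, List


def records_to_json(records: List[str]) -> str:
    """Turn 'key=value' records into one JSON object, preserving record order."""
    fields: Dict[str, str] = {}
    for record in records:
        # Split at the first equals sign only, so the value may itself contain '='.
        key, separator, value = record.partition("=")
        if not separator:
            raise ValueError(f"record has no '=': {record!r}")
        fields[key] = value
    # Compact separators give output without spaces after ',' and ':'.
    return json.dumps(fields, separators=(",", ":"))


processed = records_to_json([
    "c0-e1=string:tsalesApplyCustContact",
    "c0-e12=string:",
])
```

If two records in a packet share the same key, the later one overwrites the earlier one in this sketch; a real rule would need to decide how to handle that case.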
Further, in practical applications an extreme situation may arise in which a massive number of data packets are waiting to be processed. In that situation, processing by the data processing terminal alone would overload it. To solve this problem, as shown in FIG. 3, multiple standby data processing terminals can also be set up in this embodiment to process data packets in parallel.
Specifically, after the data format of the data packet is determined in step S102, the total number of data packets waiting to be processed in the data processing terminal is first counted. If this total is less than or equal to a preset number threshold, processing continues according to the process shown in FIG. 1. The number threshold can be set according to the actual situation, for example to 100, 200, 500, or another value. If the total number of data packets waiting to be processed is greater than the number threshold, processing follows the process shown in FIG. 4:
Step S401: Obtain the preset configuration file of each standby data processing terminal, and determine the data format corresponding to each standby data processing terminal according to the configuration files.
Each standby data processing terminal is dedicated to processing data packets of one particular data format. This correspondence is stored in advance in the configuration file of each standby data processing terminal. The data processing terminal can obtain these configuration files and determine from them the data format corresponding to each standby data processing terminal.
Step S402: Assign each standby data processing terminal to the corresponding data processing cluster.
As shown in FIG. 4, in this embodiment all of the standby data processing terminals are preferably divided into two or more data processing clusters, where the standby data processing terminals in the same data processing cluster all correspond to the same data format.
Step S403: Select the target cluster corresponding to the data packet.
The data format corresponding to each standby data processing terminal in the target cluster is the same as the data format of the data packet.
Step S404: Send the data packet to the target cluster for processing.
Because every standby data processing terminal in the target cluster is dedicated to the data format of the data packet, the data packet can be processed more quickly.
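The offloading decision and steps S401 to S403 can be sketched as follows. This is a sketch under assumptions: the configuration of each standby terminal is reduced to a single data format string, and the names `should_offload`, `build_clusters`, and `select_target_cluster` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def should_offload(pending_packets: int, threshold: int) -> bool:
    """Offload only when the pending count is strictly greater than the threshold."""
    return pending_packets > threshold


def build_clusters(terminal_formats: Dict[str, str]) -> Dict[str, List[str]]:
    """Group standby terminals by the data format named in their configuration.

    terminal_formats maps a terminal id to the data format it handles.
    """
    clusters: Dict[str, List[str]] = defaultdict(list)
    for terminal_id, data_format in terminal_formats.items():
        clusters[data_format].append(terminal_id)
    return dict(clusters)


def select_target_cluster(clusters: Dict[str, List[str]], packet_format: str) -> List[str]:
    """Return the terminals whose data format equals the packet's format."""
    return clusters.get(packet_format, [])
```

An empty result from `select_target_cluster` means that no standby terminal is configured for the packet's format; how to handle that case is not covered by the text.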
Further, the data processing terminal can send a data packet query request to each standby data processing terminal in the target cluster and receive the number of pending data packets reported by each of them. It then selects the standby data processing terminal with the fewest pending data packets in the target cluster as the preferred processing terminal and assigns the data packet to it for processing. The processing performed by the preferred processing terminal is similar to that in step S104; refer to the foregoing description for details, which are not repeated here.
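Selecting the preferred processing terminal amounts to taking the minimum over the reported pending counts. In the sketch below, the function name `select_preferred_terminal` is hypothetical, and the query and response exchange is abstracted into a dictionary of already collected counts.

```python
from typing import Dict


def select_preferred_terminal(pending_counts: Dict[str, int]) -> str:
    """Return the terminal id with the fewest pending packets.

    pending_counts maps a terminal id to the count it reported in response
    to the query request. Ties go to the terminal listed first.
    """
    if not pending_counts:
        raise ValueError("the target cluster has no standby terminals")
    return min(pending_counts, key=pending_counts.get)
```

For example, with reported counts `{"t1": 7, "t3": 2, "t5": 2}` the function returns `"t3"`.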
Through the process shown in FIG. 4, once format matching has been performed on each data packet in the data stream, each data packet is diverted, according to its format matching result, to the data processing cluster corresponding to its data format. The standby data processing terminals in the various data processing clusters then process data packets of the various data formats in parallel, which improves overall data processing efficiency.
In summary, the embodiments of the present application determine the data format of a data packet automatically by regular expression matching and then process the data automatically according to the corresponding processing rule. The complete process of data format matching and data processing is thus fully automated and requires no manual intervention, which saves considerable time and labor costs and greatly improves the efficiency of data processing.
Corresponding to the data processing method described in the above embodiments, FIG. 5 shows a structural diagram of an embodiment of a data processing apparatus provided in the embodiments of the present application.
In this embodiment, a data processing apparatus may include:
a data packet receiving module 501, configured to receive a data packet collected and sent by a preset packet capture tool;
a format matching module 502, configured to perform format matching on a target record according to a preset regular expression resource library and determine the data format of the data packet;
a processing rule search module 503, configured to search a preset data processing rule library for a target processing rule;
a data processing module 504, configured to process each data record in the data packet according to the target processing rule to obtain a processed data packet.
Further, the format matching module may include:
a matching success rate calculation unit, configured to calculate the matching success rate of each regular expression in the regular expression resource library according to the historical matching records within a preset statistical period;
a candidate expression selection unit, configured to select, from the regular expression resource library, the regular expression with the highest matching success rate among those not yet selected as the candidate expression;
a format matching unit, configured to perform format matching on the target record using the candidate expression;
a first processing unit, configured to, if the format matching fails, return to the step of selecting from the regular expression resource library the regular expression with the highest matching success rate among those not yet selected as the candidate expression, until the format matching succeeds;
a second processing unit, configured to, if the format matching succeeds, determine the data format corresponding to the successfully matched candidate expression as the data format of the data packet.
Further, the matching success rate calculation unit may include:
a sub-period division subunit, configured to divide the statistical period into T sub-periods, where T is a positive integer;
a count statistics subunit, configured to count the number of successful matches of each regular expression in the regular expression resource library within each sub-period;
a matching success rate calculation subunit, configured to calculate the matching success rate of each regular expression in the regular expression resource library.
Further, the data processing apparatus may also include:
a data packet count module, configured to count the total number of data packets waiting to be processed;
a configuration file obtaining module, configured to, if the total number of data packets waiting to be processed is greater than a preset number threshold, obtain the preset configuration file of each standby data processing terminal and determine the data format corresponding to each standby data processing terminal according to the configuration files;
a cluster division module, configured to assign each standby data processing terminal to the corresponding data processing cluster;
a cluster selection module, configured to select the target cluster corresponding to the data packet;
a data packet sending module, configured to send the data packet to the target cluster for processing.
Further, the data processing apparatus may also include:
a count query module, configured to send a data packet query request to each standby data processing terminal in the target cluster and receive the number of pending data packets reported by each standby data processing terminal in the target cluster;
a terminal selection module, configured to select the standby data processing terminal with the fewest pending data packets in the target cluster as the preferred processing terminal;
a data packet assignment module, configured to assign the data packet to the preferred processing terminal for processing.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus, modules, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
FIG. 6 shows a schematic block diagram of a terminal device provided in an embodiment of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown.
在本实施例中,所述终端设备6可以是桌上型计算机、笔记本、掌上电脑及云端服务器等计算设备。该终端设备6可包括:处理器60、存储器61以及存储在所述存储器61中并可在所述处理器60上运行的计算机可读指令62,例如执行上述的数据处理方法的计算机可读指令。所述处理器60执行所述计算机可读指令62时实现上述各个数据处理方法实施例中的步骤,例如图1所示的步骤S101至S104。或者,所述处理器60执行所述计算机可读指令62时实现上述各装置实施例中各模块/单元的功能,例如图5所示模块501至504的功能。In this embodiment, the terminal device 6 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server. The terminal device 6 may include: a processor 60, a memory 61, and computer-readable instructions 62 stored in the memory 61 and running on the processor 60, such as computer-readable instructions for executing the aforementioned data processing method . When the processor 60 executes the computer-readable instructions 62, the steps in the foregoing embodiments of the data processing method are implemented, such as steps S101 to S104 shown in FIG. 1. Alternatively, when the processor 60 executes the computer-readable instructions 62, the functions of the modules/units in the foregoing device embodiments, such as the functions of the modules 501 to 504 shown in FIG. 5, are realized.
示例性的,所述计算机可读指令62可以被分割成一个或多个模块/单元,所述一个 或者多个模块/单元被存储在所述存储器61中,并由所述处理器60执行,以完成本发明。所述一个或多个模块/单元可以是能够完成特定功能的一系列计算机可读指令段,该指令段用于描述所述计算机可读指令62在所述终端设备6中的执行过程。Exemplarily, the computer-readable instruction 62 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 61 and executed by the processor 60, To complete the present invention. The one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 62 in the terminal device 6.
所述处理器60可以是中央处理单元(Central Processing Unit,CPU),还可以是其它通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其它可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。The processor 60 may be a central processing unit (Central Processing Unit, CPU), or other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
The memory 61 may be an internal storage unit of the terminal device 6, such as a hard disk or main memory of the terminal device 6. The memory 61 may also be an external storage device of the terminal device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the terminal device 6. Further, the memory 61 may include both an internal storage unit of the terminal device 6 and an external storage device. The memory 61 is used to store the computer-readable instructions and other instructions and data required by the terminal device 6. The memory 61 may also be used to temporarily store data that has been output or is about to be output.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the above division into functional units and modules is given only as an example. In practical applications, the above functions may be assigned to different functional units and modules as required; that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the scope of protection of the present application. For the specific working process of the units and modules in the foregoing system, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not detailed or recorded in one embodiment, reference may be made to the related descriptions of the other embodiments.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
If the integrated module/unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, all or part of the processes of the above method embodiments may also be carried out by computer-readable instructions that direct the relevant hardware, and the computer-readable instructions may be stored in a computer non-volatile readable storage medium.
A person of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be carried out by computer-readable instructions that direct the relevant hardware. The computer-readable instructions may be stored in a computer non-volatile readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the scope of protection of the present application.

Claims (20)

  1. A data processing method, characterized by comprising:
    receiving a data packet collected and sent by a preset packet capture tool, the data packet including one or more data records;
    performing format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet, wherein the regular expression resource library includes one or more regular expressions, each regular expression corresponds to one data format, and the target record is any one data record in the data packet;
    searching a preset data processing rule library for a target processing rule, the target processing rule being the data processing rule corresponding to the data format of the data packet;
    processing each data record in the data packet separately according to the target processing rule to obtain a processed data packet.
  2. The data processing method according to claim 1, wherein performing format matching on the target record according to the preset regular expression resource library to determine the data format of the data packet comprises:
    calculating the matching success rate of each regular expression in the regular expression resource library according to historical matching records within a preset statistical period;
    selecting, from the regular expression resource library, the regular expression with the highest matching success rate among those not yet selected as a candidate expression;
    performing format matching on the target record using the candidate expression;
    if the format matching fails, returning to the step of selecting, from the regular expression resource library, the regular expression with the highest matching success rate among those not yet selected as a candidate expression, until the format matching succeeds;
    if the format matching succeeds, determining the data format corresponding to the successfully matched candidate expression as the data format of the data packet.
  3. The data processing method according to claim 2, wherein calculating the matching success rate of each regular expression in the regular expression resource library according to historical matching records within the preset statistical period comprises:
    dividing the statistical period into T sub-periods, where T is a positive integer;
    counting the number of successful matches of each regular expression in the regular expression resource library in each sub-period;
    calculating the matching success rate of each regular expression in the regular expression resource library according to the following formula:
    Figure PCTCN2019103039-appb-100001
    where n is the index of a regular expression, 1≤n≤N, N is the total number of regular expressions in the regular expression resource library, t is the index of a sub-period in chronological order, 1≤t≤T, MatSucNum_{n,t} is the number of successful matches of the n-th regular expression in the regular expression resource library in the t-th sub-period, Weight_t is a preset weight coefficient with Weight_t < Weight_{t+1}, and MatSucRatio_n is the matching success rate of the n-th regular expression in the regular expression resource library.
  4. The data processing method according to any one of claims 1 to 3, wherein after the data format of the data packet is determined, the method further comprises:
    counting the total number of data packets waiting to be processed;
    if the total number of data packets waiting to be processed is greater than a preset number threshold, obtaining the preset configuration file of each standby data processing terminal, and determining the data format corresponding to each standby data processing terminal according to the configuration file;
    assigning each standby data processing terminal to a corresponding data processing cluster, wherein the standby data processing terminals in the same data processing cluster all correspond to the same data format;
    selecting a target cluster corresponding to the data packet, wherein the data format corresponding to each standby data processing terminal in the target cluster is the same as the data format of the data packet;
    sending the data packet to the target cluster for processing.
  5. The data processing method according to claim 4, wherein after the data packet is sent to the target cluster for processing, the method further comprises:
    sending a data packet query request to each standby data processing terminal in the target cluster, and receiving the number of pending data packets reported by each standby data processing terminal in the target cluster;
    selecting, from the target cluster, the standby data processing terminal with the smallest number of pending data packets as a preferred processing terminal;
    assigning the data packet to the preferred processing terminal for processing.
  6. A data processing device, characterized by comprising:
    a data packet receiving module, configured to receive a data packet collected and sent by a preset packet capture tool, the data packet including one or more data records;
    a format matching module, configured to perform format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet, wherein the regular expression resource library includes one or more regular expressions, each regular expression corresponds to one data format, and the target record is any one data record in the data packet;
    a processing rule search module, configured to search a preset data processing rule library for a target processing rule, the target processing rule being the data processing rule corresponding to the data format of the data packet;
    a data processing module, configured to process each data record in the data packet separately according to the target processing rule to obtain a processed data packet.
  7. The data processing device according to claim 6, wherein the format matching module comprises:
    a matching success rate calculation unit, configured to calculate the matching success rate of each regular expression in the regular expression resource library according to historical matching records within a preset statistical period;
    a candidate expression selection unit, configured to select, from the regular expression resource library, the regular expression with the highest matching success rate among those not yet selected as a candidate expression;
    a format matching unit, configured to perform format matching on the target record using the candidate expression;
    a first processing unit, configured to, if the format matching fails, return to the step of selecting, from the regular expression resource library, the regular expression with the highest matching success rate among those not yet selected as a candidate expression, until the format matching succeeds;
    a second processing unit, configured to, if the format matching succeeds, determine the data format corresponding to the successfully matched candidate expression as the data format of the data packet.
  8. The data processing device according to claim 7, wherein the matching success rate calculation unit comprises:
    a sub-period division subunit, configured to divide the statistical period into T sub-periods, where T is a positive integer;
    a count subunit, configured to count the number of successful matches of each regular expression in the regular expression resource library in each sub-period;
    a matching success rate calculation subunit, configured to calculate the matching success rate of each regular expression in the regular expression resource library according to the following formula:
    Figure PCTCN2019103039-appb-100002
    where n is the index of a regular expression, 1≤n≤N, N is the total number of regular expressions in the regular expression resource library, t is the index of a sub-period in chronological order, 1≤t≤T, MatSucNum_{n,t} is the number of successful matches of the n-th regular expression in the regular expression resource library in the t-th sub-period, Weight_t is a preset weight coefficient with Weight_t < Weight_{t+1}, and MatSucRatio_n is the matching success rate of the n-th regular expression in the regular expression resource library.
  9. The data processing device according to any one of claims 6 to 8, further comprising:
    a data packet counting module, configured to count the total number of data packets waiting to be processed;
    a configuration file obtaining module, configured to, if the total number of data packets waiting to be processed is greater than a preset number threshold, obtain the preset configuration file of each standby data processing terminal, and determine the data format corresponding to each standby data processing terminal according to the configuration file;
    a cluster division module, configured to assign each standby data processing terminal to a corresponding data processing cluster, wherein the standby data processing terminals in the same data processing cluster all correspond to the same data format;
    a cluster selection module, configured to select a target cluster corresponding to the data packet, wherein the data format corresponding to each standby data processing terminal in the target cluster is the same as the data format of the data packet;
    a data packet sending module, configured to send the data packet to the target cluster for processing.
  10. The data processing device according to claim 9, further comprising:
    a number query module, configured to send a data packet query request to each standby data processing terminal in the target cluster, and to receive the number of pending data packets reported by each standby data processing terminal in the target cluster;
    a terminal selection module, configured to select, from the target cluster, the standby data processing terminal with the smallest number of pending data packets as a preferred processing terminal;
    a data packet assignment module, configured to assign the data packet to the preferred processing terminal for processing.
  11. A computer non-volatile readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the following steps:
    receiving a data packet collected and sent by a preset packet capture tool, the data packet including one or more data records;
    performing format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet, wherein the regular expression resource library includes one or more regular expressions, each regular expression corresponds to one data format, and the target record is any one data record in the data packet;
    searching a preset data processing rule library for a target processing rule, the target processing rule being the data processing rule corresponding to the data format of the data packet;
    processing each data record in the data packet separately according to the target processing rule to obtain a processed data packet.
  12. The computer non-volatile readable storage medium according to claim 11, wherein performing format matching on the target record according to the preset regular expression resource library to determine the data format of the data packet comprises:
    calculating the matching success rate of each regular expression in the regular expression resource library according to historical matching records within a preset statistical period;
    selecting, from the regular expression resource library, the regular expression with the highest matching success rate among those not yet selected as a candidate expression;
    performing format matching on the target record using the candidate expression;
    if the format matching fails, returning to the step of selecting, from the regular expression resource library, the regular expression with the highest matching success rate among those not yet selected as a candidate expression, until the format matching succeeds;
    if the format matching succeeds, determining the data format corresponding to the successfully matched candidate expression as the data format of the data packet.
  13. The computer non-volatile readable storage medium according to claim 12, wherein calculating the matching success rate of each regular expression in the regular expression resource library according to historical matching records within the preset statistical period comprises:
    dividing the statistical period into T sub-periods, where T is a positive integer;
    counting the number of successful matches of each regular expression in the regular expression resource library in each sub-period;
    calculating the matching success rate of each regular expression in the regular expression resource library according to the following formula:
    Figure PCTCN2019103039-appb-100003
    where n is the index of a regular expression, 1≤n≤N, N is the total number of regular expressions in the regular expression resource library, t is the index of a sub-period in chronological order, 1≤t≤T, MatSucNum_{n,t} is the number of successful matches of the n-th regular expression in the regular expression resource library in the t-th sub-period, Weight_t is a preset weight coefficient with Weight_t < Weight_{t+1}, and MatSucRatio_n is the matching success rate of the n-th regular expression in the regular expression resource library.
  14. The computer non-volatile readable storage medium according to any one of claims 11 to 13, wherein after the data format of the data packet is determined, the steps further comprise:
    counting the total number of data packets waiting to be processed;
    if the total number of data packets waiting to be processed is greater than a preset number threshold, obtaining the preset configuration file of each standby data processing terminal, and determining the data format corresponding to each standby data processing terminal according to the configuration file;
    assigning each standby data processing terminal to a corresponding data processing cluster, wherein the standby data processing terminals in the same data processing cluster all correspond to the same data format;
    selecting a target cluster corresponding to the data packet, wherein the data format corresponding to each standby data processing terminal in the target cluster is the same as the data format of the data packet;
    sending the data packet to the target cluster for processing.
  15. The computer non-volatile readable storage medium according to claim 14, wherein after the data packet is sent to the target cluster for processing, the steps further comprise:
    sending a data packet query request to each standby data processing terminal in the target cluster, and receiving the number of pending data packets reported by each standby data processing terminal in the target cluster;
    selecting, from the target cluster, the standby data processing terminal with the smallest number of pending data packets as a preferred processing terminal;
    assigning the data packet to the preferred processing terminal for processing.
  16. A terminal device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    receiving a data packet collected and sent by a preset packet capture tool, the data packet including one or more data records;
    performing format matching on a target record according to a preset regular expression resource library to determine the data format of the data packet, wherein the regular expression resource library includes one or more regular expressions, each regular expression corresponds to one data format, and the target record is any one data record in the data packet;
    searching a preset data processing rule library for a target processing rule, the target processing rule being the data processing rule corresponding to the data format of the data packet;
    processing each data record in the data packet separately according to the target processing rule to obtain a processed data packet.
  17. The terminal device according to claim 16, wherein performing format matching on the target record according to the preset regular expression resource library to determine the data format of the data packet comprises:
    calculating the matching success rate of each regular expression in the regular expression resource library according to historical matching records within a preset statistical period;
    selecting, from the regular expression resource library, the regular expression with the highest matching success rate among those not yet selected as a candidate expression;
    performing format matching on the target record using the candidate expression;
    if the format matching fails, returning to the step of selecting, from the regular expression resource library, the regular expression with the highest matching success rate among those not yet selected as a candidate expression, until the format matching succeeds;
    if the format matching succeeds, determining the data format corresponding to the successfully matched candidate expression as the data format of the data packet.
  18. The terminal device according to claim 17, wherein calculating the matching success rate of each regular expression in the regular expression resource library according to historical matching records within the preset statistical period comprises:
    dividing the statistical period into T sub-periods, where T is a positive integer;
    counting the number of successful matches of each regular expression in the regular expression resource library in each sub-period;
    calculating the matching success rate of each regular expression in the regular expression resource library according to the following formula:
    Figure PCTCN2019103039-appb-100004
    where n is the index of a regular expression, 1≤n≤N, N is the total number of regular expressions in the regular expression resource library, t is the index of a sub-period in chronological order, 1≤t≤T, MatSucNum_{n,t} is the number of successful matches of the n-th regular expression in the regular expression resource library in the t-th sub-period, Weight_t is a preset weight coefficient with Weight_t < Weight_{t+1}, and MatSucRatio_n is the matching success rate of the n-th regular expression in the regular expression resource library.
  19. The terminal device according to any one of claims 16 to 18, further comprising, after determining the data format of the data packet:
    counting the total number of data packets waiting to be processed;
    if the total number of data packets waiting to be processed is greater than a preset number threshold, acquiring a preset configuration file of each standby data processing terminal, and determining, according to the configuration file, the data format corresponding to each standby data processing terminal;
    dividing the standby data processing terminals into corresponding data processing clusters, wherein the standby data processing terminals in the same data processing cluster all correspond to the same data format;
    selecting a target cluster corresponding to the data packet, wherein the data format corresponding to each standby data processing terminal in the target cluster is consistent with the data format of the data packet;
    sending the data packet to the target cluster for processing.
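The overflow routing in claim 19 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the configuration files are taken to be already parsed into a mapping from terminal ID to data format, and the names `build_clusters` and `select_target_cluster` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def build_clusters(terminal_formats: Dict[str, str]) -> Dict[str, List[str]]:
    """Group standby terminals so each cluster shares one data format."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for terminal_id, data_format in terminal_formats.items():
        clusters[data_format].append(terminal_id)
    return dict(clusters)


def select_target_cluster(packet_format: str,
                          pending_count: int,
                          threshold: int,
                          terminal_formats: Dict[str, str]) -> Optional[List[str]]:
    """Return the standby cluster for the packet, or None if none is needed.

    terminal_formats stands in for the parsed configuration files
    (assumption: terminal ID -> data format).
    """
    # Standby terminals are consulted only when the backlog exceeds the threshold.
    if pending_count <= threshold:
        return None
    # None is also returned when no standby terminal handles this format.
    return build_clusters(terminal_formats).get(packet_format)
```

Grouping by format first keeps cluster selection a single dictionary lookup, whatever the number of standby terminals.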
  20. The terminal device according to claim 19, further comprising, after sending the data packet to the target cluster for processing:
    sending a data packet query request to each standby data processing terminal in the target cluster, and receiving the number of to-be-processed data packets fed back by each standby data processing terminal in the target cluster;
    selecting, from the target cluster, the standby data processing terminal with the smallest number of to-be-processed data packets as a preferred processing terminal;
    distributing the data packet to the preferred processing terminal for processing.
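The selection step in claim 20 amounts to a least-loaded choice within the target cluster. The Python sketch below assumes the query request can be modeled as a callable that returns a terminal's pending packet count; `pick_preferred_terminal` and `query_pending` are hypothetical names, and the network exchange itself is not modeled.

```python
from typing import Callable, List


def pick_preferred_terminal(cluster: List[str],
                            query_pending: Callable[[str], int]) -> str:
    """Return the terminal in the cluster with the fewest pending packets.

    query_pending stands in for the query request and its reply
    (assumption: it returns the terminal's to-be-processed packet count).
    """
    if not cluster:
        raise ValueError("target cluster is empty")
    # Query each terminal exactly once, then compare the reported counts.
    pending = {terminal: query_pending(terminal) for terminal in cluster}
    # On a tie, min() keeps the terminal that appears first in the cluster.
    return min(cluster, key=lambda terminal: pending[terminal])
```

For example, with reported counts of 7 for `t1` and 2 for `t3`, the packet would go to `t3`.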
PCT/CN2019/103039 2019-05-21 2019-08-28 Data processing method and apparatus, storage medium and terminal device WO2020232880A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910423175.6A CN110245155A (en) 2019-05-21 2019-05-21 Data processing method, device, computer readable storage medium and terminal device
CN201910423175.6 2019-05-21

Publications (1)

Publication Number Publication Date
WO2020232880A1 (en) 2020-11-26

Family

ID=67884683

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103039 WO2020232880A1 (en) 2019-05-21 2019-08-28 Data processing method and apparatus, storage medium and terminal device

Country Status (2)

Country Link
CN (1) CN110245155A (en)
WO (1) WO2020232880A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113656659A (en) * 2021-08-31 2021-11-16 上海观安信息技术股份有限公司 Data extraction method, device and system and computer readable storage medium
CN115757423A (en) * 2022-11-29 2023-03-07 中诚智信工程咨询集团股份有限公司 Engineering cost data correction method, system, equipment and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113660530B (en) * 2021-07-27 2024-03-19 中央广播电视总台 Program stream data grabbing method and device, computer equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090287628A1 (en) * 2008-05-15 2009-11-19 Exegy Incorporated Method and System for Accelerated Stream Processing
CN103078808A (en) * 2012-12-29 2013-05-01 大连环宇移动科技有限公司 Data stream exchanging and multiplexing system and method suitable for multi-stream regular expression matching
CN107729475A (en) * 2017-10-16 2018-02-23 深圳视界信息技术有限公司 Web page element acquisition method, device, terminal and computer-readable recording medium
CN107766466A (en) * 2017-09-29 2018-03-06 上海望友信息科技有限公司 Recognition methods, system, computer-readable recording medium and the equipment of data type

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107786545A (en) * 2017-09-29 2018-03-09 中国平安人寿保险股份有限公司 A kind of attack detection method and terminal device
CN109299164B (en) * 2018-09-03 2024-05-17 中国平安人寿保险股份有限公司 Data query method, computer readable storage medium and terminal equipment
CN109656487B (en) * 2018-12-24 2023-04-28 平安科技(深圳)有限公司 Data processing method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李亮雄 (LI, LIANGXIONG): "基于负载特征与行为特征相结合的网络流分类系统 (Network flow classification system based on the combination of payload and behavior characteristic)", 中国优秀硕士学位论文全文数据库(电子期刊)工程科技I辑 (CHINESE MASTER’S THESES FULL-TEXT DATABASE (ELECTRONIC JOURNAL), ENGINEERING SCIENCE & TECHNOLOGY I), no. 02, 15 February 2013 (2013-02-15), DOI: 20200219192407X *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113656659A (en) * 2021-08-31 2021-11-16 上海观安信息技术股份有限公司 Data extraction method, device and system and computer readable storage medium
CN115757423A (en) * 2022-11-29 2023-03-07 中诚智信工程咨询集团股份有限公司 Engineering cost data correction method, system, equipment and storage medium
CN115757423B (en) * 2022-11-29 2024-01-30 中诚智信工程咨询集团股份有限公司 Engineering cost data correction method, system, equipment and storage medium

Also Published As

Publication number Publication date
CN110245155A (en) 2019-09-17

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19929285; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 19929285; Country of ref document: EP; Kind code of ref document: A1)