WO2022048422A1 - Data processing method, apparatus, device, and storage medium - Google Patents


Info

Publication number
WO2022048422A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
deduplication
attribute
real
content
Prior art date
Application number
PCT/CN2021/112248
Other languages
English (en)
French (fr)
Inventor
周志刚
万月亮
火一莽
Original Assignee
北京锐安科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京锐安科技有限公司
Publication of WO2022048422A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the embodiments of the present application relate to data processing technologies, for example, to a data processing method, apparatus, device, and storage medium.
  • The existing processing approach is to process the data manually.
  • the present application provides a data processing method, apparatus, device and storage medium, so as to realize massive data processing and complete the extraction operation of valid data.
  • An embodiment of the present application provides a data processing method, including: receiving real-time stream data; performing deduplication processing on the real-time stream data according to a data deduplication rule to obtain deduplicated data; and performing correctness detection on the deduplicated data according to a correctness detection rule to obtain valid data, and storing the valid data.
  • An embodiment of the present application further provides a data processing device, the device includes: a data acquisition module configured to receive real-time stream data; a data deduplication module configured to perform deduplication processing on the real-time stream data according to a data deduplication rule , to obtain deduplicated data; the correctness verification module is configured to perform correctness detection on the deduplicated data according to the correctness detection rule to obtain valid data; the data storage module is configured to store the valid data.
  • An embodiment of the present application further provides an electronic device, including: one or more processors; and a storage device configured to store one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data processing method provided by any embodiment of the present application.
  • the embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and the program is executed by a processor to perform the data processing method provided by any embodiment of the present application.
  • FIG. 1 is a flowchart of a data processing method in Embodiment 1 of the present application.
  • FIG. 2 is a flowchart of a data processing method in Embodiment 2 of the present application.
  • FIG. 3 is a schematic diagram of functional modules of a data processing apparatus in Embodiment 3 of the present application.
  • FIG. 4 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present application.
  • FIG. 1 is a flowchart of a data processing method provided in Embodiment 1 of the present application. This embodiment is applicable to obtaining valid data from massive data. The method can be executed by a data processing apparatus, which can be implemented in software and/or hardware and can be integrated in an electronic device such as a computer or a server. The method includes the following steps.
  • Streams are made up of a series of immutable messages of a similar type.
  • For example, a stream could be all click events on a website, all updates to a particular database, all logs generated by a service, or any other type of time data.
  • Streaming data is a sequence of data that arrives in order, in large volume, rapidly, and continuously.
  • streaming data can be viewed as a dynamic collection of data that grows infinitely over time.
  • Real-time streaming data is streaming data that has a time attribute. From the perspective of timestamps, each piece of data in real-time streaming data is generated at a certain moment; the value of that moment can be the time at which the data source generated the data, or the time recorded by the stream processing system when the data flows into the processing engine.
  • Receiving real-time streaming data can be receiving all action streaming data in the Internet through a high-throughput, low-latency Kafka stream processing platform, such as web browsing, searching, and other user actions.
  • receiving real-time streaming data may be receiving the real-time streaming data based on the Flink streaming framework.
  • the advantage of this setting is that the Flink streaming framework has high performance and fast data processing speed, and the Flink streaming framework is also fault-tolerant.
  • the fault-tolerant mechanism of the Flink streaming framework will reduce the performance and throughput of the streaming framework.
  • S120 Perform deduplication processing on the real-time stream data according to the data deduplication rule to obtain deduplication data.
  • the data deduplication rules can be configured manually, and the received real-time stream data is deduplicated by configuring the data deduplication rules.
  • In some embodiments, the deduplication operation may compare multiple pieces of data in the real-time streaming data with one another, identify at least two duplicates, retain one of them, and delete the others to obtain the deduplicated data.
  • the data contents in any two pieces of data may be matched one by one, and it is determined that two pieces of data with identical data contents are duplicate data.
  • Optionally, the deduplication operation may instead compare any two pieces of data by data type: pieces of data of the same type have their data attributes compared, and multiple pieces of data in the real-time streaming data that have the same data type and consistent data attributes (that is, the same data type and the same data content) are regarded as duplicate data.
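The simplest variant described above, treating two pieces of data as duplicates when their content matches exactly, can be sketched as follows. This is only an illustration: the tuple record shape and the name `dedupe_exact` are assumptions, not taken from the application.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical record shape: a tuple of field values, type code first.
Record = Tuple[str, ...]


def dedupe_exact(stream: Iterable[Record]) -> Iterator[Record]:
    """Yield the first occurrence of each record and drop identical repeats."""
    seen = set()
    for record in stream:
        if record in seen:
            continue  # identical content was already retained once
        seen.add(record)
        yield record


records = [
    ("01", "alice", "13500000000"),
    ("01", "alice", "13500000000"),  # exact duplicate of the first record
    ("01", "bob", "13600000000"),
]
unique = list(dedupe_exact(records))
```

On an unbounded stream the `seen` set grows without limit, so a real deployment would presumably bound it, for example by time window; the application does not say how.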
  • the correctness detection rule can be pre-configured, for example, it can be formed by inputting the correctness detection code into the correctness detection rule template.
  • The correctness detection rule can be a rule for a data attribute, and different data attributes correspond to different correctness detection rules. By configuring the rule corresponding to each data attribute, the data content of the corresponding attributes in the deduplicated data is checked for correctness, and the deduplicated data that conforms to the rules is selected as valid data.
  • Optionally, the correctness detection rule corresponding to each data attribute may be stored independently. For example, the rules may be stored in a correctness detection rule database, and the rule corresponding to each data attribute included in the real-time streaming data is invoked from it.
  • When the correctness detection rule database contains no rule corresponding to a data attribute of the current stream data, prompt information is generated to prompt the configuration of a new correctness detection rule.
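A minimal sketch of the per-attribute rule lookup, assuming an in-memory dictionary stands in for the correctness detection rule database. The attribute names, the rule bodies, and the function `lookup_rules` are hypothetical.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the correctness detection rule database:
# one independently stored rule per data attribute.
RULES: Dict[str, Callable[[str], bool]] = {
    "username": lambda value: bool(value.strip()),
    "mobile_phone": lambda value: re.fullmatch(r"\d{11}", value) is not None,
}


def lookup_rules(attributes: List[str]) -> Tuple[Dict[str, Callable[[str], bool]], List[str]]:
    """Return the rules found plus a prompt for each attribute that has none."""
    found: Dict[str, Callable[[str], bool]] = {}
    prompts: List[str] = []
    for name in attributes:
        rule = RULES.get(name)
        if rule is None:
            prompts.append(
                f"No correctness detection rule for attribute '{name}'; please configure one."
            )
        else:
            found[name] = rule
    return found, prompts


found, prompts = lookup_rules(["username", "gender"])
```

Here `gender` has no rule, so the lookup returns a prompt instead of silently skipping the attribute.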
  • By verifying the correctness of the deduplicated data, invalid data that contains errors is deleted, which avoids invalid data occupying storage space.
  • prompt information of invalid data is generated based on the invalid data, and the prompt information of invalid data is displayed or sent to the associated terminal, so that the associated terminal or the operating user can correct the invalid data.
  • the data deduplication rule and/or the correctness detection rule may be in an XML file format.
  • the configuration rules are in XML file format. The advantage of this setting is that XML is a file format described in text form, which is readable and object-oriented.
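The application does not give a schema for the XML rule files, so the element and attribute names below (`rules`, `dedup`, `key`, `correctness`, `pattern`) are purely hypothetical. The sketch only shows how such a file could be loaded with Python's standard `xml.etree.ElementTree`.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule file: deduplication key attributes per data type code,
# plus one regular-expression correctness rule per data attribute.
RULES_XML = r"""
<rules>
  <dedup type="04">
    <key>username</key>
    <key>mobile_phone</key>
    <key>id_number</key>
  </dedup>
  <correctness attribute="mobile_phone" pattern="\d{11}"/>
</rules>
"""

root = ET.fromstring(RULES_XML.strip())

# Map each data type code to its list of deduplication key attributes.
dedup_keys = {
    node.get("type"): [key.text for key in node.findall("key")]
    for node in root.findall("dedup")
}

# Map each data attribute to its compiled correctness pattern.
patterns = {
    node.get("attribute"): re.compile(node.get("pattern"))
    for node in root.findall("correctness")
}
```

Keeping both rule kinds in one text file means they can be edited without touching the processing code, which matches the readability argument made for XML above.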
  • The working principle of the data processing method is as follows: real-time stream data is received; data deduplication rules are configured and a preliminary deduplication operation filters the real-time stream data to obtain deduplicated data; correctness detection rules are then configured and the deduplicated data is checked for correctness, which filters out invalid data a second time and yields valid data; the valid data is then stored, for example in a local database or in the cloud.
  • In the technical solution of this embodiment, the received real-time streaming data is processed twice in succession, first by the data deduplication rule and then by the correctness detection rule, to remove duplicate data and erroneous data. This avoids invalid data occupying storage space, addresses the problems of high data storage pressure and invalid data, and reduces storage pressure while improving data validity.
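The two consecutive stages can be sketched as a single pass over the stream. This is a simplified illustration: the list `database` stands in for the storage step, and `phone_ok` is a hypothetical correctness rule that assumes an 11-digit number.

```python
from typing import Callable, Iterable, List, Tuple

Record = Tuple[str, ...]  # hypothetical shape: (type code, username, mobile phone)


def process(
    stream: Iterable[Record],
    is_valid: Callable[[Record], bool],
    store: List[Record],
) -> List[Record]:
    """Deduplicate, then validate; store valid records and return rejected ones."""
    seen = set()
    rejected: List[Record] = []
    for record in stream:
        if record in seen:
            continue  # stage 1: drop duplicates
        seen.add(record)
        if is_valid(record):
            store.append(record)  # stage 2 passed: keep as valid data
        else:
            rejected.append(record)  # stage 2 failed: invalid data
    return rejected


def phone_ok(record: Record) -> bool:
    # Assumed rule: the third field must be exactly 11 digits.
    return record[2].isdigit() and len(record[2]) == 11


incoming = [
    ("01", "alice", "13500000000"),
    ("01", "alice", "13500000000"),
    ("01", "bob", "1352"),
]
database: List[Record] = []
rejected = process(incoming, phone_ok, database)
```

Returning the rejected records, rather than discarding them, leaves room for the optional step of prompting a user to correct invalid data.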
  • FIG. 2 is a flowchart of a data processing method in Embodiment 2 of the present application. Based on the above embodiments, the method includes the following steps.
  • S210 Receive real-time streaming data.
  • the real-time streaming data includes a plurality of data, and each data includes a data type identifier, at least one data attribute, and data content of each data attribute.
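  • Before the real-time streaming data is received, the data format can be defined: the first field is the data type code, and the following fields are the data attributes in sequence. For example, the data format can be defined as [Data Type Code], [Attribute 1], [Attribute 2], and so on.
  • When the received real-time streaming data does not conform to this format, it can be preprocessed, for example by identifying the data type of the received data and adding the code of the identified data type to the first field of the real-time streaming data. For example, for login data of a system whose registration data has username, mobile phone number, and password attributes, the first field is set to 01, where 01 denotes registration with that system, and the data can be represented as [01], [username], [mobile phone number], [password].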
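A small sketch of this format and of the preprocessing step. How the data type is identified is not specified in the application; the lookup by attribute names in `TYPE_CODES` is an assumption made for illustration, as are the names `Record` and `preprocess`.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Record:
    type_code: str                          # first field: data type code
    attributes: Tuple[Tuple[str, str], ...]  # (attribute name, content) in field order


# Hypothetical way to identify a data type: by its ordered attribute names.
TYPE_CODES: Dict[Tuple[str, ...], str] = {
    ("username", "mobile_phone", "password"): "01",
}


def preprocess(raw: Dict[str, str], type_code: Optional[str] = None) -> Record:
    """Attach a data type code as the first field when the input lacks one."""
    if type_code is None:
        # Raises KeyError when the data type cannot be identified.
        type_code = TYPE_CODES[tuple(raw)]
    return Record(type_code, tuple(raw.items()))


record = preprocess({"username": "A", "mobile_phone": "B", "password": "C"})
```

The frozen dataclass keeps each message immutable, in line with the earlier description of a stream as a series of immutable messages.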
  • S220 Perform deduplication processing on the real-time stream data according to the data deduplication rule to obtain deduplication data.
  • Optionally, performing deduplication processing on the real-time stream data according to the data deduplication rule to obtain deduplicated data includes: when the data type identifiers of two pieces of data in the real-time stream data are the same, comparing at least one data attribute of one piece of data with at least one data attribute of the other; when all data attributes of the one piece of data are the same as all data attributes of the other, comparing the data content of the data attributes of the one piece of data with the data content of the data attributes of the other; when the data content of all data attributes is the same, determining that the two pieces of data are duplicate data and deduplicating them; when the data content of at least one data attribute differs, determining that the two pieces of data are not duplicate data and retaining both; and when at least one data attribute of the one piece of data differs from the data attributes of the other, determining that the two pieces of data are not duplicate data and retaining both.
  • According to the data deduplication rule, the data attributes of any two pieces of data in the real-time streaming data that have the same data type code are compared. When both the data attributes and the data content are the same, the two pieces of data are determined to be duplicate data and are deduplicated, that is, either one of the two can be kept.
  • When the data attributes of two pieces of data with the same data type code are different, the two pieces of data are determined not to be duplicate data, and both are retained.
  • For example, data 1 is represented as [01], [username], [mobile phone number], and data 2 is represented as [01], [username], [gender]. When data 1 is compared with data 2, the [mobile phone number] attribute of data 1 differs from the [gender] attribute of data 2, so data 1 and data 2 are different stream data.
  • When the data attributes are the same, the data content of the two pieces of data with the same data type code is compared. Two pieces of data with identical data content are determined to be duplicate data, and two pieces of data with different data content are determined to be different stream data.
  • For example, data 3 is represented as [01], [user name A], [mobile phone number B], and data 4 is represented as [01], [user name A], [mobile phone number C]. When data 3 and data 4 are compared, the data content of the mobile phone number attribute is different, so data 3 and data 4 are determined to be different stream data.
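The three-level comparison (type code, then attribute names, then attribute content) can be sketched as below, using the data 1 to data 4 examples. The dictionary layout and the name `is_duplicate` are illustrative assumptions.

```python
from typing import Dict, Union

Data = Dict[str, Union[str, Dict[str, str]]]


def is_duplicate(a: Data, b: Data) -> bool:
    """Duplicates share a type code, the same attribute names, and the same content."""
    if a["type"] != b["type"]:
        return False  # different data types are never compared further
    if set(a["attrs"]) != set(b["attrs"]):
        return False  # an attribute differs: not duplicates, keep both
    return all(a["attrs"][name] == b["attrs"][name] for name in a["attrs"])


data1 = {"type": "01", "attrs": {"username": "A", "mobile_phone": "B"}}
data2 = {"type": "01", "attrs": {"username": "A", "gender": "F"}}
data3 = {"type": "01", "attrs": {"username": "A", "mobile_phone": "B"}}
data4 = {"type": "01", "attrs": {"username": "A", "mobile_phone": "C"}}

different_attributes = is_duplicate(data1, data2)  # gender vs. mobile phone
different_content = is_duplicate(data3, data4)     # phone B vs. phone C
same_everything = is_duplicate(data1, data3)
```

Checking attribute names before content mirrors the order in the text and avoids comparing values of unrelated attributes.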
  • Optionally, performing deduplication processing on the real-time stream data according to the data deduplication rule to obtain deduplicated data includes: determining at least one deduplication key attribute of each piece of data in the real-time stream data; when the data type identifiers of two pieces of data in the real-time stream data are the same, comparing at least one deduplication key attribute of one piece of data with at least one deduplication key attribute of the other; and when all deduplication key attributes of the one piece of data are the same as all deduplication key attributes of the other, comparing the data content of the deduplication key attributes of the one piece of data with the data content of the deduplication key attributes of the other.
  • To deduplicate the real-time stream data according to the data deduplication rule, the deduplication key attributes of the multiple pieces of data are first determined. A deduplication key attribute can be one or more of the data attributes.
  • The deduplication key attributes can be set and updated according to user requirements, which is not limited here.
  • For example, data 4 is represented as [04], [username], [mobile phone number], [gender], [password], [ID number], and data 5 is represented as [04], [username], [mobile phone number], [age], [ID number]. In this case, [username], [mobile phone number], and [ID number] can be chosen as key attributes.
  • Data deduplication is performed by selecting at least one key attribute and comparing the two pieces of data one to one on those attributes. When the deduplication key attributes of any two pieces of data are the same, the contents of those key attributes are compared; if the contents of all deduplication key attributes are the same, the two pieces of data are determined to be duplicate data and are deduplicated.
  • The deduplication process may be to keep either one of the two duplicates.
  • For example, data 4 is represented as [04], [username], [mobile phone number], [gender], [password], [ID number], and data 5 is represented as [04], [username], [mobile phone number], [age], [ID number]. When [username], [mobile phone number], and [ID number] are selected as key attributes, the data content of the key attributes of data 4 and data 5 is compared; when that content is the same, data 4 and data 5 are determined to be the same stream data, and either one may be kept.
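Key-attribute deduplication can be sketched by building a comparison key from the selected attributes only, so that data 4 and data 5 match even though their other attributes differ. The `KEY_ATTRS` table and the function names are hypothetical, and the attribute values are made up.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical configuration: deduplication key attributes per data type code.
KEY_ATTRS: Dict[str, Tuple[str, ...]] = {
    "04": ("username", "mobile_phone", "id_number"),
}


def dedup_key(record: dict) -> Optional[tuple]:
    """Build the comparison key, or None if a key attribute is missing."""
    keys = KEY_ATTRS[record["type"]]
    attrs = record["attrs"]
    if any(name not in attrs for name in keys):
        return None  # key attributes differ, so the record cannot match another
    return (record["type"],) + tuple(attrs[name] for name in keys)


def dedupe_by_key(stream: Iterable[dict]) -> List[dict]:
    """Keep the first record for each distinct key; always keep keyless records."""
    seen = set()
    kept: List[dict] = []
    for record in stream:
        key = dedup_key(record)
        if key is not None:
            if key in seen:
                continue  # same key attributes with the same content: duplicate
            seen.add(key)
        kept.append(record)
    return kept


data4 = {"type": "04", "attrs": {"username": "A", "mobile_phone": "B",
                                 "gender": "F", "password": "p", "id_number": "X"}}
data5 = {"type": "04", "attrs": {"username": "A", "mobile_phone": "B",
                                 "age": "30", "id_number": "X"}}
kept = dedupe_by_key([data4, data5])
```

This sketch keeps the first record seen; the text allows either of the two duplicates to be kept.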
  • the correctness of the deduplicated data is detected according to the standard of the configured correctness detection rule to determine the validity of the data.
  • The correctness detection rule can be a correctness detection rule (correctness detection standard) for at least one data attribute corresponding to the data type. Different rules can be configured for different data attributes, and the rule for a data attribute can be expressed as a regular expression.
  • the correctness of the deduplicated data is detected by the correctness detection rules of the data attributes to obtain valid data.
  • the correctness detection rules are configured for different data attributes.
  • For example, when the mobile phone number attribute of a piece of data is checked for correctness, the correctness detection rule for that attribute is selected and data that does not meet its conditions is excluded. When the mobile phone number is 1352, correctness detection finds that it does not meet the conditions of a correct mobile phone number, and the number is not kept.
  • When correctness detection finds that a mobile phone number meets the conditions of a correct mobile phone number, the mobile phone number is stored in the database.
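A regular-expression rule for the mobile phone number attribute might look like the sketch below. The pattern is an assumption (11 digits starting with 1); the application states only that rules can be written as regular expressions and that 1352 is not a correct number.

```python
import re

# Assumed pattern: exactly 11 digits, the first of which is 1.
MOBILE_PATTERN = re.compile(r"1\d{10}")


def is_valid_mobile(number: str) -> bool:
    """Return True when the whole string matches the assumed pattern."""
    return MOBILE_PATTERN.fullmatch(number) is not None


short_number_ok = is_valid_mobile("1352")         # too short, rejected
full_number_ok = is_valid_mobile("13512345678")   # 11 digits, accepted
```

`fullmatch` is used rather than `match` so that a valid prefix followed by extra characters is still rejected.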
  • In the technical solution of this embodiment, the data format is defined, real-time stream data is received, and the received data is deduplicated through attribute comparison based on the data deduplication rule to obtain deduplicated data. Correctness detection rules corresponding to the different data attributes are configured, the attributes of the deduplicated data are checked against the corresponding rules, and the real-time streaming data whose attributes are correct is saved in the database.
  • FIG. 3 is a schematic diagram of functional modules of a data processing apparatus in Embodiment 3 of the present application.
  • The present application provides a data processing apparatus, comprising: a data acquisition module 310 configured to receive real-time stream data; a data deduplication module 320 configured to perform deduplication processing on the real-time stream data according to a data deduplication rule to obtain deduplicated data; a correctness verification module 330 configured to perform correctness detection on the deduplicated data according to a correctness detection rule to obtain valid data; and a data storage module 340 configured to store the valid data.
  • The data acquisition module 310 is configured to receive the real-time streaming data based on the Flink streaming framework.
  • the real-time streaming data includes a plurality of data, and each data includes a data type identifier, at least one data attribute, and data content of each data attribute.
  • The data deduplication module 320 is configured to: when the data type identifiers of two pieces of data in the real-time streaming data are the same, compare at least one data attribute of one piece of data with at least one data attribute of the other; when at least one data attribute of the one piece of data differs from the data attributes of the other, determine that the two pieces of data are not duplicate data and retain both; when all data attributes of the one piece of data are the same as all data attributes of the other, compare the data content of the data attributes of the one piece of data with the data content of the data attributes of the other; when the data content of all data attributes is the same, determine that the two pieces of data are duplicate data and deduplicate them; and when the data content of at least one data attribute differs, determine that the two pieces of data are not duplicate data and retain both.
  • The data deduplication module 320 is configured to: determine at least one deduplication key attribute of each piece of data in the real-time streaming data; when the data type identifiers of two pieces of data in the real-time streaming data are the same, compare at least one deduplication key attribute of one piece of data with at least one deduplication key attribute of the other; when all deduplication key attributes of the one piece of data are the same as all deduplication key attributes of the other, compare the data content of the deduplication key attributes of the one piece of data with the data content of the deduplication key attributes of the other; when the data content of at least one deduplication key attribute differs, determine that the two pieces of data are not duplicate data and retain both; and when the data content of all deduplication key attributes is the same, determine that the two pieces of data are duplicate data and deduplicate them.
  • The correctness verification module 330 is configured to invoke, according to the data type of the deduplicated data, the correctness detection rule corresponding to that data type to determine the valid data in the deduplicated data, where the correctness detection rule includes a correctness detection standard for each of the at least one data attribute corresponding to the data type.
  • the data deduplication rule and/or the correctness detection rule are in an XML file format.
  • The data acquisition module receives the real-time stream data; the data deduplication module first deduplicates the received data according to the deduplication rules to obtain deduplicated data; the correctness verification module then performs correctness detection on the deduplicated data according to the configured correctness detection rules to obtain valid data; finally, the data storage module stores the valid data.
  • the above product can execute the method provided by any embodiment of the present application, and has functional modules corresponding to the execution method.
  • FIG. 4 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present application.
  • the device includes a processor 410 , a memory 420 , an input device 430 and an output device 440 ; the number of processors 410 in the device may be one or more, and one processor 410 is taken as an example in FIG. 4 .
  • the processor 410 , the memory 420 , the input device 430 and the output device 440 in the device may be connected by a bus or in other ways, and the connection by a bus is taken as an example in FIG. 4 .
  • As a computer-readable storage medium, the memory 420 may be configured to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to data processing in the embodiments of the present application (for example, the data acquisition module 310, data deduplication module 320, correctness verification module 330, and data storage module 340 in the data processing apparatus).
  • the processor 410 executes various functional applications and data processing of the device by running the software programs, instructions and modules stored in the memory 420 , that is, implements the above-mentioned data processing method.
  • the memory 420 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Additionally, memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some instances, memory 420 may also include memory located remotely from processor 410, which may be connected to the device through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the input device 430 may be configured to receive incoming streaming data and to generate data input related to user settings and functional control of the device.
  • the output device 440 may include a display device such as a display screen.
  • Embodiment 5 provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the data processing method provided by any embodiment of the present application is implemented, the method including: receiving real-time streaming data; performing deduplication processing on the real-time stream data according to the data deduplication rule to obtain deduplicated data; and performing correctness detection on the deduplicated data according to the correctness detection rule to obtain valid data, and storing the valid data.
  • the computer storage medium of the embodiments of the present application may adopt any combination of one or more computer-readable media.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination of the above.
  • Examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or flash memory, optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a propagated data signal in baseband or as part of a carrier wave, with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device .
  • the program code embodied on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wireless, wire, optical fiber cable, radio frequency (RF), etc., or any suitable combination of the above.
  • Computer program code for carrying out the operations of the present application may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or Wide Area Network (WAN), or may be connected to an external computer (eg, use an internet service provider to connect via the internet).
  • The above-mentioned modules or steps of the present application can be implemented by a general-purpose computing device, and they can be centralized on a single computing device or distributed over a network composed of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they can each be made into a separate integrated circuit module; or several of the modules or steps can be implemented as a single integrated circuit module.
  • the present application is not limited to any particular combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

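The present application discloses a data processing method, apparatus, device, and storage medium. A data processing method includes: receiving real-time stream data; performing deduplication processing on the real-time stream data according to a data deduplication rule to obtain deduplicated data; performing correctness detection on the deduplicated data according to a correctness detection rule to obtain valid data; and storing the valid data.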

Description

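Data processing method, apparatus, device, and storage medium
This application claims priority to Chinese patent application No. 202010910153.5, filed with the China Patent Office on September 2, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to data processing technologies, for example to a data processing method, apparatus, device, and storage medium.
Background
With the rapid development of Internet technology, the volume of data on the Internet is growing explosively and exponentially, and the processing and storage of data face a severe test.
In the Internet era, storing massive data occupies a large amount of storage space, yet some of that space is occupied to no purpose, and finding valid data in massive data is becoming increasingly difficult. The existing approach is to process the data manually.
Faced with the massive data on the Internet, manually filtering the data to obtain valid data takes a great deal of time, and the processed data is often still mixed with some invalid data.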
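Summary
The present application provides a data processing method, apparatus, device, and storage medium, so as to process massive data and complete the extraction of valid data.
An embodiment of the present application provides a data processing method, including: receiving real-time stream data; performing deduplication processing on the real-time stream data according to a data deduplication rule to obtain deduplicated data; and performing correctness detection on the deduplicated data according to a correctness detection rule to obtain valid data, and storing the valid data.
An embodiment of the present application further provides a data processing apparatus, including: a data acquisition module configured to receive real-time stream data; a data deduplication module configured to perform deduplication processing on the real-time stream data according to a data deduplication rule to obtain deduplicated data; a correctness verification module configured to perform correctness detection on the deduplicated data according to a correctness detection rule to obtain valid data; and a data storage module configured to store the valid data.
An embodiment of the present application further provides an electronic device, including: one or more processors; and a storage device configured to store one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data processing method provided by any embodiment of the present application.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the data processing method provided by any embodiment of the present application.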
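Brief Description of the Drawings
FIG. 1 is a flowchart of a data processing method in Embodiment 1 of the present application;
FIG. 2 is a flowchart of a data processing method in Embodiment 2 of the present application;
FIG. 3 is a schematic diagram of the functional modules of a data processing apparatus in Embodiment 3 of the present application;
FIG. 4 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present application.
Detailed Description
The present application is described below with reference to the drawings and embodiments. It should be understood that the embodiments described here are only used to explain the present application and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present application rather than the entire structure.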
实施例一
图1为本申请实施例一提供的一种数据处理方法的流程图。本实施例可适用于在海量数据中获取有效数据的情况,该方法可以由数据处理装置来执行,数据处理装置可通过软件和/或硬件方式实现,该数据处理装置可集成于诸如计算机或者服务器等的电子设备中,包括如下步骤。
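S110. Receive real-time stream data.
A stream consists of a series of immutable messages of a similar type. For example, a stream may be all the click events of a website, all the update operations of a particular database, all the logs produced by a service, or any other type of event data. Stream data is a sequence of data that arrives in order, in large volume, rapidly, and continuously. In general, stream data can be regarded as a dynamic data set that grows without bound over time. Real-time stream data means that the stream data has a time attribute: from the timestamp perspective, each data item in real-time stream data is produced at some moment, and the value of that moment may be the time at which the data source produced the data or the time of the stream processing system when the data entered the processing engine. Receiving real-time stream data may consist of receiving all action stream data on the Internet through the high-throughput, low-latency Kafka stream processing platform, where such actions may be web page browsing, searches, other user actions, and the like.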
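On the basis of the above technical solution, the real-time stream data may be received based on the Flink streaming framework. The advantage of this arrangement is that the Flink streaming framework offers high performance and fast data processing and is also fault-tolerant; its fault-tolerance mechanism is lightweight, so it has little impact on the performance and throughput of the stream processing framework.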
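S120. Deduplicate the real-time stream data according to a data deduplication rule to obtain deduplicated data.
The data deduplication rule may be configured manually, and the received real-time stream data is deduplicated by means of the configured rule. In some embodiments, the deduplication operation may compare the data items in the real-time stream data with one another, identify at least two duplicate data items, retain one of them, and delete the other duplicates to obtain the deduplicated data. Optionally, the data contents of any two data items may be matched one by one, and two data items whose data contents are identical are determined to be duplicates. Optionally, the deduplication operation may instead compare any two data items by data type: data items of the same type have their data attributes compared, and data items in the real-time stream data that have the same data type and matching data attributes (that is, the same data type and the same data content) are taken as duplicates.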
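The simplest variant above, in which two data items are duplicates when their contents match exactly, can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the `Record` shape (a dictionary from attribute name to content) and the function name `deduplicate` are assumptions introduced here.

```python
from typing import Dict, Iterable, Iterator, Tuple

Record = Dict[str, str]  # hypothetical record shape: attribute name -> content


def deduplicate(records: Iterable[Record]) -> Iterator[Record]:
    """Yield the first occurrence of each record and drop later exact copies."""
    seen = set()
    for record in records:
        # Order-independent, hashable fingerprint of the whole record.
        fingerprint: Tuple[Tuple[str, str], ...] = tuple(sorted(record.items()))
        if fingerprint in seen:
            continue  # exact duplicate: same attributes and same contents
        seen.add(fingerprint)
        yield record


stream = [
    {"type": "01", "username": "A", "phone": "B"},
    {"type": "01", "username": "A", "phone": "B"},  # duplicate of the first
    {"type": "01", "username": "A", "phone": "C"},
]
unique = list(deduplicate(stream))
```

Because a stream is unbounded, the `seen` set in this sketch grows without limit; a real deployment would presumably bound that state, for example by time window, which the source does not specify.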
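Deduplicating the received real-time stream data removes the duplicate data in it and reduces the storage resources that duplicates would occupy.
S130. Perform correctness checking on the deduplicated data according to a correctness checking rule to obtain valid data, and store the valid data.
The correctness checking rule may be configured in advance, for example by entering correctness checking code into a correctness checking rule template. The rule may be a correctness checking rule for a data attribute, with different data attributes corresponding to different rules. By configuring the rule corresponding to each data attribute, the data content of the corresponding attribute in the deduplicated data is checked for correctness, and the deduplicated data that satisfies the rules is selected as valid data.
Optionally, the correctness checking rule for each data attribute may be stored independently. For example, the rules may be stored in a correctness checking rule database, and the rule corresponding to each data attribute contained in the real-time stream data is invoked from it. When the database contains no rule for a data attribute of the current stream data, a prompt is generated to request that a new correctness checking rule be configured.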
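The per-attribute rule lookup, including the prompt for a missing rule, might look like the sketch below. The in-memory `RULES` dictionary stands in for the correctness checking rule database, and the decision to leave an attribute unchecked (rather than reject the record) when no rule exists is an assumption; the source only says that a prompt is generated.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory stand-in for the correctness checking rule database.
RULES: Dict[str, Callable[[str], bool]] = {
    "phone": lambda v: re.fullmatch(r"1[3-9]\d{9}", v) is not None,
    "username": lambda v: len(v) > 0,
}


def check_record(record: Dict[str, str]) -> Tuple[bool, List[str]]:
    """Return (is_valid, prompts); prompts list attributes that have no rule."""
    prompts: List[str] = []
    valid = True
    for attribute, content in record.items():
        rule = RULES.get(attribute)
        if rule is None:
            # No rule stored for this attribute: ask for one to be configured.
            prompts.append(f"no correctness rule configured for '{attribute}'")
            continue
        if not rule(content):
            valid = False
    return valid, prompts


ok_result = check_record({"phone": "13812345678", "gender": "F"})
bad_result = check_record({"phone": "1352"})
```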
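Verifying the correctness of the deduplicated data removes invalid data that contains errors and prevents invalid data from occupying storage space. Optionally, a prompt about the invalid data is generated from the invalid data and is displayed or sent to an associated terminal, so that the associated terminal or an operator can correct the invalid data.
Optionally, the data deduplication rule and/or the correctness checking rule may be in XML file format. The advantage of configuring rules as XML is that XML is a text-based file format that is highly readable and object-oriented.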
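As an illustration of rules kept in XML, the sketch below parses a small rule file with Python's standard `xml.etree.ElementTree`. The element and attribute names (`rules`, `dedup`, `key`, `correctness`, `attribute`, `regex`) are invented for this example; the source does not disclose the actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file layout; the real schema is not given in the source.
RULES_XML = r"""
<rules>
  <dedup type="04">
    <key>username</key>
    <key>phone</key>
    <key>id_number</key>
  </dedup>
  <correctness type="04">
    <attribute name="phone" regex="^1[3456789]\d{9}$"/>
  </correctness>
</rules>
"""


def load_rules(xml_text):
    """Return (dedup keys per type code, regex per (type code, attribute))."""
    root = ET.fromstring(xml_text)
    dedup_keys = {
        node.get("type"): [key.text for key in node.findall("key")]
        for node in root.findall("dedup")
    }
    regexes = {
        (node.get("type"), attr.get("name")): attr.get("regex")
        for node in root.findall("correctness")
        for attr in node.findall("attribute")
    }
    return dedup_keys, regexes


dedup_keys, regexes = load_rules(RULES_XML)
```

Keeping both rule kinds in one file, keyed by data type code, lets either rule be changed without touching the processing code.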
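The data processing method works as follows: real-time stream data is received; a data deduplication rule is configured and used to perform an initial deduplication pass that filters the real-time stream data into deduplicated data; a correctness checking rule is then configured and used to check the deduplicated data, filtering out invalid data a second time to obtain valid data; and the valid data is stored, for example in a local database or in the cloud.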
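The two-pass flow described here (deduplicate, then check correctness, then store) can be sketched as a single loop. The function name `process` and its callback parameters are assumptions made for illustration; in particular, the storage target is reduced to a plain callable.

```python
from typing import Callable, Dict, Hashable, Iterable, List

Record = Dict[str, str]


def process(
    stream: Iterable[Record],
    dedup_key: Callable[[Record], Hashable],
    is_correct: Callable[[Record], bool],
    store: Callable[[Record], None],
) -> None:
    """Deduplicate first, then validate, and store only the valid records."""
    seen = set()
    for record in stream:
        key = dedup_key(record)
        if key in seen:
            continue  # first pass: drop duplicates
        seen.add(key)
        if is_correct(record):  # second pass: drop invalid data
            store(record)


stored: List[Record] = []
process(
    stream=[
        {"type": "01", "phone": "13812345678"},
        {"type": "01", "phone": "13812345678"},  # duplicate
        {"type": "01", "phone": "1352"},  # invalid phone number
    ],
    dedup_key=lambda r: tuple(sorted(r.items())),
    is_correct=lambda r: len(r["phone"]) == 11 and r["phone"].isdigit(),
    store=stored.append,
)
```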
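In the technical solution of this embodiment, the received real-time stream data is processed twice in succession, first by the data deduplication rule and then by the correctness checking rule, to remove duplicate and erroneous data from the stream. This prevents invalid data from occupying storage space, addresses the problems of high storage pressure and invalid data, and thereby relieves data storage pressure and improves data validity.
Embodiment 2
FIG. 2 is a flowchart of a data processing method in Embodiment 2 of the present application. This embodiment elaborates on the above embodiment. The method includes the following steps.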
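S210. Receive real-time stream data.
Optionally, the real-time stream data includes multiple data items, and each data item includes a data type identifier, at least one data attribute, and the data content of each data attribute. Before the real-time stream data is received, the data format may be defined so that the first field is the data type code and the following fields are the data attributes in order. For example, the data format may be defined as [data type code], [attribute 1], [attribute 2], and so on.
When received real-time stream data does not conform to this format, it may be preprocessed, for example by identifying its data type and adding the code of the identified type as the first field. For example, when registration data of a certain system is received, the first field is defined as registration for that system and set to 01, so that 01 stands for registration with that system. The system has username, phone number, and password attributes, so the data can be represented as [01], [username], [phone number], [password].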
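The preprocessing step, which recognises the data type and prepends its code, could be sketched as below. The type recogniser here is a toy: how types are actually identified is not described in the source, so `identify_type` and the `TYPE_CODES` table are purely illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical mapping from a recognised data type to its type code.
TYPE_CODES: Dict[str, str] = {"system_registration": "01"}


def identify_type(fields: Dict[str, str]) -> str:
    """Toy recogniser: username + phone + password means a registration."""
    if {"username", "phone", "password"} <= set(fields):
        return "system_registration"
    raise ValueError("unrecognised data type")


def normalise(fields: Dict[str, str]) -> List[str]:
    """Return [data type code, attribute 1, attribute 2, ...]."""
    code = TYPE_CODES[identify_type(fields)]
    return [code] + list(fields.values())


normalised = normalise(
    {"username": "A", "phone": "13812345678", "password": "secret"}
)
```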
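S220. Deduplicate the real-time stream data according to a data deduplication rule to obtain deduplicated data.
Optionally, deduplicating the real-time stream data according to the data deduplication rule to obtain deduplicated data includes: when the data type identifiers of two data items in the real-time stream data are the same, comparing the at least one data attribute of one of the two data items with the at least one data attribute of the other; when all data attributes of the one data item are the same as all data attributes of the other, comparing the data content of the at least one data attribute of the one data item with the data content of the at least one data attribute of the other; when the data contents of all data attributes of the one data item are the same as those of the other, determining that the two data items are duplicates and deduplicating them; when the data content of at least one data attribute of the one data item differs from that of the other, determining that the two data items are not duplicates and retaining both; and when at least one data attribute of the one data item differs from the data attributes of the other, determining that the two data items are not duplicates and retaining both.
For example, when the data type identifiers of two data items differ, the two data items are determined not to be duplicates and both are retained.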
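By configuring the data deduplication rule, any two data items in the real-time stream data that have the same data type code are compared attribute by attribute. When all data attributes of the two items, together with the corresponding data contents, are the same, the two compared items are determined to be duplicates and are deduplicated, that is, either one of the two is kept.
When at least one data attribute of two data items with the same data type code differs, the two items are determined not to be duplicates and both are retained. For example, data 1 is represented as [01], [username], [phone number] and data 2 as [01], [username], [gender]; comparing the two, the [phone number] attribute of data 1 differs from the [gender] attribute of data 2, so data 1 and data 2 are different stream data. When the data attributes of the two items are the same, their data contents are compared: items whose data contents are the same are determined to be duplicates, and items whose data contents differ are different stream data. For example, data 3 is represented as [01], [username A], [phone number B] and data 4 as [01], [username A], [phone number C]; comparing the two, the data content of the phone number attribute differs, so data 3 and data 4 are determined to be different stream data.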
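The three-stage comparison (type code, then attribute names, then attribute contents) can be written as one predicate. The `Record` type and the sample values below are assumptions chosen to mirror the examples in the text.

```python
from typing import Dict, NamedTuple


class Record(NamedTuple):
    type_code: str
    attributes: Dict[str, str]  # attribute name -> data content


def is_duplicate(a: Record, b: Record) -> bool:
    """Apply the three checks in order; any mismatch means not a duplicate."""
    if a.type_code != b.type_code:
        return False  # different data types
    if a.attributes.keys() != b.attributes.keys():
        return False  # same type, but different attribute sets
    # Same type and same attributes: compare contents attribute by attribute.
    return all(a.attributes[name] == b.attributes[name] for name in a.attributes)


with_phone = Record("01", {"username": "A", "phone": "B"})
with_gender = Record("01", {"username": "A", "gender": "F"})
same_content = Record("01", {"username": "A", "phone": "B"})
other_phone = Record("01", {"username": "A", "phone": "C"})
```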
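Optionally, deduplicating the real-time stream data according to the data deduplication rule to obtain deduplicated data includes: determining at least one deduplication key attribute for each data item in the real-time stream data; when the data type identifiers of two data items in the real-time stream data are the same, comparing the at least one deduplication key attribute of one of the two data items with the at least one deduplication key attribute of the other; when all deduplication key attributes of the one data item are the same as those of the other, comparing the data content of the at least one deduplication key attribute of the one data item with that of the other; when the data content of at least one deduplication key attribute of the one data item differs from that of the other, determining that the two data items are not duplicates and retaining both; and when the data contents of all deduplication key attributes of the one data item are the same as those of the other, determining that the two data items are duplicates and deduplicating them.
When the deduplication key attributes are determined, if the real-time stream data contains multiple data attributes, the deduplication key attributes may be one or more of those attributes. The deduplication key attributes may be set and updated according to user requirements, which is not limited here.
For example, data 4 is represented as [04], [username], [phone number], [gender], [password], [ID number] and data 5 as [04], [username], [phone number], [age], [ID number]; [username], [phone number], and [ID number] may be chosen as the key attributes.
When the data type identifiers of any two data items in the real-time stream data are the same, deduplication is performed by choosing at least one key attribute and comparing the two items on it one by one. When all deduplication key attributes of the two items are the same, the contents of those key attributes are compared; if the contents of all deduplication key attributes are the same, the two items are determined to be duplicates and are deduplicated, which may simply mean keeping either one of the two. For example, data 4 is represented as [04], [username], [phone number], [gender], [password], [ID number] and data 5 as [04], [username], [phone number], [age], [ID number]. With [username], [phone number], and [ID number] chosen as key attributes, the data contents of the key attributes of data 4 and data 5 are compared; when they are all the same, data 4 and data 5 are determined to be the same stream data, and either data 4 or data 5 is kept.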
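Key-attribute deduplication differs from full comparison in that attributes outside the key set are ignored. A minimal sketch, assuming records are dictionaries with a `"type"` entry and that a record lacking one of the key attributes is simply kept (the source does not say what happens in that case):

```python
from typing import Dict, Iterable, List, Sequence

Record = Dict[str, str]  # includes a "type" entry for the data type identifier


def dedup_by_keys(records: Iterable[Record], keys: Sequence[str]) -> List[Record]:
    """Keep the first record for each (type, key attribute contents) combination."""
    seen = set()
    kept: List[Record] = []
    for record in records:
        if any(k not in record for k in keys):
            kept.append(record)  # a key attribute is missing: not comparable
            continue
        fingerprint = (record["type"],) + tuple(record[k] for k in keys)
        if fingerprint not in seen:
            seen.add(fingerprint)
            kept.append(record)
    return kept


data4 = {"type": "04", "username": "A", "phone": "13812345678",
         "gender": "F", "password": "secret", "id_number": "X1"}
data5 = {"type": "04", "username": "A", "phone": "13812345678",
         "age": "30", "id_number": "X1"}
kept = dedup_by_keys([data4, data5], keys=["username", "phone", "id_number"])
```

Here `data4` and `data5` carry different non-key attributes but agree on all three key attributes, so only the first is kept.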
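S230. According to the data type of the deduplicated data, invoke the correctness checking rule corresponding to that data type and determine the valid data in the deduplicated data, where the correctness checking rule includes a correctness checking criterion for each of the at least one data attribute corresponding to the data type.
The criteria of the correctness checking rule are configured, and the deduplicated data is checked against them to determine whether the data is valid. The correctness checking rule may be a rule (a correctness checking criterion) for the at least one data attribute corresponding to the data type; that is, a separate rule may be configured for each data attribute, and each such rule may be expressed as a regular expression. The deduplicated data is checked against the rules for its data attributes to obtain the valid data.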
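For example, correctness checking rules are configured for different data attributes. The rule for the phone number attribute may be configured as regex="^1[3456789]\d{9}$", which requires 11 digits in total, where the first digit is 1 and the second digit is any of 3 to 9. This rule is used to check the phone number attribute of a data item: data that fails the condition is excluded and data that satisfies the rule is selected. For example, when the phone number is 1352, the check finds that it does not meet the condition for a correct phone number, so the number is not taken; when the phone number is a well-formed 11-digit number such as 13456743300, the check finds that it meets the condition, and the number is stored in the database.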
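The phone number rule quoted above can be exercised directly with Python's `re` module. The pattern is taken from the text; the function name `is_valid_phone` and the use of `re.ASCII` (to restrict `\d` to the digits 0 to 9) are choices made for this sketch.

```python
import re

# Pattern from the text: a leading 1, a second digit in 3-9, then nine more digits.
PHONE_RULE = re.compile(r"^1[3456789]\d{9}$", re.ASCII)


def is_valid_phone(value: str) -> bool:
    """Return True only if the whole string matches the phone number rule."""
    return PHONE_RULE.fullmatch(value) is not None


checks = {number: is_valid_phone(number) for number in ("1352", "13456743300")}
```

`fullmatch` is used instead of `match` so that a value with a trailing newline is rejected, which `$` alone would allow.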
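In the technical solution of this embodiment, the data format is defined and the real-time stream data is received; the received data is deduplicated by attribute comparison according to the data deduplication rule to obtain deduplicated data; correctness checking rules are configured for the different data attributes, and each attribute of the deduplicated data is checked against its rule; and the real-time stream data whose attributes are correct is saved to the database. Valid data is thus obtained through layered data processing, which addresses the problem of high storage pressure, relieves the storage pressure on the database, and improves data validity.
Embodiment 3
FIG. 3 is a schematic diagram of the functional modules of a data processing apparatus in Embodiment 3 of the present application. The present application provides a data processing apparatus, including: a data acquisition module 310 configured to receive real-time stream data; a data deduplication module 320 configured to deduplicate the real-time stream data according to a data deduplication rule to obtain deduplicated data; a correctness verification module 330 configured to perform correctness checking on the deduplicated data according to a correctness checking rule to obtain valid data; and a data storage module 340 configured to store the valid data.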
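Optionally, the data acquisition module 310 is configured to receive the real-time stream data based on the Flink streaming framework.
Optionally, the real-time stream data includes multiple data items, and each data item includes a data type identifier, at least one data attribute, and the data content of each data attribute.
Optionally, the data deduplication module 320 is configured to: when the data type identifiers of two data items in the real-time stream data are the same, compare the at least one data attribute of one of the two data items with the at least one data attribute of the other; when at least one data attribute of the one data item differs from the data attributes of the other, determine that the two data items are not duplicates and retain both; when all data attributes of the one data item are the same as all data attributes of the other, compare the data content of the at least one data attribute of the one data item with that of the other; when the data contents of all data attributes of the one data item are the same as those of the other, determine that the two data items are duplicates and deduplicate them; and when the data content of at least one data attribute of the one data item differs from that of the other, determine that the two data items are not duplicates and retain both.
Optionally, the data deduplication module 320 is configured to: determine at least one deduplication key attribute for each data item in the real-time stream data; when the data type identifiers of two data items in the real-time stream data are the same, compare the at least one deduplication key attribute of one of the two data items with the at least one deduplication key attribute of the other; when all deduplication key attributes of the one data item are the same as those of the other, compare the data content of the at least one deduplication key attribute of the one data item with that of the other; when the data content of at least one deduplication key attribute of the one data item differs from that of the other, determine that the two data items are not duplicates and retain both; and when the data contents of all deduplication key attributes of the one data item are the same as those of the other, determine that the two data items are duplicates and deduplicate them.
Optionally, the correctness verification module 330 is configured to invoke, according to the data type of the deduplicated data, the correctness checking rule corresponding to that data type and determine the valid data in the deduplicated data, where the correctness checking rule includes a correctness checking criterion for each of the at least one data attribute corresponding to the data type.
Optionally, the data deduplication rule and/or the correctness checking rule is in XML file format.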
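In the technical solution of this embodiment, the data acquisition module receives the real-time stream data; the data deduplication module first deduplicates the received data according to the deduplication rule to obtain deduplicated data; the correctness verification module then checks the deduplicated data against the configured correctness checking rule to obtain valid data; and finally the data storage module stores the valid data. This embodiment addresses the problem of high storage pressure, relieves data storage pressure, and improves data validity.
The above product can perform the method provided by any embodiment of the present application and has the functional modules corresponding to the method.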
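Embodiment 4
FIG. 4 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present application. As shown in FIG. 4, the device includes a processor 410, a memory 420, an input apparatus 430, and an output apparatus 440. The device may have one or more processors 410; one processor 410 is taken as an example in FIG. 4. The processor 410, memory 420, input apparatus 430, and output apparatus 440 in the device may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 4.
The memory 420, as a computer-readable storage medium, may be configured to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the data processing in the embodiments of the present application (for example, the data acquisition module 310, data deduplication module 320, correctness verification module 330, and data storage module 340 in the data processing apparatus). By running the software programs, instructions, and modules stored in the memory 420, the processor 410 performs the various functional applications and data processing of the device, that is, implements the data processing method described above.
The memory 420 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required by at least one function, and the data storage area may store data created through use of the terminal, and so on. In addition, the memory 420 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 420 may further include memory located remotely from the processor 410, and such remote memory may be connected to the device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input apparatus 430 may be configured to receive input stream data and to generate data input related to user settings and function control of the device. The output apparatus 440 may include a display device such as a display screen.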
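Embodiment 5
Embodiment 5 provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data processing method provided by any embodiment of the present application. The method includes: receiving real-time stream data; deduplicating the real-time stream data according to a data deduplication rule to obtain deduplicated data; performing correctness checking on the deduplicated data according to a correctness checking rule to obtain valid data; and storing the valid data.
The computer storage medium of the embodiments of the present application may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. Examples of computer-readable storage media (a non-exhaustive list) include: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in connection with an instruction execution system, apparatus, or device.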
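A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, radio frequency (RF), and the like, or any suitable combination of the above.
Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Those of ordinary skill in the art should understand that the modules or steps of the present application described above can be implemented by a general-purpose computing device; they can be centralized on a single computing device or distributed over a network composed of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. Thus, the present application is not limited to any particular combination of hardware and software.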

Claims (10)

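  1. A data processing method, comprising:
    receiving real-time stream data;
    deduplicating the real-time stream data according to a data deduplication rule to obtain deduplicated data;
    performing correctness checking on the deduplicated data according to a correctness checking rule to obtain valid data, and storing the valid data.
  2. The method according to claim 1, wherein the real-time stream data comprises multiple data items, and each data item comprises a data type identifier, at least one data attribute, and the data content of each data attribute.
  3. The method according to claim 2, wherein deduplicating the real-time stream data according to the data deduplication rule to obtain deduplicated data comprises: when the data type identifiers of two data items in the real-time stream data are the same, comparing the at least one data attribute of one of the two data items with the at least one data attribute of the other;
    when at least one data attribute of the one data item differs from the at least one data attribute of the other, determining that the two data items are not duplicates and retaining both;
    when all data attributes of the one data item are the same as all data attributes of the other, comparing the data content of the at least one data attribute of the one data item with the data content of the at least one data attribute of the other;
    when the data contents of all data attributes of the one data item are the same as the data contents of all data attributes of the other, determining that the two data items are duplicates and deduplicating them;
    when the data content of at least one data attribute of the one data item differs from the data content of the at least one data attribute of the other, determining that the two data items are not duplicates and retaining both.
  4. The method according to claim 2, wherein deduplicating the real-time stream data according to the data deduplication rule to obtain deduplicated data comprises: determining at least one deduplication key attribute for each data item in the real-time stream data, wherein the at least one deduplication key attribute of each data item is at least one of all the data attributes of that data item;
    when the data type identifiers of two data items in the real-time stream data are the same, comparing the at least one deduplication key attribute of one of the two data items with the at least one deduplication key attribute of the other;
    when all deduplication key attributes of the one data item are the same as all deduplication key attributes of the other, comparing the data content of the at least one deduplication key attribute of the one data item with the data content of the at least one deduplication key attribute of the other;
    when the data content of at least one deduplication key attribute of the one data item differs from the data content of the at least one deduplication key attribute of the other, determining that the two data items are not duplicates and retaining both;
    when the data contents of all deduplication key attributes of the one data item are the same as the data contents of all deduplication key attributes of the other, determining that the two data items are duplicates and deduplicating them.
  5. The method according to claim 2, wherein performing correctness checking on the deduplicated data according to the correctness checking rule to obtain valid data comprises:
    according to the data type of the deduplicated data, invoking the correctness checking rule corresponding to the data type and determining the valid data in the deduplicated data, wherein the correctness checking rule comprises a correctness checking criterion for each of the at least one data attribute corresponding to the data type.
  6. The method according to claim 1, wherein receiving real-time stream data comprises:
    receiving the real-time stream data based on the Flink streaming framework.
  7. The method according to claim 1, wherein at least one of the data deduplication rule and the correctness checking rule is in XML file format.
  8. A data processing apparatus, comprising:
    a data acquisition module configured to receive real-time stream data;
    a data deduplication module configured to deduplicate the real-time stream data according to a data deduplication rule to obtain deduplicated data;
    a correctness verification module configured to perform correctness checking on the deduplicated data according to a correctness checking rule to obtain valid data;
    a data storage module configured to store the valid data.
  9. An electronic device, comprising:
    at least one processor;
    a storage apparatus configured to store at least one program,
    wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the data processing method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the data processing method according to any one of claims 1 to 7.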
PCT/CN2021/112248 2020-09-02 2021-08-12 数据处理的方法、装置、设备及存储介质 WO2022048422A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010910153.5 2020-09-02
CN202010910153.5A CN112084179B (zh) 2020-09-02 2020-09-02 一种数据处理的方法、装置、设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022048422A1 true WO2022048422A1 (zh) 2022-03-10

Family

ID=73731836

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/112248 WO2022048422A1 (zh) 2020-09-02 2021-08-12 数据处理的方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN112084179B (zh)
WO (1) WO2022048422A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112084179B (zh) * 2020-09-02 2023-11-07 北京锐安科技有限公司 一种数据处理的方法、装置、设备及存储介质
CN113064888B (zh) * 2021-03-25 2021-12-07 珠海格力电器股份有限公司 数据校对方法、装置和系统、服务器、设备
CN113084388B (zh) * 2021-03-29 2023-05-09 广州明珞装备股份有限公司 焊接质量的检测方法、系统、装置及存储介质
CN117093416A (zh) * 2023-08-23 2023-11-21 南方电网数字电网集团有限公司广东分公司 一种基于云平台的数据恢复系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109451006A (zh) * 2018-10-30 2019-03-08 北京锐安科技有限公司 一种数据传输方法、装置、服务器及计算机存储介质
CN109857728A (zh) * 2017-11-30 2019-06-07 广州明领基因科技有限公司 针对图书馆的大数据清洗系统
CN111367989A (zh) * 2020-06-01 2020-07-03 北京江融信科技有限公司 一种实时数据指标计算系统和方法
CN112084179A (zh) * 2020-09-02 2020-12-15 北京锐安科技有限公司 一种数据处理的方法、装置、设备及存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649676B (zh) * 2016-12-15 2020-06-19 北京锐安科技有限公司 一种基于hdfs存储文件的去重方法及装置
CN106599234A (zh) * 2016-12-20 2017-04-26 深圳飓风传媒科技有限公司 基于多维标识的数据可视化处理方法和系统
CN107577769A (zh) * 2017-09-06 2018-01-12 河南腾龙信息工程有限公司 一种计量专业数据的挖掘方法及系统
CN108628931B (zh) * 2018-03-15 2022-08-30 创新先进技术有限公司 一种数据驱动业务的方法、装置以及设备

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109857728A (zh) * 2017-11-30 2019-06-07 广州明领基因科技有限公司 针对图书馆的大数据清洗系统
CN109451006A (zh) * 2018-10-30 2019-03-08 北京锐安科技有限公司 一种数据传输方法、装置、服务器及计算机存储介质
CN111367989A (zh) * 2020-06-01 2020-07-03 北京江融信科技有限公司 一种实时数据指标计算系统和方法
CN112084179A (zh) * 2020-09-02 2020-12-15 北京锐安科技有限公司 一种数据处理的方法、装置、设备及存储介质

Also Published As

Publication number Publication date
CN112084179B (zh) 2023-11-07
CN112084179A (zh) 2020-12-15

Similar Documents

Publication Publication Date Title
WO2022048422A1 (zh) 数据处理的方法、装置、设备及存储介质
US11755630B2 (en) Regular expression generation using longest common subsequence algorithm on combinations of regular expression codes
US20210248143A1 (en) Automatically executing graphql queries on databases
CN111311326B (zh) 用户行为实时多维度分析方法、装置及存储介质
US20160042015A1 (en) Activity information schema discovery and schema change detection and notification
EP3987408A1 (en) Regular expression generation using span highlighting alignment
WO2022068348A1 (zh) 关系图谱构建方法、装置、电子设备及存储介质
WO2020000742A1 (zh) 一种去重流量记录方法、装置、服务器及存储介质
US9195763B2 (en) Identifying unknown parameter and name value pairs
CN113761565B (zh) 数据脱敏方法和装置
US20240195860A1 (en) Sample message processing method and apparatus
CN116562255A (zh) 表单信息生成方法、装置、电子设备和计算机可读介质
WO2021129849A1 (zh) 日志处理方法、装置、设备和存储介质
WO2020263674A1 (en) User interface commands for regular expression generation
WO2019134277A1 (zh) 数据过滤方法、装置、服务器及可读存储介质
CN118585569A (zh) 一种数据导入方法和装置
CN112486967A (zh) 一种数据采集方法、终端设备及存储介质
CN115510091A (zh) 一种话单数据处理方法、装置、电子设备及存储介质
CN113110873A (zh) 统一系统编码规范的方法和装置
CN116955420A (zh) 数据访问方法及其装置、存储介质、程序产品
CN115203228A (zh) 数据处理方法、装置、介质以及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21863489

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21863489

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 22/09/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21863489

Country of ref document: EP

Kind code of ref document: A1