WO2020237878A1 - Data deduplication method and apparatus, computer device, and storage medium - Google Patents

Data deduplication method and apparatus, computer device, and storage medium Download PDF

Info

Publication number
WO2020237878A1
Authority
WO
WIPO (PCT)
Prior art keywords
field
characteristic
fields
feature
access request
Prior art date
Application number
PCT/CN2019/103388
Other languages
English (en)
French (fr)
Inventor
高源
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2020237878A1 publication Critical patent/WO2020237878A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/60 Business processes related to postal services

Definitions

  • This application relates to the field of computer technology, and in particular to a data deduplication method, device, computer equipment and storage medium.
  • the embodiments of the present application provide a data deduplication method, device, computer equipment, and storage medium, which can reduce the consumption of database resources by large amounts of repeated data, save database memory space, and further lower the user complaint rate and improve the company's reputation.
  • an embodiment of the present application provides a data deduplication method, which includes:
  • if the characteristic field is a repeated field, store the characteristic field in a preset exception processing queue; otherwise, output a prompt message used to indicate that the characteristic field is a normal field.
  • an embodiment of the present application provides a data deduplication device, and the device includes:
  • the obtaining unit is used to obtain the data access request, and extract the characteristic fields in the data access request;
  • a processing unit configured to clean the characteristic fields and standardize the cleaned characteristic fields
  • a splicing processing unit configured to splice the characteristic fields to generate a characteristic field combination, and use a hash algorithm to compress the characteristic field combination
  • the identification judgment unit is configured to identify the compressed feature field based on the preset database cluster, and determine whether the feature field is a repeated field according to the identification result;
  • the storage output unit is configured to store the characteristic field in a preset exception processing queue if the characteristic field is a repeated field, otherwise output a prompt message, which is used to prompt the characteristic field as a normal field.
  • an embodiment of the present application also provides a computer device, including a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor implements the data deduplication method described above when executing the computer program.
  • the embodiments of the present application also provide a computer-readable storage medium that stores one or more computer programs, and the one or more computer programs can be executed by one or more processors to implement the data deduplication method described above.
  • the data deduplication method described in the embodiments of the present application can reduce the consumption of database resources by large amounts of repeated data, save database memory space, and further lower the user complaint rate and improve the company's reputation.
  • FIG. 1 is a schematic diagram of an application scenario of a data deduplication method provided by an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a data deduplication method provided by an embodiment of the present application
  • FIG. 3 is another schematic flowchart of a data deduplication method provided by an embodiment of the present application.
  • FIG. 4 is another schematic flowchart of a data deduplication method provided by an embodiment of the present application.
  • FIG. 5 is another schematic flowchart of a data deduplication method provided by an embodiment of the present application.
  • FIG. 6 is another schematic flowchart of a data deduplication method provided by an embodiment of the present application.
  • FIG. 7 is a schematic block diagram of a data deduplication device provided by an embodiment of the present application.
  • FIG. 8 is another schematic block diagram of a data deduplication device provided by an embodiment of the present application.
  • FIG. 9 is another schematic block diagram of a data deduplication device provided by an embodiment of the present application.
  • FIG. 10 is another schematic block diagram of a data deduplication apparatus provided by an embodiment of the present application.
  • FIG. 11 is another schematic block diagram of a data deduplication device provided by an embodiment of the present application.
  • FIG. 12 is another schematic block diagram of a data deduplication device provided by an embodiment of the present application.
  • FIG. 13 is another schematic block diagram of a data deduplication device provided by an embodiment of the present application.
  • FIG. 14 is a schematic diagram of the structural composition of a computer device provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of an application scenario of a data deduplication method provided by an embodiment of the application.
  • the application scenario includes:
  • a server is used to provide back-end services for data transmission.
  • a server is a computer device, which can be a single server, a server cluster, a cloud server, or a dedicated web server; it receives access from external terminals and connects to the terminals through a wired or wireless network.
  • the terminal shown in FIG. 1 includes terminal 1, terminal 2, and terminal 3.
  • the terminal obtains target data from the server by accessing the server, and inserts the obtained target data into a local data table on the terminal.
  • the terminal may be an electronic device such as a smart phone, a smart watch, a notebook computer, a tablet computer, or a desktop computer, and the terminal accesses the server through a wired network or a wireless network.
  • FIG. 2 is a schematic flowchart of a data deduplication method provided by an embodiment of the application. As shown in Figure 2, the method includes the following steps S101 to S105.
  • S101 Obtain a data access request, and extract characteristic fields in the data access request.
  • the data access request refers to a logistics data request sent by the EMS system to the server
  • the data access request is sent to the server in the form of a request message
  • the server receives the data access request sent by the EMS system
  • the request message is an HTTP message.
  • the characteristic field refers to the field data content in the data access request.
  • the characteristic field may include field data content such as zip code, logistics order number, and logistics time.
  • the step S101 includes steps S201 to S202:
  • the data access request sent by the EMS system is actually a request message, and parsing the data access request means parsing the request message.
  • the parsing process is as follows: first determine the constituent parts of the request message (the request line, request headers, and request body), where the request line contains a method and the request URL, as well as the version of the HTTP message.
  • the request headers contain the HTTP header fields that add additional information to the request message, and the request body contains the length and field content of the message.
  • the field content of the request message is determined from the request body and extracted as the characteristic fields.
  • the characteristic fields may include field data content such as the zip code, logistics order number, and logistics time; a minimal extraction sketch is given below.
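  • A minimal Java sketch of this extraction step, offered only as an illustration: it assumes the request body arrives as URL-encoded key=value pairs, and the field names zipCode, orderNo, and time are hypothetical, not taken from the patent.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: split a URL-encoded request body into characteristic fields.
public class FeatureFieldExtractor {

    public static Map<String, String> extractFields(String requestBody) {
        Map<String, String> fields = new HashMap<>();
        for (String pair : requestBody.split("&")) {
            int idx = pair.indexOf('=');
            if (idx > 0) {
                fields.put(pair.substring(0, idx), pair.substring(idx + 1));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> f =
                extractFields("zipCode=518000&orderNo=EMS123456789&time=2019-05-30");
        System.out.println(f.get("zipCode") + " / " + f.get("orderNo") + " / " + f.get("time"));
    }
}
```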
  • cleaning the characteristic fields refers to capturing the null-value fields in the characteristic fields and loading or replacing the null-value fields with specific data.
  • the null-value fields can also be used to split traffic across different databases; the cleaned characteristic fields are then standardized. The standardization process is as follows: for characteristic fields from different data sources, the same characteristic field may belong to different data types (for example, character or numeric) while actually referring to the same concept.
  • in that case, the current characteristic field is judged regardless of the data type it carries in its data source, and a numeric type is used in place of a character type, or a character type in place of a numeric type; a cleaning sketch follows.
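  • A minimal Java sketch of the cleaning and standardization step, assuming the fields arrive as a String map; the "N/A" placeholder and the unify-on-string-form rule are illustrative assumptions, not values prescribed by the patent.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: capture null/empty fields and unify every value on one representation.
public class FieldCleaner {

    private static final String NULL_PLACEHOLDER = "N/A"; // assumed placeholder value

    public static Map<String, String> cleanAndNormalize(Map<String, String> rawFields) {
        Map<String, String> cleaned = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : rawFields.entrySet()) {
            String value = e.getValue();
            if (value == null || value.trim().isEmpty()) {
                value = NULL_PLACEHOLDER;   // capture null-value fields and load specific data instead
            } else {
                value = value.trim();       // unify numeric and character sources on one string form
            }
            cleaned.put(e.getKey(), value);
        }
        return cleaned;
    }
}
```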
  • the step of splicing the characteristic fields to generate a characteristic field combination and compressing the combination with a hash algorithm includes the following steps S301 to S303: S301, splice the characteristic fields using the append method in the StringBuilder class of the C# language; S302, use the hash algorithm to perform a modulo operation on the spliced characteristic fields; S303, obtain the operation result and locate and store the operation result to complete the compression processing.
  • the append method in the StringBuilder class of the C# language can be used to splice the characteristic fields in the request message.
  • the StringBuilder class is a variable character sequence class in the C# language.
  • the append method splices the characteristic fields in order. For example, for the characteristic fields "zip code, logistics order number, logistics time", the splicing call is append(zip code, logistics order number, logistics time), and the final output of the splicing is "zip code, logistics order number, logistics time".
  • the hash algorithm is a "linked-list hash" data structure. Through the hash algorithm, a modulo operation is applied to the characteristic field data and the result is placed at a position in an array, i.e., the characteristic field data is located and stored. This compression does not compress and restore the original data; rather, the metadata is hashed into a unique hashcode, which is later used to judge whether the characteristic field data is duplicated.
  • the hashcode is the object address obtained through the hash algorithm, or an int value computed from a string or number, which can be obtained in Java by calling Object.hashcode(); a sketch of this splice-and-hash step follows.
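  • A minimal Java sketch of the splice-then-hash step (the description uses C#'s StringBuilder.append, which Java mirrors); the comma separator and the bucket count of 1024 are assumptions added for illustration.

```java
// Illustrative sketch: splice the fields in order, hash the combination, and use a modulo
// operation to locate the storage slot.
public class FeatureFieldCompressor {

    private static final int BUCKETS = 1024; // assumed array size for the "linked-list hash" structure

    public static int compress(String zipCode, String orderNo, String logisticsTime) {
        StringBuilder sb = new StringBuilder();
        sb.append(zipCode).append(',').append(orderNo).append(',').append(logisticsTime); // splice in order
        int hashCode = sb.toString().hashCode();      // hashcode of the combination (not guaranteed collision-free)
        return Math.floorMod(hashCode, BUCKETS);      // modulo operation locates where the result is stored
    }
}
```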
  • S104 Identify the compressed feature field based on a preset database cluster, and determine whether the feature field is a repeated field according to the recognition result.
  • the preset database cluster is a distributed Redis database cluster. Before the compressed characteristic fields are identified, the database cluster must be set up and initialized in advance; initialization means storing characteristic field data in the database cluster beforehand.
  • the preset database cluster in this embodiment is a distributed Redis database cluster.
  • the distributed Redis database cluster is an existing in-memory database, i.e., a single-threaded, high-performance memory database.
  • when identifying the compressed characteristic fields against the preset database cluster, the setnx command is called: according to the setnx command, the compressed characteristic field is checked against all the data stored in the preset database cluster, the setnx command returns a return value, and the return value is used to judge whether the characteristic field is a repeated field.
  • a so-called repeated field means the preset database cluster already contains characteristic field data identical to the compressed characteristic field; if it does, the compressed characteristic field is a repeated field, and if it does not, the compressed characteristic field is not a repeated field.
  • S105 If the characteristic field is a repeated field, store the characteristic field in a preset exception processing queue; otherwise, output a prompt message, which is used to prompt the characteristic field as a normal field.
  • the exception processing queue is a processing window set up by the distributed Redis database cluster specifically for abnormal data; repeated characteristic fields are stored in this processing window for processing, and the processing operation can be a delete operation, for example, storing the repeated characteristic fields directly in the processing window for deletion. A sketch of the setnx check together with the exception-queue handling is given below.
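  • A sketch of the SETNX-based duplicate check, using the Jedis client as one possible way to reach the Redis cluster; the key prefix, the queue name, and the single-node connection standing in for the distributed cluster are all assumptions.

```java
import redis.clients.jedis.Jedis;

// Illustrative sketch: SETNX returns 1 when the key was newly set (normal field) and
// 0 when the key already exists (repeated field), which is then parked in an exception queue.
public class DuplicateChecker {

    private final Jedis jedis = new Jedis("localhost", 6379); // assumed connection details

    /** Returns true when the compressed characteristic field already exists (repeated field). */
    public boolean isDuplicate(String featureFieldHash) {
        long result = jedis.setnx("dedup:" + featureFieldHash, "1");
        if (result == 0) {
            jedis.rpush("exception:queue", featureFieldHash); // store the repeated field in the queue
            return true;
        }
        return false; // newly stored, so a normal field
    }
}
```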
  • the embodiment of the application obtains a data access request and extracts the characteristic fields in it; cleans the characteristic fields and standardizes the cleaned characteristic fields; splices the characteristic fields to generate a characteristic field combination and compresses the combination with a hash algorithm; identifies the compressed characteristic fields based on a preset database cluster and judges, according to the identification result, whether the characteristic field is a repeated field; and, if the characteristic field is a repeated field, stores it in a preset exception processing queue, otherwise outputting a prompt message used to indicate that the characteristic field is a normal field.
  • this application provides a data deduplication method based on data processing, which can reduce the consumption of database resources by large amounts of repeated data, save database memory space, and further lower the user complaint rate and improve the company's reputation.
  • FIG. 5 is a schematic flowchart of a data deduplication method according to another embodiment of the application. As shown in Figure 5, the method includes the following steps S401 to S407.
  • S401 Obtain a data access request, and extract characteristic fields in the data access request.
  • the data access request refers to a logistics data request sent by the EMS system to the server
  • the data access request is sent to the server in the form of a request message
  • the server receives the data access request sent by the EMS system
  • the request message is an HTTP message.
  • the characteristic field refers to the field data content in the data access request.
  • the characteristic field may include field data content such as zip code, logistics order number, and logistics time.
  • cleaning the characteristic fields refers to capturing the null-value fields in the characteristic fields and loading or replacing the null-value fields with specific data.
  • the null-value fields can also be used to split traffic across different databases; the cleaned characteristic fields are then standardized. The standardization process is as follows: for characteristic fields from different data sources, the same characteristic field may belong to different data types (for example, character or numeric) while actually referring to the same concept.
  • in that case, the current characteristic field is judged regardless of the data type it carries in its data source, and a numeric type is used in place of a character type, or a character type in place of a numeric type.
  • the append method in the StringBuilder class of the C# language can be used to splice the characteristic fields in the request message.
  • the StringBuilder class is a variable character sequence class in the C# language.
  • the append method splices the characteristic fields in order. For example, for the characteristic fields "zip code, logistics order number, logistics time", the splicing call is append(zip code, logistics order number, logistics time), and the final output of the splicing is "zip code, logistics order number, logistics time".
  • the hash algorithm is a "linked list hash" data structure.
  • the hashcode is the object address obtained through the hash algorithm, or an int value computed from a string or number, which can be obtained in Java by calling Object.hashcode().
  • so-called fields of the same type are fields whose configured types belong to the same type; for example, field A and field B are both integers, and field C and field D are both floating point. If there are multiple compressed characteristic fields and they share the same sub-characteristic field, the multiple compressed characteristic fields are judged to be of the same type and are grouped. For instance, in the practical example above, if the compressed characteristic fields all contain the same sub-characteristic field "zip code", then these compressed characteristic fields are of the same type and are placed in the same group; the grouping method is to store the corresponding same-type fields in the same list collection, as sketched below.
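  • A Java sketch of this optional grouping step: records that share the assumed sub-field key "zipCode" are collected into the same list collection; the key name and map-of-lists layout are illustrative choices.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: group compressed records by a shared sub-characteristic field.
public class FieldGrouper {

    public static Map<String, List<Map<String, String>>> groupByZipCode(
            List<Map<String, String>> records) {
        Map<String, List<Map<String, String>>> groups = new HashMap<>();
        for (Map<String, String> record : records) {
            String zip = record.getOrDefault("zipCode", "unknown");
            groups.computeIfAbsent(zip, k -> new ArrayList<>()).add(record); // same-type fields share one list
        }
        return groups;
    }
}
```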
  • S406 Identify the compressed feature field based on the preset database cluster, and determine whether the feature field is a repeated field according to the recognition result.
  • the preset database cluster is a distributed Redis database cluster. Before the compressed characteristic fields are identified, the database cluster must be set up and initialized in advance; initialization means storing characteristic field data in the database cluster beforehand.
  • the preset database cluster in this embodiment is a distributed Redis database cluster.
  • the distributed Redis database cluster is an existing in-memory database, i.e., a single-threaded, high-performance memory database.
  • when identifying the compressed characteristic fields against the preset database cluster, the setnx command is called: according to the setnx command, the compressed characteristic field is checked against all the data stored in the preset database cluster, the setnx command returns a return value, and the return value is used to judge whether the characteristic field is a repeated field.
  • a so-called repeated field means the preset database cluster already contains characteristic field data identical to the compressed characteristic field; if it does, the compressed characteristic field is a repeated field, and if it does not, the compressed characteristic field is not a repeated field.
  • the exception processing queue is a processing window specially set for abnormal data by the distributed redis database cluster, and the repeated characteristic fields will be stored in the processing window for processing operations, and the processing operations may include deletion operations.
  • FIG. 6 is a schematic flowchart of a data deduplication method according to another embodiment of the application. As shown in Figure 6, the method includes the following steps S501 to S507.
  • S501 Obtain a data access request, and extract characteristic fields in the data access request.
  • the data access request refers to a logistics data request sent by the EMS system to the server
  • the data access request is sent to the server in the form of a request message
  • the server receives the data access request sent by the EMS system
  • the request message is an HTTP message.
  • the characteristic field refers to the field data content in the data access request.
  • the characteristic field may include field data content such as zip code, logistics order number, and logistics time.
  • cleaning the characteristic fields refers to capturing the null-value fields in the characteristic fields and loading or replacing the null-value fields with specific data.
  • the null-value fields can also be used to split traffic across different databases; the cleaned characteristic fields are then standardized. The standardization process is as follows: for characteristic fields from different data sources, the same characteristic field may belong to different data types (for example, character or numeric) while actually referring to the same concept.
  • in that case, the current characteristic field is judged regardless of the data type it carries in its data source, and a numeric type is used in place of a character type, or a character type in place of a numeric type.
  • the append method in the StringBuilder class of the C# language can be used to splice the characteristic fields in the request message.
  • the StringBuilder class is a variable character sequence class in the C# language.
  • the append method splices the characteristic fields in order. For example, for the characteristic fields "zip code, logistics order number, logistics time", the splicing call is append(zip code, logistics order number, logistics time), and the final output of the splicing is "zip code, logistics order number, logistics time".
  • the hash algorithm is a "linked list hash" data structure.
  • the hashcode is the object address obtained through the hash algorithm, or an int value computed from a string or number, which can be obtained in Java by calling Object.hashcode().
  • S504 Identify the compressed feature field based on a preset database cluster, and determine whether the feature field is a repeated field according to the recognition result.
  • the preset database cluster is a distributed Redis database cluster. Before the compressed characteristic fields are identified, the database cluster must be set up and initialized in advance; initialization means storing characteristic field data in the database cluster beforehand.
  • the preset database cluster in this embodiment is a distributed Redis database cluster.
  • the distributed Redis database cluster is an existing in-memory database, i.e., a single-threaded, high-performance memory database.
  • when identifying the compressed characteristic fields against the preset database cluster, the setnx command is called: according to the setnx command, the compressed characteristic field is checked against all the data stored in the preset database cluster, the setnx command returns a return value, and the return value is used to judge whether the characteristic field is a repeated field.
  • a so-called repeated field means the preset database cluster already contains characteristic field data identical to the compressed characteristic field; if it does, the compressed characteristic field is a repeated field, and if it does not, the compressed characteristic field is not a repeated field.
  • the exception processing queue is a processing window specially set for abnormal data by the distributed redis database cluster, and the repeated characteristic fields will be stored in the processing window for processing operations, and the processing operations may include deletion operations.
  • S506 Pre-set the data update duration of the preset database cluster.
  • the data update duration may be 1 minute, 2 minutes, or other preset durations.
  • the specific value of the preset duration is not limited here, and can be set according to actual needs.
  • if the duration for which the characteristic field has been stored in the preset database cluster exceeds the preset data update duration, a deletion instruction in the preset database cluster is triggered and executed to delete the characteristic field.
  • by setting the data update duration, the field data in the preset database cluster can be refreshed regularly and kept up to date on schedule; a TTL-based sketch of this idea follows.
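  • A sketch of the data-update-duration idea realized with a Redis time-to-live via Jedis: each stored field expires after the configured duration (2 minutes here, an assumed value). The patent describes triggering a deletion instruction; an expiry is one hedged way to achieve the same periodic refresh, not the only possible implementation.

```java
import redis.clients.jedis.Jedis;

// Illustrative sketch: store a compressed field and let Redis delete it automatically
// once the preset data update duration has elapsed.
public class UpdateDurationSetter {

    private static final int UPDATE_DURATION_SECONDS = 120; // assumed 2-minute update duration

    public static void storeWithDuration(Jedis jedis, String featureFieldHash) {
        String key = "dedup:" + featureFieldHash;   // assumed key naming
        jedis.setnx(key, "1");
        jedis.expire(key, UPDATE_DURATION_SECONDS); // field is removed after the update duration
    }
}
```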
  • the device 100 includes: an acquisition unit 101, a processing unit 102, a splicing processing unit 103, and an identification judgment unit 104, Storage output unit 105.
  • the obtaining unit 101 is configured to obtain a data access request and extract the characteristic fields in the data access request; the processing unit 102 is configured to clean the characteristic fields and standardize the cleaned characteristic fields;
  • the splicing processing unit 103 is configured to splice the characteristic fields to generate a characteristic field combination and compress the combination using a hash algorithm; the identification judgment unit 104 is configured to identify the compressed characteristic fields based on a preset database cluster and judge, according to the identification result, whether the characteristic field is a repeated field;
  • the storage output unit 105 is configured to store the characteristic field in a preset exception processing queue if the characteristic field is a repeated field, and otherwise output a prompt message used to indicate that the characteristic field is a normal field.
  • the embodiment of the application obtains a data access request and extracts the characteristic fields in it; cleans the characteristic fields and standardizes the cleaned characteristic fields; splices the characteristic fields to generate a characteristic field combination and compresses the combination with a hash algorithm; identifies the compressed characteristic fields based on a preset database cluster and judges, according to the identification result, whether the characteristic field is a repeated field; and, if the characteristic field is a repeated field, stores it in a preset exception processing queue, otherwise outputting a prompt message used to indicate that the characteristic field is a normal field.
  • this application provides a data deduplication method based on data processing, which can reduce the consumption of database resources by large amounts of repeated data, save database memory space, and further lower the user complaint rate and improve the company's reputation.
  • the acquiring unit 101 includes: a parsing unit 101a, configured to acquire a data access request, and parse the data access request; an acquiring subunit 101b, configured to acquire the data access request according to the analysis result The characteristic field in the request.
  • the splicing processing unit 103 includes: a splicing unit 103a, configured to splice the characteristic fields using the append method in the StringBuilder class of the C# language; an arithmetic unit 103b, configured to use the hash algorithm to perform a modulo operation on the spliced characteristic fields; and a storage unit 103c, configured to obtain the operation result and locate and store the operation result to complete the compression processing.
  • the processing unit 102 includes: a capturing unit 102a for capturing the null value field in the characteristic field; a processing subunit 102b for loading or replacing the null value field with specific data .
  • the recognition and judgment unit 104 includes: a recognition and judgment subunit 104a, configured to check the compressed characteristic fields against all the data stored in the preset database cluster according to the setnx command, obtain the return value returned by the setnx command, and judge, according to that return value, whether the characteristic field is a repeated field.
  • an embodiment of the present application also provides a data deduplication device.
  • the device 200 includes: an acquisition unit 201, a processing unit 202, a splicing processing unit 203, a judgment unit 204, a grouping unit 205, an identification judgment unit 206, and a storage output unit 207.
  • the obtaining unit 201 is configured to obtain a data access request and extract the characteristic fields in the data access request; the processing unit 202 is configured to clean the characteristic fields and standardize the cleaned characteristic fields;
  • the splicing processing unit 203 is used for splicing the characteristic fields to generate a combination of characteristic fields, and compressing the characteristic field combinations using a hash algorithm;
  • the judging unit 204 is configured to judge whether the compressed characteristic fields are fields of the same type;
  • the grouping unit 205 is configured to group the compressed characteristic fields if the compressed characteristic fields are of the same type;
  • the identification and judgment unit 206 is configured to identify the compressed characteristic fields based on a preset database cluster
  • and judge, according to the identification result, whether the characteristic field is a repeated field;
  • the storage output unit 207 is configured to store the characteristic field in a preset exception processing queue if the characteristic field is a repeated field, and otherwise output a prompt message used to indicate that the characteristic field is a normal field.
  • an embodiment of the present application also proposes a data deduplication device.
  • the device 300 includes: an acquisition unit 301, a processing unit 302, a splicing processing unit 303, an identification judgment unit 304, The storage output unit 305, the preset unit 306, and the deletion unit 307.
  • the obtaining unit 301 is configured to obtain the data access request and extract the characteristic fields in the data access request; the processing unit 302 is configured to clean the characteristic fields and standardize the cleaned characteristic fields; the splicing processing unit 303 is configured to splice the characteristic fields to generate a characteristic field combination and compress the combination using a hash algorithm; the identification judgment unit 304 is configured to identify the compressed characteristic fields based on a preset database cluster and judge, according to the identification result, whether the characteristic field is a repeated field; the storage output unit 305 is configured to store the characteristic field in a preset exception processing queue if the characteristic field is a repeated field, and otherwise output a prompt message used to indicate that the characteristic field is a normal field; the presetting unit 306 is configured to preset the data update duration of the preset database cluster; and the deleting unit 307 is configured to delete the characteristic field if the duration for which it has been stored in the preset database cluster exceeds the preset data update duration.
  • the foregoing data deduplication device corresponds to the foregoing data deduplication method one-to-one, and its specific principle and process are the same as the method described in the foregoing embodiment, and will not be repeated.
  • the above-mentioned data deduplication device can be implemented in the form of a computer program, and the computer program can be run on a computer device as shown in FIG. 14.
  • FIG. 14 is a schematic diagram of the structural composition of a computer device of this application.
  • the device can be a terminal or a server, where the terminal can be an electronic device with communication functions and voice input functions such as smart phones, tablet computers, notebook computers, desktop computers, personal digital assistants, and wearable devices.
  • the server can be an independent server or a server cluster composed of multiple servers.
  • the computer device 500 includes a processor 502, a nonvolatile storage medium 503, an internal memory 504, and a network interface 505 connected through a system bus 501.
  • the non-volatile storage medium 503 of the computer device 500 may store an operating system 5031 and a computer program 5032.
  • the processor 502 may execute a data deduplication method.
  • the processor 502 of the computer device 500 is used to provide calculation and control capabilities, and support the operation of the entire computer device 500.
  • the internal memory 504 provides an environment for the running of the computer program 5032 in the non-volatile storage medium 503, and when the computer program is executed by the processor, the processor 502 can execute a data deduplication method.
  • the network interface 505 of the computer device 500 is used for network communication.
  • when the processor 502 executes the computer program, the following operations are implemented: obtaining a data access request and extracting the characteristic fields in the data access request; cleaning the characteristic fields and standardizing the cleaned characteristic fields; splicing the characteristic fields to generate a characteristic field combination and compressing the combination using a hash algorithm; identifying the compressed characteristic fields based on the preset database cluster and judging, according to the recognition result, whether the characteristic field is a repeated field; and, if the characteristic field is a repeated field, storing it in the preset exception handling queue, otherwise outputting a prompt message used to indicate that the characteristic field is a normal field.
  • when the processor 502 executes the step of acquiring the data access request and extracting the characteristic fields in the data access request,
  • the following operations are specifically executed: acquiring the data access request and parsing the data access request; and obtaining the characteristic fields in the data access request according to the parsing result.
  • when the processor 502 performs the steps of splicing the characteristic fields to generate a characteristic field combination and compressing the combination using a hash algorithm, the following operations are specifically performed: splice the characteristic fields using the append method in the StringBuilder class of the C# language; use the hash algorithm to perform a modulo operation on the spliced characteristic fields; and obtain the operation result and locate and store the operation result to complete the compression processing.
  • the processor 502 also implements the following operations when executing the computer program: judging whether the compressed characteristic fields are fields of the same type; and, if the compressed characteristic fields are of the same type, grouping the compressed characteristic fields.
  • the processor 502 also implements the following operations when executing the computer program: presetting the data update duration of the preset database cluster; and deleting the characteristic field if the duration for which it has been stored in the preset database cluster exceeds the preset data update duration.
  • when the processor 502 executes the step of cleaning the characteristic fields, the following operations are specifically performed: capturing the null-value fields in the characteristic fields; and loading or replacing the null-value fields with specific data.
  • when the processor 502 executes the step of identifying the compressed characteristic fields based on the preset database cluster and judging, according to the recognition result, whether the characteristic field is a repeated field, the following operations are specifically performed: checking the compressed characteristic fields against all the data stored in the preset database cluster according to the setnx command, obtaining the return value returned by the setnx command, and judging, according to that return value, whether the characteristic field is a repeated field.
  • the embodiment of the computer device shown in FIG. 14 does not constitute a limitation on the specific configuration of the computer device.
  • the computer device may include more or fewer components than those shown in the figure, or combine certain components, or have a different component arrangement.
  • in some embodiments, the computer device includes only a memory and a processor; in such embodiments, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 14 and will not be repeated here.
  • the computer-readable storage medium stores one or more computer programs, and the one or more computer programs can be executed by one or more processors to implement any embodiment of the data deduplication method in this application.
  • the aforementioned storage media in this application include: magnetic disks, optical discs, read-only memory (Read-Only Memory, ROM) and other media that can store program codes.
  • the units in all embodiments of the present application may be implemented by general integrated circuits, such as CPU (Central Processing Unit, central processing unit), or by ASIC (Application Specific Integrated Circuit, application specific integrated circuit).
  • the steps in the data deduplication method of the embodiment of the present application can be adjusted, merged, and deleted in order according to actual needs.
  • the units in the data deduplication device of the embodiment of the present application may be combined, divided, and deleted according to actual needs.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of this application disclose a data deduplication method and apparatus, a computer device, and a storage medium. The method includes: obtaining a data access request and extracting characteristic fields from the data access request; cleaning the characteristic fields and standardizing the cleaned characteristic fields; splicing the characteristic fields to generate a characteristic field combination and compressing the characteristic field combination using a hash algorithm; identifying the compressed characteristic fields based on a preset database cluster and judging, according to the identification result, whether the characteristic field is a repeated field; and, if the characteristic field is a repeated field, storing the characteristic field in a preset exception processing queue, otherwise outputting a prompt message, the prompt message being used to indicate that the characteristic field is a normal field.

Description

Data deduplication method and apparatus, computer device, and storage medium
This application claims priority to the Chinese patent application with application number CN 201910461945.6, entitled "Data deduplication method and apparatus, computer device, and storage medium", filed with the Chinese Patent Office on May 30, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer technology, and in particular to a data deduplication method and apparatus, a computer device, and a storage medium.
Background
At present, when interacting with the EMS (Express Mail Service) system, a large amount of duplicate express logistics status information is often obtained from the EMS system. Since this large amount of express logistics status information needs to be stored, it occupies a large amount of storage space in the back-end database and puts the database under heavy storage pressure; in severe cases, the back-end database may even stop working properly. In addition, because much of the express logistics status information in the back-end database is duplicated, sending duplicate logistics SMS messages or other logistics message pushes to the corresponding users without filtering or deduplication easily triggers user complaints and seriously harms the company's reputation.
Summary
In view of this, the embodiments of this application provide a data deduplication method and apparatus, a computer device, and a storage medium, which can reduce the consumption of database resources by large amounts of duplicate data and save database memory space, and can further lower the user complaint rate and improve the company's reputation.
In one aspect, an embodiment of this application provides a data deduplication method, the method comprising:
obtaining a data access request, and extracting characteristic fields from the data access request;
cleaning the characteristic fields, and standardizing the cleaned characteristic fields;
splicing the characteristic fields to generate a characteristic field combination, and compressing the characteristic field combination using a hash algorithm;
identifying the compressed characteristic fields based on a preset database cluster, and judging, according to the identification result, whether the characteristic field is a repeated field;
if the characteristic field is a repeated field, storing the characteristic field in a preset exception processing queue, and otherwise outputting a prompt message, the prompt message being used to indicate that the characteristic field is a normal field.
In another aspect, an embodiment of this application provides a data deduplication apparatus, the apparatus comprising:
an acquisition unit, configured to obtain a data access request and extract the characteristic fields from the data access request;
a processing unit, configured to clean the characteristic fields and standardize the cleaned characteristic fields;
a splicing processing unit, configured to splice the characteristic fields to generate a characteristic field combination and compress the characteristic field combination using a hash algorithm;
an identification judgment unit, configured to identify the compressed characteristic fields based on a preset database cluster and judge, according to the identification result, whether the characteristic field is a repeated field;
a storage output unit, configured to store the characteristic field in a preset exception processing queue if the characteristic field is a repeated field, and otherwise output a prompt message, the prompt message being used to indicate that the characteristic field is a normal field.
In yet another aspect, an embodiment of this application further provides a computer device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor implements the data deduplication method described above when executing the computer program.
In still another aspect, an embodiment of this application further provides a computer-readable storage medium, the computer-readable storage medium storing one or more computer programs, the one or more computer programs being executable by one or more processors to implement the data deduplication method described above.
The data deduplication method described in the embodiments of this application can reduce the consumption of database resources by large amounts of duplicate data and save database memory space, and can further lower the user complaint rate and improve the company's reputation.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show some embodiments of this application, and a person of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic diagram of an application scenario of a data deduplication method provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of a data deduplication method provided by an embodiment of this application;
FIG. 3 is another schematic flowchart of a data deduplication method provided by an embodiment of this application;
FIG. 4 is another schematic flowchart of a data deduplication method provided by an embodiment of this application;
FIG. 5 is another schematic flowchart of a data deduplication method provided by an embodiment of this application;
FIG. 6 is another schematic flowchart of a data deduplication method provided by an embodiment of this application;
FIG. 7 is a schematic block diagram of a data deduplication apparatus provided by an embodiment of this application;
FIG. 8 is another schematic block diagram of a data deduplication apparatus provided by an embodiment of this application;
FIG. 9 is another schematic block diagram of a data deduplication apparatus provided by an embodiment of this application;
FIG. 10 is another schematic block diagram of a data deduplication apparatus provided by an embodiment of this application;
FIG. 11 is another schematic block diagram of a data deduplication apparatus provided by an embodiment of this application;
FIG. 12 is another schematic block diagram of a data deduplication apparatus provided by an embodiment of this application;
FIG. 13 is another schematic block diagram of a data deduplication apparatus provided by an embodiment of this application;
FIG. 14 is a schematic diagram of the structural composition of a computer device provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
It should be understood that when used in this specification and the appended claims, the terms "include" and "comprise" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or combinations thereof.
It should also be understood that the terminology used in this specification of the application is for the purpose of describing particular embodiments only and is not intended to limit this application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should further be understood that the term "and/or" used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to FIG. 1, FIG. 1 is a schematic diagram of an application scenario of a data deduplication method provided by an embodiment of this application. The application scenario includes:
(1) A server, used to provide back-end services for data transmission. The server is a computer device, which may be a single server, a server cluster, a cloud server, or a dedicated web server; it receives access from external terminals and connects to the terminals through a wired or wireless network.
(2) Terminals. The terminals shown in FIG. 1 include terminal 1, terminal 2, and terminal 3. A terminal accesses the server, obtains target data from the server, and inserts the obtained target data into a local data table on the terminal. The terminal may be an electronic device such as a smartphone, smart watch, notebook computer, tablet computer, or desktop computer, and accesses the server through a wired or wireless network.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a data deduplication method provided by an embodiment of this application. As shown in FIG. 2, the method includes the following steps S101 to S105.
S101: Obtain a data access request, and extract characteristic fields from the data access request.
In this embodiment of the application, the data access request refers to a logistics data request sent by the EMS system to the server. The data access request is sent to the server in the form of a request message, the server receives the data access request sent by the EMS system, and the request message is an HTTP message. The characteristic fields refer to the field data content in the data access request; for example, for a logistics data request, the characteristic fields may include field data content such as the zip code, logistics order number, and logistics time.
In an embodiment, as shown in FIG. 3, step S101 includes steps S201 to S202:
S201: Obtain the data access request, and parse the data access request.
In this embodiment of the application, the data access request sent by the EMS system is in fact a request message, and parsing the data access request means parsing the request message. The parsing process is as follows: first determine the constituent parts of the request message (the request line, request headers, and request body), where the request line contains a method and the request URL, as well as the version of the HTTP message; the request headers contain the HTTP header fields that add additional information to the request message; and the request body contains the length and field content of the message.
S202: Obtain the characteristic fields in the data access request according to the parsing result.
In this embodiment of the application, after the constituent parts of the message are determined, the field content of the request message is determined from the request body and extracted as the characteristic fields. In this embodiment, the characteristic fields may include field data content such as the zip code, logistics order number, and logistics time.
S102: Clean the characteristic fields, and standardize the cleaned characteristic fields.
In this embodiment of the application, cleaning the characteristic fields refers to capturing the null-value fields in the characteristic fields and loading or replacing the null-value fields with specific data; the null-value fields can also be used to split traffic across different databases. The cleaned characteristic fields are then standardized. The standardization process is as follows: for characteristic fields from different data sources, the same characteristic field may belong to different data types (for example, character or numeric) while actually referring to the same concept; in that case, the current characteristic field is judged regardless of the data type it carries in its data source, and a numeric type is used in place of a character type, or a character type in place of a numeric type.
S103: Splice the characteristic fields to generate a characteristic field combination, and compress the characteristic field combination using a hash algorithm.
In this embodiment of the application, the step of splicing the characteristic fields to generate a characteristic field combination and compressing the combination using a hash algorithm includes, as shown in FIG. 4, the following steps S301 to S303. S301: splice the characteristic fields using the append method in the StringBuilder class of the C# language. S302: use the hash algorithm to perform a modulo operation on the spliced characteristic fields. S303: obtain the operation result, and locate and store the operation result to complete the compression processing. Specifically, the append method in the C# StringBuilder class can be used to splice the characteristic fields in the request message. The StringBuilder class is the mutable character sequence class in the C# language, and the append method splices the characteristic fields in order. For example, for the characteristic fields "zip code, logistics order number, logistics time", the splicing call is append(zip code, logistics order number, logistics time), and the final output of the splicing is "zip code, logistics order number, logistics time". The hash algorithm is a "linked-list hash" data structure: through the hash algorithm, a modulo operation is applied to the characteristic field data and the result is placed at a position in an array, i.e., the characteristic field data is located and stored. This compression does not compress and restore the original data; rather, the metadata is hashed into a unique hashcode, which is later used to judge whether the characteristic field data is duplicated. The hashcode is the object address obtained through the hash algorithm, or an int value computed from a string or number, which can be obtained in Java by calling Object.hashcode().
S104: Identify the compressed characteristic fields based on a preset database cluster, and judge, according to the identification result, whether the characteristic field is a repeated field.
In this embodiment of the application, the preset database cluster is a distributed Redis database cluster. Before the compressed characteristic fields are identified, the database cluster must be set up and initialized in advance; initialization means storing characteristic field data in the database cluster beforehand. The preset database cluster in this embodiment is a distributed Redis database cluster, which is an existing in-memory database and also a single-threaded, high-performance memory database.
When identifying the compressed characteristic fields against the preset database cluster, the setnx command is called: according to the setnx command, the compressed characteristic field is checked against all the data stored in the preset database cluster, the setnx command returns a return value, and the return value is used to judge whether the characteristic field is a repeated field. A so-called repeated field means the preset database cluster already contains characteristic field data identical to the compressed characteristic field; if it does, the compressed characteristic field is a repeated field, and if it does not, the compressed characteristic field is not a repeated field.
S105: If the characteristic field is a repeated field, store the characteristic field in a preset exception processing queue; otherwise, output a prompt message, the prompt message being used to indicate that the characteristic field is a normal field.
In this embodiment of the application, if the return value of the setnx command is 1, the compressed characteristic field is not a repeated field, a prompt message indicating that the characteristic field is a normal field is output (the prompt message can be pushed by SMS or message push), and the characteristic field is finally stored in the distributed Redis database cluster. If the return value of the setnx command is 0, the compressed characteristic field is a repeated field and is stored in the preset exception processing queue, where the exception processing queue is a processing window set up by the distributed Redis database cluster specifically for abnormal data; repeated characteristic fields stored in this processing window are processed, and the processing operation may be a delete operation, for example, storing the repeated characteristic fields directly in the processing window for deletion.
As can be seen from the above, the embodiment of this application obtains a data access request and extracts the characteristic fields in it; cleans the characteristic fields and standardizes the cleaned characteristic fields; splices the characteristic fields to generate a characteristic field combination and compresses the combination with a hash algorithm; identifies the compressed characteristic fields based on a preset database cluster and judges, according to the identification result, whether the characteristic field is a repeated field; and, if the characteristic field is a repeated field, stores it in a preset exception processing queue, otherwise outputting a prompt message used to indicate that the characteristic field is a normal field. This application provides a data deduplication method based on data processing, which can reduce the consumption of database resources by large amounts of duplicate data and save database memory space, and can further lower the user complaint rate and improve the company's reputation.
Referring to FIG. 5, FIG. 5 is a schematic flowchart of a data deduplication method provided by another embodiment of this application. As shown in FIG. 5, the method includes the following steps S401 to S407.
S401: Obtain a data access request, and extract characteristic fields from the data access request.
In this embodiment of the application, the data access request refers to a logistics data request sent by the EMS system to the server. The data access request is sent to the server in the form of a request message, the server receives the data access request sent by the EMS system, and the request message is an HTTP message. The characteristic fields refer to the field data content in the data access request; for example, for a logistics data request, the characteristic fields may include field data content such as the zip code, logistics order number, and logistics time.
S402: Clean the characteristic fields, and standardize the cleaned characteristic fields.
In this embodiment of the application, cleaning the characteristic fields refers to capturing the null-value fields in the characteristic fields and loading or replacing the null-value fields with specific data; the null-value fields can also be used to split traffic across different databases. The cleaned characteristic fields are then standardized. The standardization process is as follows: for characteristic fields from different data sources, the same characteristic field may belong to different data types (for example, character or numeric) while actually referring to the same concept; in that case, the current characteristic field is judged regardless of the data type it carries in its data source, and a numeric type is used in place of a character type, or a character type in place of a numeric type.
S403: Splice the characteristic fields to generate a characteristic field combination, and compress the characteristic field combination using a hash algorithm.
In this embodiment of the application, the append method in the StringBuilder class of the C# language can be used to splice the characteristic fields in the request message. The StringBuilder class is the mutable character sequence class in the C# language, and the append method splices the characteristic fields in order. For example, for the characteristic fields "zip code, logistics order number, logistics time", the splicing call is append(zip code, logistics order number, logistics time), and the final output of the splicing is "zip code, logistics order number, logistics time". The hash algorithm is a "linked-list hash" data structure: through the hash algorithm, a modulo operation is applied to the characteristic field data and the result is placed at a position in an array, i.e., the characteristic field data is located and stored. This compression does not compress and restore the original data; rather, the metadata is hashed into a unique hashcode, which is later used to judge whether the characteristic field data is duplicated. The hashcode is the object address obtained through the hash algorithm, or an int value computed from a string or number, which can be obtained in Java by calling Object.hashcode().
S404: Judge whether the compressed characteristic fields are fields of the same type.
S405: If the compressed characteristic fields are fields of the same type, group the compressed characteristic fields.
In this embodiment of the application, so-called fields of the same type are fields whose configured types belong to the same type; for example, field A and field B are both integers, and field C and field D are both floating point. If there are multiple compressed characteristic fields and they share the same sub-characteristic field, the multiple compressed characteristic fields are judged to be of the same type and are grouped. For instance, in the practical example above, if the compressed characteristic fields all contain the same sub-characteristic field "zip code", then these compressed characteristic fields are of the same type and are placed in the same group; the grouping method is to store the corresponding same-type fields in the same list collection.
S406: Identify the compressed characteristic fields based on the preset database cluster, and judge, according to the identification result, whether the characteristic field is a repeated field.
In this embodiment of the application, the preset database cluster is a distributed Redis database cluster. Before the compressed characteristic fields are identified, the database cluster must be set up and initialized in advance; initialization means storing characteristic field data in the database cluster beforehand. The preset database cluster in this embodiment is a distributed Redis database cluster, which is an existing in-memory database and also a single-threaded, high-performance memory database.
When identifying the compressed characteristic fields against the preset database cluster, the setnx command is called: according to the setnx command, the compressed characteristic field is checked against all the data stored in the preset database cluster, the setnx command returns a return value, and the return value is used to judge whether the characteristic field is a repeated field. A so-called repeated field means the preset database cluster already contains characteristic field data identical to the compressed characteristic field; if it does, the compressed characteristic field is a repeated field, and if it does not, the compressed characteristic field is not a repeated field.
S407: If the characteristic field is a repeated field, store the characteristic field in a preset exception processing queue; otherwise, output a prompt message, the prompt message being used to indicate that the characteristic field is a normal field.
In this embodiment of the application, if the return value of the setnx command is 1, the compressed characteristic field is not a repeated field, a prompt message indicating that the characteristic field is a normal field is output (the prompt message can be pushed by SMS or message push), and the characteristic field is finally stored in the distributed Redis database cluster. If the return value of the setnx command is 0, the compressed characteristic field is a repeated field and is stored in the preset exception processing queue, where the exception processing queue is a processing window set up by the distributed Redis database cluster specifically for abnormal data; repeated characteristic fields are stored in this processing window for processing, and the processing operation may include a delete operation.
Referring to FIG. 6, FIG. 6 is a schematic flowchart of a data deduplication method provided by another embodiment of this application. As shown in FIG. 6, the method includes the following steps S501 to S507.
S501: Obtain a data access request, and extract characteristic fields from the data access request.
In this embodiment of the application, the data access request refers to a logistics data request sent by the EMS system to the server. The data access request is sent to the server in the form of a request message, the server receives the data access request sent by the EMS system, and the request message is an HTTP message. The characteristic fields refer to the field data content in the data access request; for example, for a logistics data request, the characteristic fields may include field data content such as the zip code, logistics order number, and logistics time.
S502: Clean the characteristic fields, and standardize the cleaned characteristic fields.
In this embodiment of the application, cleaning the characteristic fields refers to capturing the null-value fields in the characteristic fields and loading or replacing the null-value fields with specific data; the null-value fields can also be used to split traffic across different databases. The cleaned characteristic fields are then standardized. The standardization process is as follows: for characteristic fields from different data sources, the same characteristic field may belong to different data types (for example, character or numeric) while actually referring to the same concept; in that case, the current characteristic field is judged regardless of the data type it carries in its data source, and a numeric type is used in place of a character type, or a character type in place of a numeric type.
S503: Splice the characteristic fields to generate a characteristic field combination, and compress the characteristic field combination using a hash algorithm.
In this embodiment of the application, the append method in the StringBuilder class of the C# language can be used to splice the characteristic fields in the request message. The StringBuilder class is the mutable character sequence class in the C# language, and the append method splices the characteristic fields in order. For example, for the characteristic fields "zip code, logistics order number, logistics time", the splicing call is append(zip code, logistics order number, logistics time), and the final output of the splicing is "zip code, logistics order number, logistics time". The hash algorithm is a "linked-list hash" data structure: through the hash algorithm, a modulo operation is applied to the characteristic field data and the result is placed at a position in an array, i.e., the characteristic field data is located and stored. This compression does not compress and restore the original data; rather, the metadata is hashed into a unique hashcode, which is later used to judge whether the characteristic field data is duplicated. The hashcode is the object address obtained through the hash algorithm, or an int value computed from a string or number, which can be obtained in Java by calling Object.hashcode().
S504: Identify the compressed characteristic fields based on the preset database cluster, and judge, according to the identification result, whether the characteristic field is a repeated field.
In this embodiment of the application, the preset database cluster is a distributed Redis database cluster. Before the compressed characteristic fields are identified, the database cluster must be set up and initialized in advance; initialization means storing characteristic field data in the database cluster beforehand. The preset database cluster in this embodiment is a distributed Redis database cluster, which is an existing in-memory database and also a single-threaded, high-performance memory database.
When identifying the compressed characteristic fields against the preset database cluster, the setnx command is called: according to the setnx command, the compressed characteristic field is checked against all the data stored in the preset database cluster, the setnx command returns a return value, and the return value is used to judge whether the characteristic field is a repeated field. A so-called repeated field means the preset database cluster already contains characteristic field data identical to the compressed characteristic field; if it does, the compressed characteristic field is a repeated field, and if it does not, the compressed characteristic field is not a repeated field.
S505: If the characteristic field is a repeated field, store the characteristic field in a preset exception processing queue; otherwise, output a prompt message, the prompt message being used to indicate that the characteristic field is a normal field.
In this embodiment of the application, if the return value of the setnx command is 1, the compressed characteristic field is not a repeated field, a prompt message indicating that the characteristic field is a normal field is output (the prompt message can be pushed by SMS or message push), and the characteristic field is finally stored in the distributed Redis database cluster. If the return value of the setnx command is 0, the compressed characteristic field is a repeated field and is stored in the preset exception processing queue, where the exception processing queue is a processing window set up by the distributed Redis database cluster specifically for abnormal data; repeated characteristic fields are stored in this processing window for processing, and the processing operation may include a delete operation.
S506: Preset the data update duration of the preset database cluster.
In this embodiment of the application, the data update duration may be 1 minute, 2 minutes, or another preset duration; the specific value of the preset duration is not limited here and can be set according to actual needs.
S507: If the duration for which the characteristic field has been stored in the preset database cluster exceeds the preset data update duration, delete the characteristic field.
In this embodiment of the application, if the duration for which the characteristic field has been stored in the preset database cluster exceeds the preset data update duration, a deletion instruction in the preset database cluster is triggered and executed to delete the characteristic field. By setting the data update duration for the characteristic fields stored in the preset database cluster, the field data in the cluster can be updated regularly and kept up to date on schedule.
Referring to FIG. 7, corresponding to the above data deduplication method, an embodiment of this application further provides a data deduplication apparatus 100, which includes: an acquisition unit 101, a processing unit 102, a splicing processing unit 103, an identification judgment unit 104, and a storage output unit 105.
The acquisition unit 101 is configured to obtain a data access request and extract the characteristic fields from the data access request; the processing unit 102 is configured to clean the characteristic fields and standardize the cleaned characteristic fields; the splicing processing unit 103 is configured to splice the characteristic fields to generate a characteristic field combination and compress the combination using a hash algorithm; the identification judgment unit 104 is configured to identify the compressed characteristic fields based on a preset database cluster and judge, according to the identification result, whether the characteristic field is a repeated field; and the storage output unit 105 is configured to store the characteristic field in a preset exception processing queue if the characteristic field is a repeated field, and otherwise output a prompt message, the prompt message being used to indicate that the characteristic field is a normal field.
As can be seen from the above, the embodiment of this application obtains a data access request and extracts the characteristic fields in it; cleans the characteristic fields and standardizes the cleaned characteristic fields; splices the characteristic fields to generate a characteristic field combination and compresses the combination with a hash algorithm; identifies the compressed characteristic fields based on a preset database cluster and judges, according to the identification result, whether the characteristic field is a repeated field; and, if the characteristic field is a repeated field, stores it in a preset exception processing queue, otherwise outputting a prompt message used to indicate that the characteristic field is a normal field. This application provides a data deduplication method based on data processing, which can reduce the consumption of database resources by large amounts of duplicate data and save database memory space, and can further lower the user complaint rate and improve the company's reputation.
Referring to FIG. 8, the acquisition unit 101 includes: a parsing unit 101a, configured to obtain a data access request and parse the data access request; and an acquisition subunit 101b, configured to obtain the characteristic fields in the data access request according to the parsing result.
Referring to FIG. 9, the splicing processing unit 103 includes: a splicing unit 103a, configured to splice the characteristic fields using the append method in the StringBuilder class of the C# language; an arithmetic unit 103b, configured to use the hash algorithm to perform a modulo operation on the spliced characteristic fields; and a storage unit 103c, configured to obtain the operation result and locate and store the operation result to complete the compression processing.
Referring to FIG. 10, the processing unit 102 includes: a capturing unit 102a, configured to capture the null-value fields in the characteristic fields; and a processing subunit 102b, configured to load or replace the null-value fields with specific data.
Referring to FIG. 11, the identification judgment unit 104 includes: an identification judgment subunit 104a, configured to check the compressed characteristic fields against all the data stored in the preset database cluster according to the setnx command, obtain the return value returned by the setnx command, and judge, according to that return value, whether the characteristic field is a repeated field.
Referring to FIG. 12, corresponding to the above data deduplication method, an embodiment of this application further provides a data deduplication apparatus 200, which includes: an acquisition unit 201, a processing unit 202, a splicing processing unit 203, a judgment unit 204, a grouping unit 205, an identification judgment unit 206, and a storage output unit 207.
The acquisition unit 201 is configured to obtain a data access request and extract the characteristic fields from the data access request; the processing unit 202 is configured to clean the characteristic fields and standardize the cleaned characteristic fields; the splicing processing unit 203 is configured to splice the characteristic fields to generate a characteristic field combination and compress the combination using a hash algorithm; the judgment unit 204 is configured to judge whether the compressed characteristic fields are fields of the same type; the grouping unit 205 is configured to group the compressed characteristic fields if they are of the same type; the identification judgment unit 206 is configured to identify the compressed characteristic fields based on a preset database cluster and judge, according to the identification result, whether the characteristic field is a repeated field; and the storage output unit 207 is configured to store the characteristic field in a preset exception processing queue if the characteristic field is a repeated field, and otherwise output a prompt message, the prompt message being used to indicate that the characteristic field is a normal field.
Referring to FIG. 13, corresponding to the above data deduplication method, an embodiment of this application further provides a data deduplication apparatus 300, which includes: an acquisition unit 301, a processing unit 302, a splicing processing unit 303, an identification judgment unit 304, a storage output unit 305, a presetting unit 306, and a deletion unit 307.
The acquisition unit 301 is configured to obtain a data access request and extract the characteristic fields from the data access request; the processing unit 302 is configured to clean the characteristic fields and standardize the cleaned characteristic fields; the splicing processing unit 303 is configured to splice the characteristic fields to generate a characteristic field combination and compress the combination using a hash algorithm; the identification judgment unit 304 is configured to identify the compressed characteristic fields based on a preset database cluster and judge, according to the identification result, whether the characteristic field is a repeated field; the storage output unit 305 is configured to store the characteristic field in a preset exception processing queue if the characteristic field is a repeated field, and otherwise output a prompt message, the prompt message being used to indicate that the characteristic field is a normal field; the presetting unit 306 is configured to preset the data update duration of the preset database cluster; and the deletion unit 307 is configured to delete the characteristic field if the duration for which it has been stored in the preset database cluster exceeds the preset data update duration.
The above data deduplication apparatus corresponds one-to-one with the above data deduplication method; its specific principles and processes are the same as those of the method described in the above embodiments and are not repeated here.
The above data deduplication apparatus can be implemented in the form of a computer program, and the computer program can run on a computer device as shown in FIG. 14.
FIG. 14 is a schematic diagram of the structural composition of a computer device of this application. The device can be a terminal or a server, where the terminal can be an electronic device with communication and voice input functions, such as a smartphone, tablet computer, notebook computer, desktop computer, personal digital assistant, or wearable device, and the server can be an independent server or a server cluster composed of multiple servers. Referring to FIG. 14, the computer device 500 includes a processor 502, a non-volatile storage medium 503, an internal memory 504, and a network interface 505 connected through a system bus 501. The non-volatile storage medium 503 of the computer device 500 can store an operating system 5031 and a computer program 5032; when the computer program 5032 is executed, it can cause the processor 502 to perform a data deduplication method. The processor 502 of the computer device 500 is used to provide computing and control capabilities and supports the operation of the entire computer device 500. The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when the computer program is executed by the processor, it can cause the processor 502 to perform a data deduplication method. The network interface 505 of the computer device 500 is used for network communication. A person skilled in the art can understand that the structure shown in FIG. 14 is only a block diagram of part of the structure related to the solution of this application and does not constitute a limitation on the computer device to which the solution of this application is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.
When the processor 502 executes the computer program, the following operations are implemented: obtaining a data access request, and extracting the characteristic fields from the data access request; cleaning the characteristic fields, and standardizing the cleaned characteristic fields; splicing the characteristic fields to generate a characteristic field combination, and compressing the combination using a hash algorithm; identifying the compressed characteristic fields based on the preset database cluster, and judging, according to the identification result, whether the characteristic field is a repeated field; and, if the characteristic field is a repeated field, storing the characteristic field in the preset exception processing queue, otherwise outputting a prompt message, the prompt message being used to indicate that the characteristic field is a normal field.
In one embodiment, when the processor 502 executes the step of obtaining the data access request and extracting the characteristic fields from the data access request, the following operations are specifically performed: obtaining the data access request, and parsing the data access request; and obtaining the characteristic fields in the data access request according to the parsing result.
In one embodiment, when the processor 502 executes the step of splicing the characteristic fields to generate a characteristic field combination and compressing the combination using a hash algorithm, the following operations are specifically performed: splicing the characteristic fields using the append method in the StringBuilder class of the C# language; using the hash algorithm to perform a modulo operation on the spliced characteristic fields; and obtaining the operation result and locating and storing the operation result to complete the compression processing.
In one embodiment, the processor 502 also implements the following operations when executing the computer program: judging whether the compressed characteristic fields are fields of the same type; and, if the compressed characteristic fields are of the same type, grouping the compressed characteristic fields.
In one embodiment, the processor 502 also implements the following operations when executing the computer program: presetting the data update duration of the preset database cluster; and deleting the characteristic field if the duration for which it has been stored in the preset database cluster exceeds the preset data update duration.
In one embodiment, when the processor 502 executes the step of cleaning the characteristic fields, the following operations are specifically performed: capturing the null-value fields in the characteristic fields; and loading or replacing the null-value fields with specific data.
In one embodiment, when the processor 502 executes the step of identifying the compressed characteristic fields based on the preset database cluster and judging, according to the identification result, whether the characteristic field is a repeated field, the following operations are specifically performed: checking the compressed characteristic fields against all the data stored in the preset database cluster according to the setnx command, obtaining the return value returned by the setnx command, and judging, according to that return value, whether the characteristic field is a repeated field.
A person skilled in the art can understand that the embodiment of the computer device shown in FIG. 14 does not constitute a limitation on the specific configuration of the computer device; in other embodiments, the computer device may include more or fewer components than shown, combine certain components, or have a different component arrangement. For example, in some embodiments, the computer device includes only a memory and a processor; in such embodiments, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 14 and are not repeated here.
This application provides a computer-readable storage medium that stores one or more computer programs, and the one or more computer programs can be executed by one or more processors to implement any embodiment of the data deduplication method in this application.
The aforementioned storage media in this application include magnetic disks, optical discs, read-only memory (ROM), and other media that can store program code.
The units in all embodiments of this application may be implemented by a general-purpose integrated circuit, such as a CPU (Central Processing Unit), or by an ASIC (Application Specific Integrated Circuit). The steps of the data deduplication method in the embodiments of this application can be reordered, merged, and deleted according to actual needs, and the units of the data deduplication apparatus in the embodiments of this application can be combined, divided, and deleted according to actual needs.
The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any person familiar with this technical field can easily think of various equivalent modifications or replacements within the technical scope disclosed in this application, and these modifications or replacements shall all fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (20)

  1. A data deduplication method, the method comprising:
    obtaining a data access request, and extracting characteristic fields from the data access request;
    cleaning the characteristic fields, and standardizing the cleaned characteristic fields;
    splicing the characteristic fields to generate a characteristic field combination, and compressing the characteristic field combination using a hash algorithm;
    identifying the compressed characteristic fields based on a preset database cluster, and judging, according to the identification result, whether the characteristic field is a repeated field;
    if the characteristic field is a repeated field, storing the characteristic field in a preset exception processing queue, and otherwise outputting a prompt message, the prompt message being used to indicate that the characteristic field is a normal field.
  2. The method according to claim 1, wherein the obtaining a data access request and extracting characteristic fields from the data access request comprises:
    obtaining the data access request, and parsing the data access request;
    obtaining the characteristic fields in the data access request according to the parsing result.
  3. The method according to claim 1, wherein the splicing the characteristic fields to generate a characteristic field combination and compressing the characteristic field combination using a hash algorithm comprises:
    splicing the characteristic fields using the append method in the StringBuilder class of the C# language;
    using the hash algorithm to perform a modulo operation on the spliced characteristic fields;
    obtaining the operation result, and locating and storing the operation result to complete the compression processing.
  4. The method according to claim 1, wherein before the step of identifying the compressed characteristic fields based on a preset database cluster and judging, according to the identification result, whether the characteristic field is a repeated field, the method further comprises:
    judging whether the compressed characteristic fields are fields of the same type;
    if the compressed characteristic fields are fields of the same type, grouping the compressed characteristic fields.
  5. The method according to claim 1, wherein the method further comprises:
    presetting a data update duration of the preset database cluster;
    if the duration for which the characteristic field has been stored in the preset database cluster exceeds the preset data update duration, deleting the characteristic field.
  6. The method according to claim 1, wherein the cleaning the characteristic fields comprises:
    capturing null-value fields in the characteristic fields;
    loading or replacing the null-value fields with specific data.
  7. The method according to claim 1, wherein the identifying the compressed characteristic fields based on a preset database cluster and judging, according to the identification result, whether the characteristic field is a repeated field comprises:
    checking the compressed characteristic fields against all the data stored in the preset database cluster according to the setnx command, obtaining a return value returned by the setnx command, and judging, according to the return value returned by the setnx command, whether the characteristic field is a repeated field.
  8. A data deduplication apparatus, wherein the apparatus comprises:
    an acquisition unit, configured to obtain a data access request and extract characteristic fields from the data access request;
    a processing unit, configured to clean the characteristic fields and standardize the cleaned characteristic fields;
    a splicing processing unit, configured to splice the characteristic fields to generate a characteristic field combination and compress the characteristic field combination using a hash algorithm;
    an identification judgment unit, configured to identify the compressed characteristic fields based on a preset database cluster and judge, according to the identification result, whether the characteristic field is a repeated field;
    a storage output unit, configured to store the characteristic field in a preset exception processing queue if the characteristic field is a repeated field, and otherwise output a prompt message, the prompt message being used to indicate that the characteristic field is a normal field.
  9. The apparatus according to claim 8, wherein the acquisition unit comprises:
    a parsing unit, configured to obtain a data access request and parse the data access request;
    an acquisition subunit, configured to obtain the characteristic fields in the data access request according to the parsing result.
  10. The apparatus according to claim 8, wherein the splicing processing unit comprises:
    a splicing unit, configured to splice the characteristic fields using the append method in the StringBuilder class of the C# language;
    an arithmetic unit, configured to use the hash algorithm to perform a modulo operation on the spliced characteristic fields;
    a storage unit, configured to obtain the operation result and locate and store the operation result to complete the compression processing.
  11. A computer device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor implements the following operations when executing the computer program:
    obtaining a data access request, and extracting characteristic fields from the data access request;
    cleaning the characteristic fields, and standardizing the cleaned characteristic fields;
    splicing the characteristic fields to generate a characteristic field combination, and compressing the characteristic field combination using a hash algorithm;
    identifying the compressed characteristic fields based on a preset database cluster, and judging, according to the identification result, whether the characteristic field is a repeated field;
    if the characteristic field is a repeated field, storing the characteristic field in a preset exception processing queue, and otherwise outputting a prompt message, the prompt message being used to indicate that the characteristic field is a normal field.
  12. The computer device according to claim 11, wherein the obtaining a data access request and extracting characteristic fields from the data access request comprises:
    obtaining the data access request, and parsing the data access request;
    obtaining the characteristic fields in the data access request according to the parsing result.
  13. The computer device according to claim 11, wherein the splicing the characteristic fields to generate a characteristic field combination and compressing the characteristic field combination using a hash algorithm comprises:
    splicing the characteristic fields using the append method in the StringBuilder class of the C# language;
    using the hash algorithm to perform a modulo operation on the spliced characteristic fields;
    obtaining the operation result, and locating and storing the operation result to complete the compression processing.
  14. The computer device according to claim 11, wherein the processor further implements the following operations when executing the computer program:
    judging whether the compressed characteristic fields are fields of the same type;
    if the compressed characteristic fields are fields of the same type, grouping the compressed characteristic fields.
  15. The computer device according to claim 11, wherein the processor further implements the following operations when executing the computer program:
    presetting a data update duration of the preset database cluster;
    if the duration for which the characteristic field has been stored in the preset database cluster exceeds the preset data update duration, deleting the characteristic field.
  16. The computer device according to claim 11, wherein the cleaning the characteristic fields comprises:
    capturing null-value fields in the characteristic fields;
    loading or replacing the null-value fields with specific data.
  17. The computer device according to claim 11, wherein the identifying the compressed characteristic fields based on a preset database cluster and judging, according to the identification result, whether the characteristic field is a repeated field comprises:
    checking the compressed characteristic fields against all the data stored in the preset database cluster according to the setnx command, obtaining a return value returned by the setnx command, and judging, according to the return value returned by the setnx command, whether the characteristic field is a repeated field.
  18. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more computer programs, and the one or more computer programs can be executed by one or more processors to implement the following steps:
    obtaining a data access request, and extracting characteristic fields from the data access request;
    cleaning the characteristic fields, and standardizing the cleaned characteristic fields;
    splicing the characteristic fields to generate a characteristic field combination, and compressing the characteristic field combination using a hash algorithm;
    identifying the compressed characteristic fields based on a preset database cluster, and judging, according to the identification result, whether the characteristic field is a repeated field;
    if the characteristic field is a repeated field, storing the characteristic field in a preset exception processing queue, and otherwise outputting a prompt message, the prompt message being used to indicate that the characteristic field is a normal field.
  19. The computer-readable storage medium according to claim 18, wherein the obtaining a data access request and extracting characteristic fields from the data access request comprises:
    obtaining the data access request, and parsing the data access request;
    obtaining the characteristic fields in the data access request according to the parsing result.
  20. The computer-readable storage medium according to claim 18, wherein the splicing the characteristic fields to generate a characteristic field combination and compressing the characteristic field combination using a hash algorithm comprises:
    splicing the characteristic fields using the append method in the StringBuilder class of the C# language;
    using the hash algorithm to perform a modulo operation on the spliced characteristic fields;
    obtaining the operation result, and locating and storing the operation result to complete the compression processing.
PCT/CN2019/103388 2019-05-30 2019-08-29 数据去重方法、装置、计算机设备以及存储介质 WO2020237878A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910461945.6 2019-05-30
CN201910461945.6A CN110334086A (zh) 2019-05-30 2019-05-30 数据去重方法、装置、计算机设备以及存储介质

Publications (1)

Publication Number Publication Date
WO2020237878A1 true WO2020237878A1 (zh) 2020-12-03

Family

ID=68140493

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103388 WO2020237878A1 (zh) 2019-05-30 2019-08-29 数据去重方法、装置、计算机设备以及存储介质

Country Status (2)

Country Link
CN (1) CN110334086A (zh)
WO (1) WO2020237878A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117436496A (zh) * 2023-11-22 2024-01-23 深圳市网安信科技有限公司 基于大数据日志的异常检测模型的训练方法及检测方法

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339070A (zh) * 2020-02-20 2020-06-26 上海二三四五网络科技有限公司 一种订单重复提交的控制方法及装置
CN112436943B (zh) * 2020-10-29 2022-11-08 南阳理工学院 基于大数据的请求去重方法、装置、设备及存储介质
CN112597138A (zh) * 2020-12-10 2021-04-02 浙江岩华文化科技有限公司 数据去重方法、装置、计算机设备和计算机可读存储介质
CN112906005A (zh) * 2021-02-02 2021-06-04 浙江大华技术股份有限公司 Web漏洞扫描方法、装置、系统、电子装置和存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1677216A2 (en) * 2005-01-04 2006-07-05 International Business Machines Corporation A method for reducing a data repository
CN102298633A (zh) * 2011-09-08 2011-12-28 厦门市美亚柏科信息股份有限公司 一种分布式海量数据排重方法及系统
CN107688591A (zh) * 2017-04-06 2018-02-13 平安科技(深圳)有限公司 一种精算处理方法和装置
CN108090064A (zh) * 2016-11-21 2018-05-29 腾讯科技(深圳)有限公司 一种数据查询方法、装置、数据存储服务器及系统
CN108280227A (zh) * 2018-01-26 2018-07-13 北京奇虎科技有限公司 基于缓存的数据信息处理方法及装置

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8805795B2 (en) * 2011-06-20 2014-08-12 Bank Of America Corporation Identifying duplicate messages in a database
CN102591855A (zh) * 2012-01-13 2012-07-18 广州从兴电子开发有限公司 一种数据标识方法及系统
CN104038450B (zh) * 2013-03-04 2017-09-19 华为技术有限公司 基于pcie总线的报文传输方法与装置
CN108804242B (zh) * 2018-05-23 2022-03-22 武汉斗鱼网络科技有限公司 一种数据计数去重方法、系统、服务器及存储介质
CN109542854B (zh) * 2018-11-14 2020-11-24 网易(杭州)网络有限公司 数据压缩方法、装置、介质及电子设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1677216A2 (en) * 2005-01-04 2006-07-05 International Business Machines Corporation A method for reducing a data repository
CN102298633A (zh) * 2011-09-08 2011-12-28 厦门市美亚柏科信息股份有限公司 一种分布式海量数据排重方法及系统
CN108090064A (zh) * 2016-11-21 2018-05-29 腾讯科技(深圳)有限公司 一种数据查询方法、装置、数据存储服务器及系统
CN107688591A (zh) * 2017-04-06 2018-02-13 平安科技(深圳)有限公司 一种精算处理方法和装置
CN108280227A (zh) * 2018-01-26 2018-07-13 北京奇虎科技有限公司 基于缓存的数据信息处理方法及装置

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117436496A (zh) * 2023-11-22 2024-01-23 深圳市网安信科技有限公司 基于大数据日志的异常检测模型的训练方法及检测方法

Also Published As

Publication number Publication date
CN110334086A (zh) 2019-10-15

Similar Documents

Publication Publication Date Title
WO2020237878A1 (zh) 数据去重方法、装置、计算机设备以及存储介质
WO2019227689A1 (zh) 数据监控方法、装置、计算机设备及存储介质
CN109240886B (zh) 异常处理方法、装置、计算机设备以及存储介质
US9355250B2 (en) Method and system for rapidly scanning files
CN111949710B (zh) 数据存储方法、装置、服务器及存储介质
CN110019263B (zh) 信息存储方法和装置
CN111241125A (zh) 一种记录操作日志的方法、装置、电子设备和存储介质
CN110928934A (zh) 一种用于业务分析的数据处理方法和装置
CN108415998B (zh) 应用依赖关系更新方法、终端、设备及存储介质
CN113254320A (zh) 记录用户网页操作行为的方法及装置
CN111782728A (zh) 一种数据同步方法、装置、电子设备及介质
US20140236987A1 (en) System and method for audio signal collection and processing
CN113590447B (zh) 埋点处理方法和装置
WO2019062087A1 (zh) 考勤数据测试方法、终端、设备以及计算机可读存储介质
CN114064803A (zh) 一种数据同步方法和装置
CN108037950A (zh) 一种信息删除方法、装置、电子设备及可读存储介质
CN112241332B (zh) 一种接口补偿的方法和装置
US9465876B2 (en) Managing content available for content prediction
CN109087097B (zh) 一种更新链码同一标识的方法和装置
CN114090397A (zh) 一种告警信息处理方法和装置
CN112817782A (zh) 一种数据采集上报方法、装置、电子设备和存储介质
CN113761433A (zh) 业务处理方法和装置
CN112910855A (zh) 一种样例报文处理方法及装置
CN112699116A (zh) 一种数据处理方法和系统
US20200257662A1 (en) Junk Feature Acquisition Method, Apparatus, Server and Readable Storage Medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19930777

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 19930777

Country of ref document: EP

Kind code of ref document: A1