CN113377812B - Order duplicate removal method and device for big data - Google Patents

Order duplicate removal method and device for big data

Info

Publication number
CN113377812B
Authority
CN
China
Prior art keywords
bloom filter
database
file name
cache
bill file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110027862.3A
Other languages
Chinese (zh)
Other versions
CN113377812A (en
Inventor
唐明
谭吉湘
杨陆
王晓宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Data Driven Technology Co ltd
Original Assignee
Beijing Data Driven Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Data Driven Technology Co ltd
Priority to CN202110027862.3A
Publication of CN113377812A
Application granted
Publication of CN113377812B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/164 File meta data generation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a big data order deduplication method and device, applied to a server. The method comprises: receiving order information sent by a client, wherein the order information comprises a key field, a bill file name and order data; judging, according to the key field, whether a bloom filter exists in the cache; if the bloom filter does not exist, loading the bloom filter; if the bloom filter exists, judging whether the bill file name exists in the bloom filter; if the bill file name does not exist in the bloom filter, storing the order data in a database and the cache and adding the bill file name to the bloom filter; if the bill file name exists in the bloom filter, confirming the bill file name through the cache and the database to obtain a confirmation result. The key field comprises a hardware identification and a sales time. The bloom filter reduces memory occupation and achieves efficient removal of duplicate data.

Description

Order duplicate removal method and device for big data
Technical Field
The invention relates to the technical field of deduplication, in particular to a method and a device for order deduplication of big data.
Background
In recent years, with the rapid development of the internet and the information industry, the data generated each year has grown exponentially. Owing to the complexity of internet services, repeated submissions by users, client retries, upstream service failures and the like, the same data may be uploaded more than once.
To avoid data disorder caused by repeated uploading, a cache layer is typically added, and the unique identification field of each data item is stored in the cache layer. The cache layer is queried first: if the unique identification field is found there, the data is a duplicate; if it is not found, the database is queried for the unique identification field to confirm. With large data volumes, this deduplication method causes the unique identification fields stored in the cache to occupy a large amount of memory, so the cost is relatively high.
Disclosure of Invention
Therefore, the present invention aims to provide a method and a device for order deduplication of big data, which can reduce memory occupation through a bloom filter and achieve the purpose of efficient deduplication of data.
In a first aspect, an embodiment of the present invention provides a method for de-duplication of orders of big data, applied to a server, where the method includes:
receiving order information sent by a client, wherein the order information comprises a key field, a bill file name and order data;
judging whether a bloom filter exists in the cache according to the key field;
if the bloom filter does not exist in the cache, loading the bloom filter;
if the bloom filter exists in the cache, judging whether the bill file name exists in the bloom filter;
if the bill file name does not exist in the bloom filter, storing the order data in a database and the cache, and adding the bill file name to the bloom filter;
if the bill file name exists in the bloom filter, confirming the bill file name through the cache and the database to obtain a confirmation result;
wherein the key field comprises a hardware identification and a sales time.
Further, the step of confirming the bill file name through the cache and the database to obtain a confirmation result includes:
inquiring whether the bill file name exists in the cache;
if yes, discarding the order information;
if not, inquiring whether the bill file name exists in the database;
if yes, discarding the order information;
If not, storing the order data into the database and the cache, adding the bill file name into the bloom filter, and sending response information of successful warehousing to the client.
Further, the loading the bloom filter includes:
judging whether persistence information of the bloom filter exists in the database according to the key field;
if so, acquiring, from the persistence information, the time at which the bloom filter was last updated;
taking the time at which the bloom filter was last updated as a starting time;
searching the database for all incremental orders from the starting time to the current time, and adding the bill file names corresponding to all the incremental orders to the bloom filter;
if not, searching the database for all orders of the current day corresponding to the hardware identification, and adding all the orders of the current day to the bloom filter.
Further, the current time is the time at which the bloom filter is looked up in the database.
Further, the method further comprises:
storing order information of the bloom filter in the database at a preset time interval.
In a second aspect, an embodiment of the present invention provides a big data order deduplication apparatus, applied to a server, where the apparatus includes:
The receiving unit is used for receiving order information sent by the client, wherein the order information comprises a key field, a bill file name and order data;
the first judging unit is used for judging whether a bloom filter exists in the cache according to the key field;
a loading unit for loading the bloom filter in the case where the bloom filter does not exist in the cache;
a second judging unit for judging whether the bill file name exists in the bloom filter in the case where the bloom filter exists in the cache;
a storage unit for storing the order data in a database and the cache, and adding the bill file name to the bloom filter, in the case where the bill file name does not exist in the bloom filter;
a confirming unit for confirming the bill file name through the cache and the database, in the case where the bill file name exists in the bloom filter, to obtain a confirmation result;
wherein the key fields include hardware identification and sales time.
Further, the confirmation unit is specifically configured to:
inquiring whether the bill file name exists in the cache;
if yes, discarding the order information;
if not, inquiring whether the bill file name exists in the database;
if yes, discarding the order information;
If not, storing the order data into the database and the cache, adding the bill file name into the bloom filter, and sending response information of successful warehousing to the client.
Further, the loading unit is specifically configured to:
judging whether persistence information of the bloom filter exists in the database according to the key field;
if so, acquiring, from the persistence information, the time at which the bloom filter was last updated;
taking the time at which the bloom filter was last updated as a starting time;
searching the database for all incremental orders from the starting time to the current time, and adding the bill file names corresponding to all the incremental orders to the bloom filter;
if not, searching the database for all orders of the current day corresponding to the hardware identification, and adding all the orders of the current day to the bloom filter.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, and a processor, where the memory stores a computer program executable on the processor, and where the processor implements a method as described above when executing the computer program.
In a fourth aspect, embodiments of the present invention provide a computer readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the method as described above.
The embodiments of the invention provide a big data order deduplication method and device, applied to a server. The method comprises: receiving order information sent by a client, wherein the order information comprises a key field, a bill file name and order data; judging, according to the key field, whether a bloom filter exists in the cache; if the bloom filter does not exist, loading the bloom filter; if the bloom filter exists, judging whether the bill file name exists in the bloom filter; if the bill file name does not exist in the bloom filter, storing the order data in a database and the cache and adding the bill file name to the bloom filter; if the bill file name exists in the bloom filter, confirming the bill file name through the cache and the database to obtain a confirmation result. The key field comprises a hardware identification and a sales time. The bloom filter reduces memory occupation and achieves efficient removal of duplicate data.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a big data order deduplication method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for loading bloom filters according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of an order deduplication apparatus for big data according to a second embodiment of the present invention.
Reference numerals:
1-receiving unit; 2-first judging unit; 3-loading unit; 4-second judging unit; 5-storage unit; 6-confirmation unit.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In recent years, with the rapid development of the internet and the information industry, the data generated each year has grown exponentially. Owing to the complexity of internet services, repeated submissions by users, client retries, upstream service failures and the like, the same data may be uploaded more than once.
To avoid data disorder caused by repeated uploading, a cache layer is typically added, and the unique identification field of each data item is stored in the cache layer. The cache layer is queried first: if the unique identification field is found there, the data is a duplicate; if it is not found, the database is queried for the unique identification field to confirm. With large data volumes, this deduplication method causes the unique identification fields stored in the cache to occupy a large amount of memory, so the cost is relatively high.
The application reduces the memory occupation through the bloom filter and achieves the purpose of efficiently removing the duplicate data.
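As a rough illustration of the possible saving, the sketch below compares keeping every unique identifier in memory with keeping only a bloom filter sized by the standard formula m = -n ln(p) / (ln 2)^2. The item count, identifier length and false positive rate are hypothetical values chosen for the example, not figures from the patent, and the comparison ignores whatever else the cache holds.

```python
import math


def bloom_bits(n_items: int, fp_rate: float) -> int:
    """Standard Bloom filter sizing: m = -n * ln(p) / (ln 2)^2 bits."""
    return math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))


n = 1_000_000      # hypothetical number of bill file names
key_bytes = 25     # hypothetical average length of one file name, in bytes
p = 0.01           # hypothetical target false positive rate

cache_bytes = n * key_bytes             # every name kept verbatim in memory
filter_bytes = bloom_bits(n, p) / 8     # bit array only

print(f"raw names: {cache_bytes / 1e6:.1f} MB, bloom filter: {filter_bytes / 1e6:.1f} MB")
```

At a 1% false positive rate the formula gives a little under 10 bits per item, regardless of how long each identifier is.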
A bloom filter consists of a bit array, all of whose positions are 0 in the initial state, and a series of hash functions. It occupies little space and supports efficient removal of duplicate data. It maps a key field to a position in the bit array by means of a hash function and sets that position to 1 to mark the presence of the key field.
Because a hash function may map different key fields to the same position of the bit array, collisions can occur. Multiple hash functions are therefore used to map each key field to multiple positions of the bit array. To check whether a key field exists, every position mapped by the hash functions is examined: if any position is not 1, the key field definitely does not exist; otherwise the key field may exist. Because "may exist" carries a certain false positive rate, the cache or the database must be queried further for confirmation.
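The structure described above can be sketched as follows. This is a minimal illustration, not the implementation used by the patent: the class name, the default sizes, the use of salted SHA-256 as the family of hash functions and the example file name (including its underscore separator) are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Bit array plus several hash functions; all bits start at 0."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # every bit is initially 0

    def _positions(self, item: str):
        # Derive one position per hash function by salting SHA-256 with its index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True only means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter()
bf.add("BBBB5D3ED723_1595037600000")  # hypothetical bill file name
print(bf.might_contain("BBBB5D3ED723_1595037600000"))  # True: added items always match
```

A True result for an item that was never added is the false positive case, which is why a positive answer still has to be confirmed against the cache or the database.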
In order to facilitate understanding of the present embodiment, the following describes embodiments of the present invention in detail.
Embodiment one:
fig. 1 is a flowchart of an order deduplication method for big data according to an embodiment of the present invention.
Referring to fig. 1, the execution subject is a server, and the method includes the steps of:
step S101, receiving order information sent by a client, wherein the order information comprises a key field, a bill file name and order data;
Step S102, judging whether a bloom filter exists in a cache according to the key field; if not, executing step S103; if so, executing step S104;
step S103, loading a bloom filter;
Specifically, the server receives the order information sent by the client. The order information includes a key field, a bill file name and order data, and the key field includes a hardware identification (MAC) and a sales time. The bill file name comprises the hardware identification (MAC) and an interception time, where the interception time is the time at which the client collected the data, in ms. The hardware identification (MAC) typically consists of 12 characters made up of digits and capital letters. Concatenating the hardware identification and the sales time yields the key field of the bloom filter, for example BBBB5D3ED72320200718.
When uploading an order, the client sends the order information to the server, and the server judges, according to the key field, whether a bloom filter exists in the cache. If it exists, the server judges whether the bill file name exists in the bloom filter: if not, the order is not a duplicate and can be stored directly; otherwise, the server further queries the cache and the database for confirmation. If no bloom filter exists, the bloom filter needs to be loaded.
Step S104, judging whether the bill file name exists in the bloom filter; if not, executing step S105; if so, executing step S106;
Step S105, storing order data into a database and a cache, and adding a bill file name into a bloom filter;
Step S106, confirming the bill file name through the cache and the database to obtain a confirmation result.
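Steps S101 to S106 can be sketched as below. This is an illustrative outline under several assumptions: plain dictionaries stand in for the cache and the database, TinyBloom is a simplified stand-in filter, the function and variable names are hypothetical, loading from the database in step S103 is reduced to creating an empty filter, and processing is assumed to continue with step S104 once the filter has been loaded.

```python
import hashlib


class TinyBloom:
    """Stand-in Bloom filter; uses one byte per bit to keep the sketch short."""

    def __init__(self, m=4096, k=3):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


filters = {}    # cache: key field -> Bloom filter (hypothetical layout)
cache = {}      # cache: bill file name -> order data
database = {}   # stand-in for the database table of orders


def handle_order(mac, sale_date, bill_name, order_data):
    key_field = mac + sale_date              # S101: e.g. "BBBB5D3ED723" + "20200718"
    bloom = filters.get(key_field)           # S102: is there a filter for this key?
    if bloom is None:
        bloom = TinyBloom()                  # S103: loading from the database is omitted here
        filters[key_field] = bloom
    if bill_name not in bloom:               # S104: definitely not seen before
        database[bill_name] = order_data     # S105: store and remember the name
        cache[bill_name] = order_data
        bloom.add(bill_name)
        return "stored"
    return "needs confirmation"              # S106: cache and database lookup follows


first = handle_order("BBBB5D3ED723", "20200718", "bill-0001", {"amount": 12})
second = handle_order("BBBB5D3ED723", "20200718", "bill-0001", {"amount": 12})
print(first, "/", second)  # stored / needs confirmation
```

Only the "needs confirmation" branch ever touches the cache and the database for a lookup, so a new order normally costs one filter check and one write.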
Further, step S106 includes the steps of:
Step S201, inquiring whether a bill file name exists in the cache; if so, step S202 is performed; if not, executing step S203;
Step S202, discarding the order information;
Here, if the bill file name is found in the cache, the order is a duplicate: it is discarded and processing information is returned to the client. If the bill file name is not found in the cache, further confirmation in the database is required.
Step S203, inquiring whether a bill file name exists from a database; if so, step S204 is performed; if not, then step S205 is performed;
step S204, discarding the order information;
step S205, the order data is stored in a database and a cache, the bill file name is added in a bloom filter, and response information of successful warehousing is sent to the client.
Here, if the bill file name is found in the database, the order is a duplicate: it is discarded and processing information is returned to the client. If the bill file name is not found in the database, the bloom filter has produced a false positive and the order is new: the order data is stored in the database and the cache, the bill file name is added to the bloom filter, and response information of successful warehousing is sent to the client.
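The confirmation in steps S201 to S205 can be sketched as a single function. The names and return strings are hypothetical, dictionaries stand in for the cache and the database, and a list's append method stands in for adding to the bloom filter.

```python
def confirm(bill_name, order_data, cache, database, bloom_add):
    """Run after the Bloom filter reports that bill_name may already exist."""
    if bill_name in cache:                 # S201
        return "discarded"                 # S202: duplicate found in the cache
    if bill_name in database:              # S203
        return "discarded"                 # S204: duplicate found in the database
    database[bill_name] = order_data       # S205: false positive, so the order is new
    cache[bill_name] = order_data
    bloom_add(bill_name)
    return "stored"


cache = {}
database = {"bill-0001": {"amount": 12}}
added_to_filter = []   # a list stands in for the Bloom filter's add operation

print(confirm("bill-0001", {"amount": 12}, cache, database, added_to_filter.append))  # discarded
print(confirm("bill-0002", {"amount": 7}, cache, database, added_to_filter.append))   # stored
```

The second call models a false positive: the filter said "possibly present", neither store had the name, and the order is accepted.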
Further, referring to fig. 2, step S103 includes the steps of:
step S301, judging whether persistence information of a bloom filter exists in a database according to the key field; if so, step S302 is performed; if not, then step S305 is performed;
step S302, obtaining the last time of updating the bloom filter in the persistence information;
step S303, taking the time of last updating the bloom filter as the starting time;
step S304, searching all increment orders from the starting time to the current time in a database, and adding bill file names corresponding to all increment orders into a bloom filter;
Step S305, searching the database for all orders of the current day corresponding to the hardware identification, and adding all the orders of the current day to the bloom filter.
Specifically, the bloom filter is held in memory, so its data is lost when the server restarts. A bloom filter newly built after the restart does not contain the bill file names of orders stored before the restart, which can lead to judgment errors.
Therefore, the order information of the bloom filter is stored in the database at a preset time interval, and the time at which the bloom filter was last updated is added to the persisted information. When the bloom filter is loaded after a server restart, the persistence information is queried in the database. If it is found, the time at which the bloom filter was last updated is acquired from it and taken as the starting time; all incremental orders from the starting time to the current time are searched in the database, and the corresponding bill file names are added to the bloom filter. If no persistence information is found in the database, all orders of the current day corresponding to the hardware identification are searched from the database and added to the bloom filter.
Further, the current time is the time at which the bloom filter is looked up in the database.
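The loading procedure of steps S301 to S305 can be sketched as follows. To keep the example short, a Python set stands in for the bloom filter, and lists and dictionaries stand in for database tables. The field names, the millisecond timestamps, the filtering of incremental orders by hardware identification and the restoring of the persisted state before the incremental orders are added are assumptions for illustration.

```python
def load_filter(key_field, mac, now_ms, day_start_ms, persisted, orders):
    """Rebuild the set of known bill file names for one key field."""
    names = set()                                   # a set stands in for the Bloom filter
    record = persisted.get(key_field)               # S301
    if record is not None:
        names |= record["names"]                    # restore the persisted state
        start_ms = record["last_update_ms"]         # S302, S303
        for order in orders:                        # S304: incremental orders only
            if order["mac"] == mac and start_ms <= order["stored_ms"] <= now_ms:
                names.add(order["bill_name"])
    else:
        for order in orders:                        # S305: all of today's orders for the device
            if order["mac"] == mac and day_start_ms <= order["stored_ms"] <= now_ms:
                names.add(order["bill_name"])
    return names


orders = [
    {"mac": "BBBB5D3ED723", "bill_name": "bill-0001", "stored_ms": 1_000},
    {"mac": "BBBB5D3ED723", "bill_name": "bill-0002", "stored_ms": 5_000},
]
persisted = {"BBBB5D3ED72320200718": {"names": {"bill-0001"}, "last_update_ms": 2_000}}

restored = load_filter("BBBB5D3ED72320200718", "BBBB5D3ED723", 9_000, 0, persisted, orders)
rebuilt = load_filter("BBBB5D3ED72320200718", "BBBB5D3ED723", 9_000, 0, {}, orders)
print(sorted(restored), sorted(rebuilt))
```

With a snapshot available, only the order stored after the last update has to be read back; without one, every order of the day for that device is scanned.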
Further, the method further comprises:
storing order information of the bloom filter in the database at a preset time interval.
Specifically, storing the order information of the bloom filter in the database at a preset time interval means storing the key field of the bloom filter, the serialized bloom filter, the last update time of the bloom filter and the like in the database.
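A periodic snapshot of this kind might look like the sketch below. The 60 second interval, the record layout and the function name are hypothetical, a dictionary stands in for the database table, and the filter is represented only by its raw bit array.

```python
PERSIST_INTERVAL_S = 60.0   # hypothetical preset time interval


def persist_if_due(key_field, bits, table, now_s, interval_s=PERSIST_INTERVAL_S):
    """Save a snapshot unless one was saved less than interval_s ago."""
    record = table.get(key_field)
    if record is not None and now_s - record["last_update_s"] < interval_s:
        return False
    table[key_field] = {
        "key_field": key_field,
        "bloom_bytes": bytes(bits),      # serialized bit array
        "last_update_s": now_s,          # later used as the starting time on reload
    }
    return True


table = {}                               # stand-in for a database table
bits = bytearray(b"\x05\x00")
print(persist_if_due("BBBB5D3ED72320200718", bits, table, now_s=0.0))    # True
print(persist_if_due("BBBB5D3ED72320200718", bits, table, now_s=30.0))   # False
print(persist_if_due("BBBB5D3ED72320200718", bits, table, now_s=61.0))   # True
```

Storing the update time next to the serialized filter is what lets a later reload fetch only the orders written after the snapshot.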
The embodiment of the invention provides a big data order deduplication method, applied to a server. The method comprises: receiving order information sent by a client, wherein the order information comprises a key field, a bill file name and order data; judging, according to the key field, whether a bloom filter exists in the cache; if the bloom filter does not exist, loading the bloom filter; if the bloom filter exists, judging whether the bill file name exists in the bloom filter; if the bill file name does not exist in the bloom filter, storing the order data in a database and the cache and adding the bill file name to the bloom filter; if the bill file name exists in the bloom filter, confirming the bill file name through the cache and the database to obtain a confirmation result. The key field comprises a hardware identification and a sales time. The bloom filter reduces memory occupation and achieves efficient removal of duplicate data.
Embodiment two:
Fig. 3 is a schematic diagram of an order deduplication apparatus for big data according to a second embodiment of the present invention.
Referring to fig. 3, applied to a server, the apparatus includes:
The receiving unit 1 is used for receiving order information sent by the client, wherein the order information comprises a key field, a bill file name and order data;
A first judging unit 2, configured to judge whether a bloom filter exists in the cache according to the key field;
a loading unit 3, configured to load the bloom filter in the case where the bloom filter does not exist in the cache;
a second judging unit 4, configured to judge whether the bill file name exists in the bloom filter in the case where the bloom filter exists in the cache;
a storage unit 5, configured to store the order data in a database and the cache, and add the bill file name to the bloom filter, in the case where the bill file name does not exist in the bloom filter;
a confirmation unit 6, configured to confirm the bill file name through the cache and the database, in the case where the bill file name exists in the bloom filter, to obtain a confirmation result;
wherein the key fields include hardware identification and sales time.
Further, the confirmation unit 6 is specifically configured to:
inquiring whether a bill file name exists in the cache;
if yes, discarding the order information;
if not, inquiring whether the bill file name exists in the database;
if yes, discarding the order information;
If not, the order data is stored in a database and a cache, and the bill file name is added in a bloom filter, and response information of successful warehousing is sent to the client.
Further, the loading unit 3 is specifically configured to:
judging whether persistence information of the bloom filter exists in the database according to the key field;
if so, acquiring, from the persistence information, the time at which the bloom filter was last updated;
taking the time at which the bloom filter was last updated as the starting time;
searching the database for all incremental orders from the starting time to the current time, and adding the bill file names corresponding to all the incremental orders to the bloom filter;
if not, searching the database for all orders of the current day corresponding to the hardware identification, and adding all the orders of the current day to the bloom filter.
The embodiment of the invention provides a big data order deduplication device, applied to a server, which performs the following: receiving order information sent by a client, wherein the order information comprises a key field, a bill file name and order data; judging, according to the key field, whether a bloom filter exists in the cache; if the bloom filter does not exist, loading the bloom filter; if the bloom filter exists, judging whether the bill file name exists in the bloom filter; if the bill file name does not exist in the bloom filter, storing the order data in a database and the cache and adding the bill file name to the bloom filter; if the bill file name exists in the bloom filter, confirming the bill file name through the cache and the database to obtain a confirmation result. The key field comprises a hardware identification and a sales time. The bloom filter reduces memory occupation and achieves efficient removal of duplicate data.
The embodiment of the invention also provides electronic equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the order duplication elimination method of big data provided by the embodiment when executing the computer program.
The embodiment of the invention also provides a computer readable medium with non-volatile program code executable by a processor, wherein the computer readable medium stores a computer program, and the computer program executes the steps of the order duplication elimination method of big data in the embodiment when being executed by the processor.
The computer program product provided by the embodiment of the present invention includes a computer readable storage medium storing a program code, where instructions included in the program code may be used to perform the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment and will not be described herein.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
In addition, in the description of embodiments of the present invention, unless explicitly stated and limited otherwise, the terms "mounted" and "connected" are to be construed broadly: a connection may be, for example, fixed, detachable or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, or a communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above examples are only specific embodiments of the present invention and are not intended to limit its scope; the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing examples, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention and are intended to be included in the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A method for de-duplication of orders for big data, applied to a server, the method comprising:
Receiving order information sent by a client, wherein the order information comprises a key field, a bill file name and order data;
Judging whether a bloom filter exists in the cache according to the key field;
Loading the bloom filter if the bloom filter is not present in the cache;
if the bloom filter exists in the cache, judging whether the bill file name exists in the bloom filter or not;
storing the order data in a database and the cache if the billing filename does not exist in the bloom filter, and adding the billing filename to the bloom filter;
if the bill file name exists in the bloom filter, confirming the bill file name through the cache and the database to obtain a confirmation result;
wherein the key field comprises a hardware identifier and a sales time;
said loading said bloom filter, comprising:
Judging whether the persistence information of the bloom filter exists in the database according to the key field;
If the persistence information of the bloom filter exists in the database, acquiring the time of updating the bloom filter last in the persistence information;
taking the time of updating the bloom filter last as a starting time;
Searching all increment orders from the starting time to the current time in the database, and adding the bill file names corresponding to all increment orders into the bloom filter;
if the persistence information of the bloom filter does not exist in the database, searching all orders of the same day corresponding to the hardware identifier from the database, and adding all orders of the same day into the bloom filter;
said confirming the bill file name through the cache and the database to obtain a confirmation result comprises:
inquiring whether the bill file name exists in the cache;
if the bill file name exists in the cache, discarding the order information;
If the bill file name does not exist in the cache, inquiring whether the bill file name exists in the database;
If the bill file name exists in the database, discarding the order information;
If the bill file name does not exist in the database, the order data are stored in the database and the cache, the bill file name is added into the bloom filter, and response information of successful warehousing is sent to the client.
2. The method for de-duplication of orders for big data of claim 1, wherein the current time is the time at which the bloom filter is looked up from the database.
3. The method for de-duplication of orders for big data of claim 1, further comprising:
storing order information of the bloom filter into the database within a preset time interval.
4. An order de-duplication device for big data, applied to a server, the device comprising:
a receiving unit, configured to receive order information sent by a client, wherein the order information comprises a key field, a bill file name and order data;
a first judging unit, configured to judge, according to the key field, whether a bloom filter exists in a cache;
a loading unit, configured to load the bloom filter if the bloom filter does not exist in the cache;
a second judging unit, configured to judge whether the bill file name exists in the bloom filter if the bloom filter exists in the cache;
a storage unit, configured to store the order data in a database and the cache, and add the bill file name to the bloom filter, if the bill file name does not exist in the bloom filter;
a confirming unit, configured to confirm the bill file name through the cache and the database, if the bill file name exists in the bloom filter, to obtain a confirmation result;
wherein the key field comprises a hardware identifier and a sales time;
the loading unit is specifically configured to:
judge, according to the key field, whether persistence information of the bloom filter exists in the database;
if the persistence information of the bloom filter exists in the database, acquire, from the persistence information, the time at which the bloom filter was last updated;
take the time at which the bloom filter was last updated as a starting time;
search the database for all incremental orders from the starting time to the current time, and add the bill file names corresponding to all the incremental orders to the bloom filter;
if the persistence information of the bloom filter does not exist in the database, search the database for all orders of the same day corresponding to the hardware identifier, and add all the orders of the same day to the bloom filter;
the confirming unit is specifically configured to:
query whether the bill file name exists in the cache;
if the bill file name exists in the cache, discard the order information;
if the bill file name does not exist in the cache, query whether the bill file name exists in the database;
if the bill file name exists in the database, discard the order information;
if the bill file name does not exist in the database, store the order data in the database and the cache, add the bill file name to the bloom filter, and send response information indicating successful storage to the client.
5. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 3 when executing the computer program.
6. A computer readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the method of any one of claims 1 to 3.
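
The flow recited in claims 1 to 3 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the class names `BloomFilter` and `OrderDeduplicator`, the in-memory dictionaries standing in for the cache and the database, the SHA-256-based hashing, the filter size, and the `(hardware_id, sale_day)` tuple used as the key field are all hypothetical choices made for the example.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; bit positions are derived from SHA-256 (an assumption)."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


class OrderDeduplicator:
    """Hypothetical server-side de-duplicator following the claimed steps."""

    def __init__(self):
        self.filters = {}    # cache: key field -> BloomFilter
        self.cache = {}      # cache: key field -> set of bill file names
        self.database = {}   # stand-in database: key -> {name: (data, stored_at)}
        self.persisted = {}  # stand-in persistence info: key -> (bits, last_updated)

    def _load_filter(self, key):
        bloom = BloomFilter()
        rows = self.database.get(key, {})
        snapshot = self.persisted.get(key)
        if snapshot is not None:
            # Persistence info exists: restore it, then replay incremental orders
            # stored since the last update.
            bits, last_updated = snapshot
            bloom.bits = bytearray(bits)
            names = [n for n, (_, stored_at) in rows.items()
                     if stored_at >= last_updated]
        else:
            # No persistence info: rebuild from all same-day orders of this device.
            names = list(rows)
        for name in names:
            bloom.add(name)
        return bloom

    def _store(self, key, name, order_data, now):
        self.database.setdefault(key, {})[name] = (order_data, now)
        self.cache.setdefault(key, set()).add(name)

    def persist(self, key, now):
        """Snapshot the filter, meant to be called at a preset interval."""
        self.persisted[key] = (bytes(self.filters[key].bits), now)

    def handle(self, key, bill_file_name, order_data, now):
        bloom = self.filters.get(key)
        if bloom is None:  # filter not in cache: load it first
            bloom = self.filters[key] = self._load_filter(key)
        if bill_file_name in bloom:
            # Possible duplicate (or false positive): confirm via cache, then database.
            if bill_file_name in self.cache.get(key, set()):
                return "discarded"
            if bill_file_name in self.database.get(key, {}):
                return "discarded"
        self._store(key, bill_file_name, order_data, now)
        bloom.add(bill_file_name)
        return "stored"
```

In this sketch, a caller would invoke `handle(key, bill_file_name, order_data, now)` for each incoming order and `persist(key, now)` on a timer. Because a Bloom filter can report false positives but not false negatives, only a positive lookup needs the more expensive cache and database confirmation.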
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110027862.3A CN113377812B (en) 2021-01-08 2021-01-08 Order duplicate removal method and device for big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110027862.3A CN113377812B (en) 2021-01-08 2021-01-08 Order duplicate removal method and device for big data

Publications (2)

Publication Number Publication Date
CN113377812A (en) 2021-09-10
CN113377812B (en) 2024-06-18

Family

ID=77569582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110027862.3A Active CN113377812B (en) 2021-01-08 2021-01-08 Order duplicate removal method and device for big data

Country Status (1)

Country Link
CN (1) CN113377812B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113536034A (en) * 2021-09-17 2021-10-22 飞狐信息技术(天津)有限公司 Data writing method and data reading method based on bloom filter
CN114048201A (en) * 2021-11-16 2022-02-15 北京锐安科技有限公司 Distributed stream computing engine Flink-based key field real-time deduplication method

Citations (2)

Publication number Priority date Publication date Assignee Title
CN110781386A (en) * 2019-10-10 2020-02-11 支付宝(杭州)信息技术有限公司 Information recommendation method and device, and bloom filter creation method and device
CN111143720A (en) * 2018-11-06 2020-05-12 顺丰科技有限公司 URL duplicate removal method, device and storage medium

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US9535658B2 (en) * 2012-09-28 2017-01-03 Alcatel Lucent Secure private database querying system with content hiding bloom filters
CN106649346B (en) * 2015-10-30 2020-09-22 北京国双科技有限公司 Data repeatability checking method and device
CN109313642B (en) * 2018-09-07 2022-07-12 威富通科技有限公司 Bill information caching method, bill information query method and terminal equipment
CN109614407A (en) * 2018-12-10 2019-04-12 北京奇艺世纪科技有限公司 A kind of request processing method and equipment
CN110532251B (en) * 2019-08-28 2021-11-05 东北大学 Seismic table network big data deduplication method based on bloom filter algorithm
CN110880147B (en) * 2019-11-22 2022-08-26 腾讯科技(深圳)有限公司 Transaction processing method, related equipment and computer storage medium

Also Published As

Publication number Publication date
CN113377812A (en) 2021-09-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant