CN111143340B - Data processing method and device, server and client - Google Patents
- Publication number
- CN111143340B (application CN201911358949.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- adjacent
- written
- duplicate removal
- processed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
Abstract
The application provides a data processing method, a data processing device, a server and a client, and the method applied to the server comprises the following steps: receiving an SQL query request sent by a client; acquiring data requested to be processed by the SQL query request from the ClickHouse, taking the acquired data as data to be processed, calling an adjacent duplicate removal function source code in a ClickHouse source code, and executing an adjacent duplicate removal process, wherein the adjacent duplicate removal process is as follows: carrying out adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data; and returning the data after the adjacent deduplication to the client. In the present application, the efficiency of adjacent deduplication can be improved in the above manner.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method and apparatus, a server, and a client.
Background
Some services require adjacent deduplication. For example, when retrieving all pages accessed by a user in time order, repeated accesses to the same page at adjacent times need to be deduplicated, so that each run of consecutive visits to one page is kept only once.
However, how to perform adjacent deduplication efficiently remains a problem.
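As an illustration of what adjacent deduplication means, the following minimal Python sketch contrasts it with ordinary (global) deduplication. The page names and the `visits` list are hypothetical sample data and do not come from the application.

```python
from itertools import groupby

# Hypothetical page visits for one user, already in time order.
visits = ["home", "home", "search", "item", "item", "item", "home"]

# Adjacent deduplication: collapse only runs of consecutive equal values.
adjacent = [page for page, _run in groupby(visits)]

# Global deduplication, for contrast: keep the first occurrence of each value.
global_unique = list(dict.fromkeys(visits))

print(adjacent)       # ['home', 'search', 'item', 'home']
print(global_unique)  # ['home', 'search', 'item']
```

Note that the final return to "home" survives adjacent deduplication because it is not adjacent to the earlier "home" visits.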
Disclosure of Invention
In order to solve the foregoing technical problems, embodiments of the present application provide a data processing method, an apparatus, a server, and a client, so as to achieve the purpose of improving the efficiency of adjacent deduplication, and the technical solution is as follows:
a data processing method is applied to a server, the server adds adjacent deduplication function source codes in a ClickHouse source code, and the method comprises the following steps:
receiving an SQL query request sent by a client;
acquiring the data requested to be processed by the SQL query request from the ClickHouse, taking the acquired data as the data to be processed, calling an adjacent duplicate removal function source code in the ClickHouse source code, and executing an adjacent duplicate removal process, wherein the adjacent duplicate removal process is as follows: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data;
and returning the data after the adjacent deduplication to the client.
Preferably, the performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data includes:
grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays;
sorting the data in each array according to the data generation time to obtain sorted data;
respectively corresponding to each array, and constructing a list;
selecting one of the unselected data in each array in sequence as data to be written;
respectively checking whether the latest written data in each list is the same as the data to be written;
if not, writing the data to be written into the list;
if the data to be written are the same, discarding the data to be written;
respectively judging whether unselected data exist in each array;
if yes, returning to the step of executing the step of selecting one of the unselected data in each array in sequence as the data to be written;
and if not, taking the data in the lists as the data after adjacent deduplication.
Preferably, the performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data includes:
sorting the data to be processed according to the data generation time respectively to obtain the sorted data to be processed;
constructing a list;
selecting one of the unselected data in the sorted data to be processed in sequence as data to be written;
checking whether the latest written data in the list is the same as the data to be written;
if not, writing the data to be written into the list;
if the data to be written are the same, discarding the data to be written;
judging whether the sorted to-be-processed data has unselected data;
if yes, returning to the step of selecting one data from the unselected data in the sorted data to be processed in sequence as data to be written;
and if not, taking the data in the list as the data after the adjacent duplication elimination.
A data processing method is applied to a client, and comprises the following steps:
acquiring an SQL query request;
sending the SQL query request to a server, so that the server receives the SQL query request sent by a client, acquiring data requested to be processed by the SQL query request from the ClickHouse, taking the acquired data as data to be processed, calling an adjacent duplicate removal function source code in a source code of the ClickHouse, and executing an adjacent duplicate removal process, wherein the adjacent duplicate removal process is as follows: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data, and returning the adjacent duplicate removed data to the client;
and receiving the adjacent de-duplicated data returned by the server.
A data processing device is applied to a server, the server adds adjacent deduplication function source codes in the source codes of ClickHouse, and the device comprises:
the query request receiving module is used for receiving an SQL query request sent by a client;
the adjacent duplicate removal calculation module is configured to obtain data requested to be processed by the SQL query request from the ClickHouse, use the obtained data as data to be processed, call an adjacent duplicate removal function source code in the ClickHouse source code, and execute an adjacent duplicate removal process, where the adjacent duplicate removal process is: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data;
and the return module is used for returning the data after the adjacent duplication elimination to the client.
Preferably, the adjacent duplicate removal calculation module is specifically configured to:
grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays;
sorting the data in each array according to the data generation time to obtain sorted data;
respectively corresponding to each array, and constructing a list;
selecting one of the unselected data in each array in sequence as data to be written;
respectively checking whether the latest written data in each list is the same as the data to be written;
if not, writing the data to be written into the list;
if the data to be written are the same, discarding the data to be written;
respectively judging whether unselected data exist in each array;
if yes, returning to the step of selecting one of the unselected data in each array in sequence as the data to be written;
and if not, taking the data in the lists as the data after adjacent deduplication.
Preferably, the adjacent duplicate removal calculation module is specifically configured to:
sorting the data to be processed according to the data generation time respectively to obtain the sorted data to be processed;
constructing a list;
selecting one of the unselected data in the sorted data to be processed in sequence as data to be written;
checking whether the latest written data in the list is the same as the data to be written;
if not, writing the data to be written into the list;
if the data to be written are the same, discarding the data to be written;
judging whether the sorted data to be processed has unselected data;
if yes, returning to the step of selecting one data from the unselected data in the sorted data to be processed in sequence as data to be written;
and if not, taking the data in the list as the data after the adjacent duplication elimination.
A data processing device is applied to a client, and comprises:
the query request acquisition module is used for acquiring the SQL query request;
a sending module, configured to send the SQL query request to a server, so that the server receives the SQL query request sent by a client, obtains data requested to be processed by the SQL query request from the ClickHouse, uses the obtained data as data to be processed, calls an adjacent duplicate removal function source code in a source code of the ClickHouse, and executes an adjacent duplicate removal process, where the adjacent duplicate removal process is: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data, and returning the adjacent duplicate removed data to the client;
and the data receiving module is used for receiving the adjacent deduplicated data returned by the server.
A server, comprising:
a memory for storing a program;
the processor is configured to run the program, and when the processor runs the program, the processor implements the data processing method according to any one of the above.
A client, comprising:
a memory for storing a program;
the processor is configured to run the program, and when the processor runs the program, the processor implements the data processing method as described above.
Compared with the prior art, the beneficial effect of this application is:
In the present application, the server adds the source code of an adjacent deduplication function to the ClickHouse source code, so that ClickHouse has an adjacent deduplication function. On this basis, the server receives an SQL query request sent by a client, acquires the data requested to be processed by the SQL query request from ClickHouse, takes the acquired data as the data to be processed, calls the adjacent deduplication function source code in the ClickHouse source code, and executes the adjacent deduplication process. Adjacent deduplication is thus carried out inside ClickHouse, which avoids transmitting the data between different devices, shortens the overall time of adjacent deduplication, and improves its efficiency; the high running speed of the server further helps ensure the efficiency of adjacent deduplication.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of embodiment 1 of a data processing method provided in the present application;
FIG. 2 is a flowchart of an adjacent deduplication process provided in the present application;
FIG. 3 is a flowchart of another adjacent deduplication process provided in the present application;
FIG. 4 is a flowchart of embodiment 2 of a data processing method provided in the present application;
FIG. 5 is a schematic diagram of the logical structure of a data processing apparatus provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application discloses a data processing method, which is applied to a server, wherein the server adds adjacent duplication removal function source codes in a ClickHouse source code, and the method comprises the following steps: receiving an SQL query request sent by a client; acquiring the data requested to be processed by the SQL query request from the ClickHouse, taking the acquired data as the data to be processed, calling an adjacent duplicate removal function source code in the ClickHouse source code, and executing an adjacent duplicate removal process, wherein the adjacent duplicate removal process is as follows: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data; and returning the data after the adjacent deduplication to the client. In the present application, the efficiency of adjacent deduplication can be improved.
Next, the data processing method disclosed in the embodiments of the present application is introduced. The method is applied to a server, and the server adds adjacent deduplication function source code to the ClickHouse source code. FIG. 1 is a flowchart of embodiment 1 of the data processing method provided by the present application, which may include the following steps:
Step S11, receiving an SQL query request sent by the client.
The SQL query request sent by the client may include an identifier (e.g., a name or storage location) of the data requested to be processed and information indicating that adjacent deduplication calculation is to be performed.
Step S12, acquiring the data requested to be processed by the SQL query request from ClickHouse, taking the acquired data as the data to be processed, calling the adjacent deduplication function source code in the ClickHouse source code, and executing an adjacent deduplication process, wherein the adjacent deduplication process is: performing adjacent deduplication calculation on the data to be processed to obtain adjacent-deduplicated data.
ClickHouse can be understood as an OLAP-oriented distributed columnar database open-sourced by Yandex, a company sometimes called the "Russian Google"; it can generate real-time data reports using SQL queries.
Specifically, the server may obtain the data requested to be processed by the SQL query request from the ClickHouse according to the identifier of the data requested to be processed in the SQL query request, use the obtained data as the data to be processed, and call the adjacent duplicate removal function source code in the ClickHouse source code according to the information indicating to perform the adjacent duplicate removal calculation in the SQL query request, so as to execute the adjacent duplicate removal process.
The adjacent deduplication process may include: performing adjacent deduplication calculation on the data to be processed to obtain adjacent-deduplicated data.
Step S13, returning the adjacent-deduplicated data to the client.
In the present application, the server adds the source code of an adjacent deduplication function to the ClickHouse source code, so that ClickHouse has an adjacent deduplication function. On this basis, the server receives an SQL query request sent by a client, acquires the data requested to be processed by the SQL query request from ClickHouse, takes the acquired data as the data to be processed, calls the adjacent deduplication function source code in the ClickHouse source code, and executes the adjacent deduplication process. Adjacent deduplication is thus carried out inside ClickHouse, which avoids transmitting the data between different devices, shortens the overall time of adjacent deduplication, and improves its efficiency; the high running speed of the server further helps ensure the efficiency of adjacent deduplication.
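The flow of steps S11 to S13 can be sketched roughly as follows. This is only an illustrative Python model, not the ClickHouse implementation: the in-memory `TABLES` dictionary stands in for data stored in ClickHouse, and the request is modeled as a dictionary with hypothetical `table` and `adjacent_dedup` fields rather than as parsed SQL. All names are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (generation time, value); an assumed row shape

# Hypothetical in-memory stand-in for data stored in ClickHouse.
TABLES: Dict[str, List[Row]] = {
    "page_views": [(1, "home"), (2, "home"), (3, "search"), (4, "home")],
}


def adjacent_dedup(rows: List[Row]) -> List[str]:
    """Drop a value when it equals the value written immediately before it."""
    result: List[str] = []
    for _ts, value in sorted(rows, key=lambda row: row[0]):
        if not result or result[-1] != value:
            result.append(value)
    return result


def handle_request(request: Dict[str, object]) -> List[str]:
    """Model of S11 to S13: take a request, process it in place, return data."""
    # S11: the request has been received (it is passed in as an argument).
    # S12: look up the requested data by its identifier.
    rows = TABLES[str(request["table"])]
    if request.get("adjacent_dedup"):
        # S12 (continued): run adjacent deduplication next to the data,
        # without shipping the rows to another device first.
        return adjacent_dedup(rows)  # S13: return the deduplicated data
    return [value for _ts, value in rows]


print(handle_request({"table": "page_views", "adjacent_dedup": True}))
# ['home', 'search', 'home']
```

The point of the sketch is the placement of the computation: the deduplication runs where the data lives, so only the reduced result travels back to the client.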
In another embodiment of the present application, the above adjacent deduplication process is described. As shown in FIG. 2, it may include the following steps:
Step S21, grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays.
Grouping the data to be processed into a plurality of arrays according to different grouping conditions can improve the efficiency of subsequent processing.
Step S22, sorting the data in each array according to data generation time to obtain sorted data.
Step S23, constructing one list for each array.
For example, if the arrays are A, B and C, list a is constructed for array A, list b for array B, and list c for array C.
Each list is used to store data.
Step S24, for each array, selecting in sequence one item from its unselected data as the data to be written.
In this embodiment, the unselected data in each array may be understood as the unselected data among the sorted data in that array.
Step S25, for each list, checking whether the most recently written data in the list is the same as the data to be written.
If they are different, step S26 is executed; if they are the same, step S27 is executed.
Step S26, writing the data to be written into the list.
Step S27, discarding the data to be written.
Step S28, for each array, judging whether unselected data remain.
If yes, return to step S24; if not, step S29 is executed.
Step S29, taking the data in the lists as the adjacent-deduplicated data.
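Steps S21 to S29 can be sketched in Python as below. This is a minimal sketch under stated assumptions: each record is assumed to be a `(group key, generation time, value)` tuple, and the function name `adjacent_dedup_grouped` is hypothetical. For simplicity the sketch finishes one array before starting the next rather than taking one item from every array per round; since each list depends only on its own array, the resulting lists are the same.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Record = Tuple[Hashable, float, str]  # (group key, generation time, value)


def adjacent_dedup_grouped(records: Iterable[Record]) -> Dict[Hashable, List[str]]:
    # S21: group the data to be processed into one array per grouping key.
    arrays: Dict[Hashable, List[Tuple[float, str]]] = defaultdict(list)
    for key, generated_at, value in records:
        arrays[key].append((generated_at, value))

    lists: Dict[Hashable, List[str]] = {}
    for key, array in arrays.items():
        # S22: sort the array by data generation time.
        array.sort(key=lambda item: item[0])
        # S23: construct the list that corresponds to this array.
        written: List[str] = []
        # S24 and S28: take the unselected items in order until none remain.
        for _generated_at, value in array:
            # S25: compare with the most recently written item.
            if not written or written[-1] != value:
                written.append(value)  # S26: different, so write it
            # S27: same as the last written item, so it is discarded
        lists[key] = written
    # S29: the lists hold the adjacent-deduplicated data.
    return lists


records = [
    ("u1", 3, "item"), ("u1", 1, "home"), ("u2", 1, "home"),
    ("u1", 2, "home"), ("u2", 2, "cart"), ("u2", 3, "cart"), ("u1", 4, "home"),
]
print(adjacent_dedup_grouped(records))
# {'u1': ['home', 'item', 'home'], 'u2': ['home', 'cart']}
```

Because only the last written item is compared, each array is processed in a single pass after sorting.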
In another embodiment of the present application, another adjacent deduplication process is presented. As shown in FIG. 3, it may include the following steps:
Step S31, sorting the data to be processed according to data generation time to obtain sorted data to be processed.
Step S32, constructing a list.
The list is used to store data.
Step S33, selecting in sequence one item from the unselected data in the sorted data to be processed as the data to be written.
Step S34, checking whether the most recently written data in the list is the same as the data to be written.
If they are different, step S35 is executed; if they are the same, step S36 is executed.
Step S35, writing the data to be written into the list.
Step S36, discarding the data to be written.
Step S37, judging whether unselected data remain in the sorted data to be processed.
If yes, return to step S33; if not, step S38 is executed.
Step S38, taking the data in the list as the adjacent-deduplicated data.
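The single-list variant in steps S31 to S38 can be sketched as follows. The `(generation time, value)` row shape and the function name `adjacent_dedup` are assumptions made for this illustration.

```python
from typing import Iterable, List, Tuple

Row = Tuple[float, str]  # (generation time, value); an assumed row shape


def adjacent_dedup(rows: Iterable[Row]) -> List[str]:
    # S31: sort the data to be processed by generation time.
    ordered = sorted(rows, key=lambda row: row[0])
    # S32: construct the list that will hold the written data.
    written: List[str] = []
    # S33 and S37: select the unselected items in order until none remain.
    for _generated_at, value in ordered:
        # S34: compare with the most recently written item.
        if not written or written[-1] != value:
            written.append(value)  # S35: different, so write it
        # S36: same as the last written item, so it is discarded
    # S38: the list holds the adjacent-deduplicated data.
    return written


rows = [(2, "a"), (1, "a"), (3, "b"), (5, "a"), (4, "b")]
print(adjacent_dedup(rows))  # ['a', 'b', 'a']
```

An empty list is treated as having no latest written data, so the first selected item is always written.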
In another embodiment of the present application, another data processing method is introduced, which is applied to a client. FIG. 4 is a flowchart of embodiment 2 of the data processing method provided in the present application, which may include the following steps:
Step S41, acquiring an SQL query request.
The client may receive an SQL query request input by a user, thereby acquiring the SQL query request.
Step S42, sending the SQL query request to a server, so that the server receives the SQL query request sent by the client, acquires the data requested to be processed by the SQL query request from ClickHouse, takes the acquired data as the data to be processed, calls the adjacent deduplication function source code in the ClickHouse source code, and executes an adjacent deduplication process, wherein the adjacent deduplication process is: performing adjacent deduplication calculation on the data to be processed to obtain adjacent-deduplicated data; and returns the adjacent-deduplicated data to the client.
Step S43, receiving the adjacent-deduplicated data returned by the server.
After receiving the adjacent-deduplicated data returned by the server, the client may actively push the data to the user or make it available for the user to retrieve.
Next, a data processing apparatus provided in the present application will be described, and the data processing apparatus described below and the data processing method described above may be referred to in correspondence with each other.
Referring to FIG. 5, the data processing apparatus is applied to a server, where the server adds adjacent deduplication function source code to the ClickHouse source code, and the data processing apparatus includes: a query request receiving module 11, an adjacent duplicate removal calculation module 12 and a return module 13.
The query request receiving module 11 is configured to receive an SQL query request sent by a client;
the adjacent duplicate removal calculation module 12 is configured to obtain, from ClickHouse, the data requested to be processed by the SQL query request, use the obtained data as data to be processed, call the adjacent deduplication function source code in the ClickHouse source code, and execute an adjacent deduplication process, where the adjacent deduplication process is: performing adjacent deduplication calculation on the data to be processed to obtain adjacent-deduplicated data;
and the return module 13 is configured to return the adjacent-deduplicated data to the client.
In this embodiment, the adjacent duplicate removal calculation module 12 may be specifically configured to:
grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays;
sorting the data in each array according to the data generation time to obtain sorted data;
respectively corresponding to each array, and constructing a list;
selecting one of the unselected data in each array in sequence as data to be written;
respectively checking whether the latest written data in each list is the same as the data to be written;
if not, writing the data to be written into the list;
if the data to be written are the same, discarding the data to be written;
respectively judging whether unselected data exist in each array;
if yes, returning to the step of executing the step of selecting one of the unselected data in each array in sequence as the data to be written;
and if not, taking the data in the lists as the data after adjacent deduplication.
Alternatively, in this embodiment, the adjacent duplicate removal calculation module 12 may be specifically configured to:
sorting the data to be processed according to the data generation time respectively to obtain the sorted data to be processed;
constructing a list;
selecting one of the unselected data in the sorted data to be processed in sequence as data to be written;
checking whether the latest written data in the list is the same as the data to be written;
if not, writing the data to be written into the list;
if the data to be written are the same, discarding the data to be written;
judging whether the sorted data to be processed has unselected data;
if yes, returning to the step of selecting one data which is not selected from the sorted data to be processed in sequence as data to be written;
and if not, taking the data in the list as the data after the adjacent duplication elimination.
In another embodiment of the present application, there is provided a data processing apparatus applied to a client, the apparatus including:
the query request acquisition module is used for acquiring the SQL query request;
a sending module, configured to send the SQL query request to a server, so that the server receives the SQL query request sent by a client, obtains data requested to be processed by the SQL query request from the ClickHouse, uses the obtained data as data to be processed, calls an adjacent duplicate removal function source code in a source code of the ClickHouse, and executes an adjacent duplicate removal process, where the adjacent duplicate removal process is: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data, and returning the adjacent duplicate removed data to the client;
and the data receiving module is used for receiving the adjacent deduplicated data returned by the server.
In another embodiment of the present application, there is provided a server including:
a memory for storing a program;
the processor is configured to run the program, and when the processor runs the program, the processor implements the data processing method as described in embodiment 1.
In another embodiment of the present application, there is provided a client comprising:
a memory for storing a program;
the processor is configured to run the program, and when the processor runs the program, the processor implements the data processing method as described in embodiment 2.
It should be noted that the description of each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the apparatus embodiments are basically similar to the method embodiments, they are described briefly, and the relevant points can be found in the corresponding description of the method embodiments.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
For convenience of description, the above apparatus is described as being divided into various units by function, and the units are described separately. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The data processing method, the data processing device, the server and the client provided by the application are introduced in detail, a specific example is applied in the description to explain the principle and the implementation of the application, and the description of the above embodiment is only used to help understand the method and the core idea of the application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.
Claims (6)
1. A data processing method is applied to a server, the server adds adjacent deduplication function source codes in a ClickHouse source code, and the method comprises the following steps:
receiving an SQL query request sent by a client;
acquiring the data requested to be processed by the SQL query request from the ClickHouse, taking the acquired data as the data to be processed, calling an adjacent duplicate removal function source code in the ClickHouse source code, and executing an adjacent duplicate removal process, wherein the adjacent duplicate removal process is as follows: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data;
returning the data after the adjacent duplicate removal to the client;
the performing adjacent duplicate removal calculation on the data to be processed to obtain the data after adjacent duplicate removal includes: grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays; sorting the data in each array according to the data generation time to obtain sorted data; respectively corresponding to each array, and constructing a list; selecting one of the unselected data sequenced in the arrays in sequence respectively as data to be written; respectively checking whether the latest written data in each list is the same as the data to be written; if not, writing the data to be written into the list; if the data are the same, discarding the data to be written; respectively judging whether unselected data exist in each array; if yes, returning to the step of executing the step of selecting one of the unselected data in each array in sequence as the data to be written; and if not, taking the data in the lists as the data after adjacent deduplication.
2. A data processing method is applied to a client, and comprises the following steps:
acquiring an SQL query request;
sending the SQL query request to a server, so that the server receives the SQL query request sent by a client, acquiring data requested to be processed by the SQL query request from the ClickHouse, taking the acquired data as data to be processed, calling an adjacent duplicate removal function source code in a source code of the ClickHouse, and executing an adjacent duplicate removal process, wherein the adjacent duplicate removal process is as follows: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data, and returning the adjacent duplicate removed data to the client;
receiving the adjacent deduplicated data returned by the server;
wherein, the performing adjacent duplicate removal calculation on the data to be processed to obtain the data after adjacent duplicate removal includes: grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays; sorting the data in each array according to the data generation time to obtain sorted data; respectively corresponding to each array, and constructing a list; selecting one of the unselected data sequenced in the arrays in sequence respectively as data to be written; respectively checking whether the latest written data in each list is the same as the data to be written; if not, writing the data to be written into the list; if the data to be written are the same, discarding the data to be written; respectively judging whether unselected data exist in each array; if yes, returning to the step of executing the step of selecting one of the unselected data in each array in sequence as the data to be written; and if not, taking the data in the lists as the data after adjacent deduplication.
3. A data processing apparatus, applied to a server that adds an adjacent deduplication function source code to a ClickHouse source code, the apparatus comprising:
the query request receiving module is used for receiving an SQL query request sent by a client;
the adjacent duplicate removal calculation module is used for acquiring the data requested to be processed by the SQL query request from the ClickHouse, using the acquired data as the data to be processed, calling an adjacent duplicate removal function source code in the source code of the ClickHouse, and executing an adjacent duplicate removal process, wherein the adjacent duplicate removal process is as follows: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data;
a returning module, configured to return the data after the adjacent deduplication to the client;
the adjacent duplicate removal calculation module is specifically configured to: group the data to be processed according to different grouping conditions to obtain a plurality of different arrays; sort the data in each array by data generation time to obtain sorted data; construct one list for each array; for each array, select, in order, one item of the sorted data that has not yet been selected as the data to be written; check whether the data most recently written to the corresponding list is the same as the data to be written; if not, write the data to be written into the list; if so, discard the data to be written; determine whether any unselected data remains in each array; if so, return to the step of selecting, in order, one item of the unselected sorted data in each array as the data to be written; and if not, take the data in the lists as the data after adjacent duplicate removal.
4. A data processing apparatus, applied to a client, the apparatus comprising:
the query request acquisition module is used for acquiring the SQL query request;
a sending module, configured to send the SQL query request to a server, so that the server receives the SQL query request sent by a client, obtains data requested to be processed by the SQL query request from the ClickHouse, uses the obtained data as data to be processed, calls an adjacent duplicate removal function source code in a source code of the ClickHouse, and executes an adjacent duplicate removal process, where the adjacent duplicate removal process is: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data, and returning the adjacent duplicate removed data to the client;
the data receiving module is used for receiving the adjacent deduplicated data returned by the server;
wherein performing the adjacent duplicate removal calculation on the data to be processed to obtain the data after adjacent duplicate removal comprises: grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays; sorting the data in each array by data generation time to obtain sorted data; constructing one list for each array; for each array, selecting, in order, one item of the sorted data that has not yet been selected as the data to be written; checking whether the data most recently written to the corresponding list is the same as the data to be written; if not, writing the data to be written into the list; if so, discarding the data to be written; determining whether any unselected data remains in each array; if so, returning to the step of selecting, in order, one item of the unselected sorted data in each array as the data to be written; and if not, taking the data in the lists as the data after adjacent duplicate removal.
5. A server, comprising:
a memory for storing a program;
a processor for executing the program, wherein the processor implements the data processing method of claim 1 when executing the program.
6. A client, comprising:
a memory for storing a program;
a processor for executing the program, wherein the processor implements the data processing method of claim 2 when executing the program.
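The adjacent duplicate removal steps recited in claims 1 to 4 can be illustrated with a minimal Python sketch. It is not the ClickHouse source code the claims refer to: the function name `adjacent_dedup`, its key-function parameters, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, List

Row = Dict[str, Any]  # hypothetical row shape, chosen for this sketch only


def adjacent_dedup(
    rows: Iterable[Row],
    group_key: Callable[[Row], Hashable],
    time_key: Callable[[Row], Any],
    value_key: Callable[[Row], Any],
) -> Dict[Hashable, List[Any]]:
    """Group rows, sort each group by generation time, drop adjacent repeats."""
    # Group the data to be processed into one array per grouping condition.
    arrays: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        arrays[group_key(row)].append(row)

    lists: Dict[Hashable, List[Any]] = {}
    for group, array in arrays.items():
        # Sort the data in the array by data generation time.
        array.sort(key=time_key)
        # Construct one list for this array.
        written: List[Any] = []
        # Select the sorted items in order as the data to be written.
        for row in array:
            value = value_key(row)
            # Write only if the most recently written item differs;
            # otherwise the item is discarded.
            if not written or written[-1] != value:
                written.append(value)
        lists[group] = written
    return lists


# Hypothetical sample data: page views per user.
rows = [
    {"user": "a", "ts": 3, "page": "home"},
    {"user": "a", "ts": 1, "page": "home"},
    {"user": "b", "ts": 1, "page": "cart"},
    {"user": "a", "ts": 4, "page": "cart"},
    {"user": "a", "ts": 5, "page": "home"},
    {"user": "b", "ts": 2, "page": "cart"},
]
result = adjacent_dedup(
    rows,
    group_key=lambda r: r["user"],
    time_key=lambda r: r["ts"],
    value_key=lambda r: r["page"],
)
print(result)
```

In this sketch only neighbouring repeats are dropped, so a value that reappears after a different value is written again. Each array is compared against the tail of its own list, which means one pass per array suffices after sorting.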
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911358949.8A CN111143340B (en) | 2019-12-25 | 2019-12-25 | Data processing method and device, server and client |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911358949.8A CN111143340B (en) | 2019-12-25 | 2019-12-25 | Data processing method and device, server and client |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111143340A (en) | 2020-05-12 |
CN111143340B (en) | 2023-03-21 |
Family
ID=70520086
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911358949.8A Active CN111143340B (en) | 2019-12-25 | 2019-12-25 | Data processing method and device, server and client |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111143340B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107239230A (en) * | 2016-03-29 | 2017-10-10 | 三星电子株式会社 | The many hash tables of hop-scotch of the optimization of duplicate removal application are embedded for efficient memory |
CN110321346A (en) * | 2019-05-28 | 2019-10-11 | 中国科学院计算技术研究所 | A kind of character string hash table method and system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150213047A1 (en) * | 2014-01-24 | 2015-07-30 | Netapp Inc. | Coalescing sequences for host side deduplication |
US9626115B2 (en) * | 2015-01-14 | 2017-04-18 | International Business Machines Corporation | Threshold based incremental flashcopy backup of a raid protected array |
Non-Patent Citations (1)
Title |
---|
"Building complex data models in ClickHouse"; 与AI零距离; https://www.jianshu.com/p/7b2f17ef4ab7; 20190919; 1-7 * |
Also Published As
Publication number | Publication date |
---|---|
CN111143340A (en) | 2020-05-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108304444B (en) | Information query method and device | |
US11580168B2 (en) | Method and system for providing context based query suggestions | |
KR100971863B1 (en) | System and method for batched indexing of network documents | |
CN107103032B (en) | Mass data paging query method for avoiding global sequencing in distributed environment | |
EP3602351A1 (en) | Apparatus and method for distributed query processing utilizing dynamically generated in-memory term maps | |
CN102193917A (en) | Method and device for processing and querying data | |
WO2010129063A1 (en) | Method and system for search engine indexing and searching using the index | |
CN109766318B (en) | File reading method and device | |
CN104636502A (en) | Accelerated data query method of query system | |
US20140289268A1 (en) | Systems and methods of rationing data assembly resources | |
CN111858760A (en) | Data processing method and device for heterogeneous database | |
CN111046041A (en) | Data processing method and device, storage medium and processor | |
US20170270149A1 (en) | Database systems with re-ordered replicas and methods of accessing and backing up databases | |
CN108038253B (en) | Log query processing method and device | |
CN112527824B (en) | Paging query method, paging query device, electronic equipment and computer-readable storage medium | |
CN108897858A (en) | The appraisal procedure and device, electronic equipment of distributed type assemblies index fragment | |
CN111143340B (en) | Data processing method and device, server and client | |
CN106446080B (en) | Data query method, query service equipment, client equipment and data system | |
CN109213972B (en) | Method, device, equipment and computer storage medium for determining document similarity | |
CN113032436B (en) | Searching method and device based on article content and title | |
CN112464049B (en) | Method, device and equipment for downloading number detail list | |
CN108804502A (en) | Big data inquiry system, method, computer equipment and storage medium | |
CN114139040A (en) | Data storage and query method, device, equipment and readable storage medium | |
CN111639099A (en) | Full-text indexing method and system | |
CN111858609A (en) | Fuzzy query method and device for block chain |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||