CN111143340B - Data processing method and device, server and client - Google Patents

Publication number
CN111143340B
Authority
CN
China
Prior art keywords
data
adjacent
written
duplicate removal
processed
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911358949.8A
Other languages
Chinese (zh)
Other versions
CN111143340A (en)
Inventor
黄金
王俊博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Knet Eqxiu Technology Co ltd
Original Assignee
Beijing Knet Eqxiu Technology Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing

Abstract

The application provides a data processing method, a data processing device, a server and a client. The method, as applied to the server, comprises: receiving an SQL query request sent by a client; acquiring from ClickHouse the data that the SQL query request asks to be processed, and taking the acquired data as data to be processed; calling the adjacent deduplication function source code in the ClickHouse source code to execute an adjacent deduplication process, in which adjacent deduplication calculation is performed on the data to be processed to obtain adjacent-deduplicated data; and returning the adjacent-deduplicated data to the client. In this way, the efficiency of adjacent deduplication can be improved.

Description

Data processing method and device, server and client
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method and apparatus, a server, and a client.
Background
In some services there is a need for adjacent deduplication. For example, when retrieving all pages accessed by a user in chronological order, repeated visits to the same page at adjacent times need to be deduplicated.
However, how to perform adjacent deduplication efficiently remains a problem.
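To make the background example concrete, the following is a minimal Python sketch of what adjacent deduplication means for one user's page visits. The visit data is hypothetical, and `itertools.groupby` is used only as a convenient illustration; it is not the technique claimed by the application.

```python
from itertools import groupby

# Hypothetical page visits for one user, already in time order.
visits = ["home", "home", "product", "home", "home", "cart", "cart"]

# groupby() bundles runs of consecutive equal items, so keeping one key
# per run removes adjacent repeats while preserving non-adjacent ones.
adjacent_deduplicated = [page for page, _run in groupby(visits)]

print(adjacent_deduplicated)  # ['home', 'product', 'home', 'cart']
```

Note that "home" appears twice in the result: only repeats at adjacent times are collapsed, which is what distinguishes adjacent deduplication from ordinary global deduplication.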
Disclosure of Invention
In order to solve the foregoing technical problem, embodiments of the present application provide a data processing method and device, a server, and a client, so as to improve the efficiency of adjacent deduplication. The technical solution is as follows:
A data processing method is applied to a server, wherein the server adds adjacent deduplication function source code to the ClickHouse source code, and the method comprises:
receiving an SQL query request sent by a client;
acquiring from ClickHouse the data that the SQL query request asks to be processed, taking the acquired data as data to be processed, calling the adjacent deduplication function source code in the ClickHouse source code, and executing an adjacent deduplication process, wherein the adjacent deduplication process is: performing adjacent deduplication calculation on the data to be processed to obtain adjacent-deduplicated data;
and returning the adjacent-deduplicated data to the client.
Preferably, performing adjacent deduplication calculation on the data to be processed to obtain adjacent-deduplicated data includes:
grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays;
sorting the data in each array by data generation time to obtain sorted data;
constructing one list corresponding to each array;
for each array, selecting in order one item of the unselected data in the array as data to be written;
checking, for each list, whether the most recently written data in the list is the same as the data to be written;
if not, writing the data to be written into the list;
if so, discarding the data to be written;
determining, for each array, whether unselected data remains;
if so, returning to the step of selecting in order one item of the unselected data in the array as data to be written;
and if not, taking the data in the lists as the adjacent-deduplicated data.
Preferably, performing adjacent deduplication calculation on the data to be processed to obtain adjacent-deduplicated data includes:
sorting the data to be processed by data generation time to obtain sorted data to be processed;
constructing a list;
selecting in order one item of the unselected data in the sorted data to be processed as data to be written;
checking whether the most recently written data in the list is the same as the data to be written;
if not, writing the data to be written into the list;
if so, discarding the data to be written;
determining whether unselected data remains in the sorted data to be processed;
if so, returning to the step of selecting in order one item of the unselected data in the sorted data to be processed as data to be written;
and if not, taking the data in the list as the adjacent-deduplicated data.
A data processing method is applied to a client, and comprises:
acquiring an SQL query request;
sending the SQL query request to a server, so that the server receives the SQL query request sent by the client, acquires from ClickHouse the data that the SQL query request asks to be processed, takes the acquired data as data to be processed, calls the adjacent deduplication function source code in the ClickHouse source code, executes an adjacent deduplication process, and returns the adjacent-deduplicated data to the client, wherein the adjacent deduplication process is: performing adjacent deduplication calculation on the data to be processed to obtain the adjacent-deduplicated data;
and receiving the adjacent-deduplicated data returned by the server.
A data processing device is applied to a server, the server adds adjacent deduplication function source codes in the source codes of ClickHouse, and the device comprises:
the query request receiving module is used for receiving an SQL query request sent by a client;
the adjacent duplicate removal calculation module is configured to obtain data requested to be processed by the SQL query request from the ClickHouse, use the obtained data as data to be processed, call an adjacent duplicate removal function source code in the ClickHouse source code, and execute an adjacent duplicate removal process, where the adjacent duplicate removal process is: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data;
and the return module is used for returning the data after the adjacent duplication elimination to the client.
Preferably, the adjacent duplicate removal calculation module is specifically configured to perform:
grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays;
sorting the data in each array by data generation time to obtain sorted data;
constructing one list corresponding to each array;
for each array, selecting in order one item of the unselected data in the array as data to be written;
checking, for each list, whether the most recently written data in the list is the same as the data to be written;
if not, writing the data to be written into the list;
if so, discarding the data to be written;
determining, for each array, whether unselected data remains;
if so, returning to the step of selecting in order one item of the unselected data in the array as data to be written;
and if not, taking the data in the lists as the adjacent-deduplicated data.
Preferably, the adjacent duplicate removal calculation module is specifically configured to perform:
sorting the data to be processed by data generation time to obtain sorted data to be processed;
constructing a list;
selecting in order one item of the unselected data in the sorted data to be processed as data to be written;
checking whether the most recently written data in the list is the same as the data to be written;
if not, writing the data to be written into the list;
if so, discarding the data to be written;
determining whether unselected data remains in the sorted data to be processed;
if so, returning to the step of selecting in order one item of the unselected data in the sorted data to be processed as data to be written;
and if not, taking the data in the list as the adjacent-deduplicated data.
A data processing device is applied to a client, and comprises:
the query request acquisition module is used for acquiring the SQL query request;
a sending module, configured to send the SQL query request to a server, so that the server receives the SQL query request sent by a client, obtains data requested to be processed by the SQL query request from the ClickHouse, uses the obtained data as data to be processed, calls an adjacent duplicate removal function source code in a source code of the ClickHouse, and executes an adjacent duplicate removal process, where the adjacent duplicate removal process is: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data, and returning the adjacent duplicate removed data to the client;
and the data receiving module is used for receiving the adjacent deduplicated data returned by the server.
A server, comprising:
a memory for storing a program;
the processor is configured to run the program, and when the processor runs the program, the processor implements the data processing method according to any one of the above.
A client, comprising:
a memory for storing a program;
the processor is configured to run the program, and when the processor runs the program, the processor implements the data processing method as described above.
Compared with the prior art, the beneficial effect of this application is:
In the present application, the server adds adjacent deduplication function source code to the ClickHouse source code, so that ClickHouse has an adjacent deduplication function. On this basis, the server receives an SQL query request sent by a client, acquires from ClickHouse the data that the SQL query request asks to be processed, takes the acquired data as data to be processed, calls the adjacent deduplication function source code in the ClickHouse source code, and executes the adjacent deduplication process. Adjacent deduplication is thus performed inside ClickHouse, which avoids transmitting the data between different devices, shortens the overall time of adjacent deduplication, and improves its efficiency; the high running speed of the server further ensures the efficiency of adjacent deduplication.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.
FIG. 1 is a flowchart of embodiment 1 of a data processing method provided in the present application;
FIG. 2 is a flowchart of an adjacent deduplication process provided in the present application;
FIG. 3 is a flowchart of another adjacent deduplication process provided in the present application;
FIG. 4 is a flowchart of embodiment 2 of a data processing method provided in the present application;
FIG. 5 is a schematic diagram of the logical structure of a data processing device provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application discloses a data processing method, which is applied to a server, wherein the server adds adjacent duplication removal function source codes in a ClickHouse source code, and the method comprises the following steps: receiving an SQL query request sent by a client; acquiring the data requested to be processed by the SQL query request from the ClickHouse, taking the acquired data as the data to be processed, calling an adjacent duplicate removal function source code in the ClickHouse source code, and executing an adjacent duplicate removal process, wherein the adjacent duplicate removal process is as follows: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data; and returning the data after the adjacent deduplication to the client. In the present application, the efficiency of adjacent deduplication can be improved.
Next, a data processing method disclosed in an embodiment of the present application is introduced. The method is applied to a server, and the server adds adjacent deduplication function source code to the ClickHouse source code. As shown in FIG. 1, which is a flowchart of embodiment 1 of the data processing method provided by the present application, the method may include the following steps:
Step S11, receiving an SQL query request sent by the client.
The SQL query request sent by the client may include an identifier (e.g., name or storage location) of the data requested to be processed and information indicating that the neighbor deduplication calculation is performed.
Step S12, acquiring from ClickHouse the data that the SQL query request asks to be processed, taking the acquired data as the data to be processed, calling the adjacent deduplication function source code in the ClickHouse source code, and executing the adjacent deduplication process, wherein the adjacent deduplication process is: performing adjacent deduplication calculation on the data to be processed to obtain adjacent-deduplicated data.
ClickHouse can be understood as an OLAP-oriented distributed columnar database open-sourced by Yandex, a company known as the "Russian Google"; it is capable of generating real-time data reports using SQL queries.
Specifically, the server may obtain the data requested to be processed by the SQL query request from the ClickHouse according to the identifier of the data requested to be processed in the SQL query request, use the obtained data as the data to be processed, and call the adjacent duplicate removal function source code in the ClickHouse source code according to the information indicating to perform the adjacent duplicate removal calculation in the SQL query request, so as to execute the adjacent duplicate removal process.
The adjacent deduplication process may include: performing adjacent deduplication calculation on the data to be processed to obtain adjacent-deduplicated data.
Step S13, returning the adjacent-deduplicated data to the client.
In the present application, the server adds adjacent deduplication function source code to the ClickHouse source code, so that ClickHouse has an adjacent deduplication function. On this basis, the server receives an SQL query request sent by a client, acquires from ClickHouse the data that the SQL query request asks to be processed, takes the acquired data as data to be processed, calls the adjacent deduplication function source code in the ClickHouse source code, and executes the adjacent deduplication process. Adjacent deduplication is thus performed inside ClickHouse, which avoids transmitting the data between different devices, shortens the overall time of adjacent deduplication, and improves its efficiency; the high running speed of the server further ensures the efficiency of adjacent deduplication.
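The flow of steps S11 to S13 can be sketched in Python as follows. This is an illustration under stated assumptions, not the application's implementation: the in-memory `FAKE_STORE` stands in for ClickHouse, the request is modelled as a dictionary whose `table` and `adjacent_dedup` keys are hypothetical stand-ins for the data identifier and the deduplication indicator, and in the application the deduplication runs inside ClickHouse itself rather than in a separate handler.

```python
from itertools import groupby

# Hypothetical stand-in for data held in ClickHouse:
# table name -> rows of (generation time, value).
FAKE_STORE = {
    "page_views": [(2, "home"), (1, "home"), (3, "cart"), (4, "home")],
}


def handle_query(request: dict) -> list:
    """Sketch of S11-S13 for one request (the argument is the received request)."""
    # S12: look up the requested data by its identifier.
    rows = FAKE_STORE[request["table"]]
    values = [value for _ts, value in sorted(rows, key=lambda row: row[0])]

    # S12: run adjacent deduplication only if the request asks for it.
    if request.get("adjacent_dedup"):
        values = [value for value, _run in groupby(values)]

    # S13: the returned list is what would be sent back to the client.
    return values


print(handle_query({"table": "page_views", "adjacent_dedup": True}))
# ['home', 'cart', 'home']
```

Because the lookup and the deduplication happen in the same process, no intermediate result has to cross a device boundary, which is the efficiency argument the paragraph above makes.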
In another embodiment of the present application, the above adjacent deduplication process is described. As shown in FIG. 2, it may include the following steps:
Step S21, grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays.
Grouping the data to be processed into a plurality of different arrays can improve the efficiency of subsequent processing.
Step S22, sorting the data in each array by data generation time to obtain sorted data.
Step S23, constructing one list corresponding to each array.
For example, if the arrays are A, B and C, list a is constructed for array A, list b for array B, and list c for array C.
Each list is used to store data.
Step S24, for each array, selecting in order one item of the unselected data in the array as the data to be written.
In this embodiment, the unselected data in each array can be understood as the unselected data among the sorted data in that array.
Step S25, checking, for each list, whether the most recently written data in the list is the same as the data to be written.
If not, step S26 is executed; if so, step S27 is executed.
Step S26, writing the data to be written into the list.
Step S27, discarding the data to be written.
Step S28, determining, for each array, whether unselected data remains.
If so, the process returns to step S24; if not, step S29 is executed.
Step S29, taking the data in the lists as the adjacent-deduplicated data.
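Steps S21 to S29 can be sketched in Python as follows. The application does not publish the function's source code, and the real function would live inside the ClickHouse code base, so this is only an illustration: the record layout `(group key, generation time, value)` and the function name `adjacent_dedup_by_group` are assumptions. The sketch also processes each array to completion before moving to the next, which yields the same lists as stepping through all arrays in lockstep.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, List, Tuple

# Assumed record layout: (group key, generation time, value).
Record = Tuple[Hashable, float, Any]


def adjacent_dedup_by_group(records: Iterable[Record]) -> Dict[Hashable, List[Any]]:
    # S21: group the data to be processed into one array per grouping key.
    arrays = defaultdict(list)
    for key, generated_at, value in records:
        arrays[key].append((generated_at, value))

    lists: Dict[Hashable, List[Any]] = {}
    for key, array in arrays.items():
        # S22: sort the array by data generation time.
        array.sort(key=lambda item: item[0])
        # S23: construct the list that corresponds to this array.
        written: List[Any] = []
        # S24 / S28: take the unselected items one by one, in order.
        for _generated_at, value in array:
            # S25: compare with the most recently written item.
            if not written or written[-1] != value:
                written.append(value)  # S26: write
            # S27: otherwise the item is discarded.
        lists[key] = written
    # S29: the lists hold the adjacent-deduplicated data.
    return lists


records = [
    ("u1", 3, "home"), ("u1", 1, "home"), ("u2", 1, "cart"),
    ("u1", 2, "home"), ("u1", 4, "product"), ("u2", 2, "cart"),
    ("u1", 5, "home"),
]
print(adjacent_dedup_by_group(records))
# {'u1': ['home', 'product', 'home'], 'u2': ['cart']}
```

Comparing only against the last written item keeps each comparison constant-time, so after sorting the pass over each array is linear in its length.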
In another embodiment of the present application, another adjacent deduplication process is presented. As shown in FIG. 3, it may include the following steps:
Step S31, sorting the data to be processed by data generation time to obtain sorted data to be processed.
Step S32, constructing a list.
The list is used to store data.
Step S33, selecting in order one item of the unselected data in the sorted data to be processed as the data to be written.
Step S34, checking whether the most recently written data in the list is the same as the data to be written.
If not, step S35 is executed; if so, step S36 is executed.
Step S35, writing the data to be written into the list.
Step S36, discarding the data to be written.
Step S37, determining whether unselected data remains in the sorted data to be processed.
If so, the process returns to step S33; if not, step S38 is executed.
Step S38, taking the data in the list as the adjacent-deduplicated data.
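The ungrouped variant in steps S31 to S38 can be sketched in Python with an explicit loop that follows the flowchart step by step. As before, this is a hedged illustration rather than the application's source code: the row layout `(generation time, value)` and the function name `adjacent_dedup` are assumptions.

```python
from typing import Any, Iterable, List, Tuple

# Assumed row layout: (generation time, value).
Row = Tuple[float, Any]


def adjacent_dedup(rows: Iterable[Row]) -> List[Any]:
    # S31: sort the data to be processed by generation time.
    ordered = sorted(rows, key=lambda row: row[0])
    # S32: construct the list that will hold the result.
    written: List[Any] = []

    index = 0
    # S37: keep going while unselected data remains.
    while index < len(ordered):
        # S33: select the next unselected item as the data to be written.
        _generated_at, value = ordered[index]
        index += 1
        # S34: compare with the most recently written item.
        if written and written[-1] == value:
            continue  # S36: discard
        written.append(value)  # S35: write
    # S38: the list holds the adjacent-deduplicated data.
    return written


rows = [(2, "a"), (1, "a"), (3, "b"), (5, "a"), (4, "b")]
print(adjacent_dedup(rows))  # ['a', 'b', 'a']
```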
In another embodiment of the present application, another data processing method, applied to a client, is introduced. As shown in FIG. 4, which is a flowchart of embodiment 2 of the data processing method provided in the present application, the method may include the following steps:
Step S41, acquiring an SQL query request.
The client may acquire the SQL query request by receiving an SQL query request input by a user.
Step S42, sending the SQL query request to a server, so that the server receives the SQL query request sent by the client, acquires from ClickHouse the data that the SQL query request asks to be processed, takes the acquired data as the data to be processed, calls the adjacent deduplication function source code in the ClickHouse source code, executes the adjacent deduplication process, and returns the adjacent-deduplicated data to the client, wherein the adjacent deduplication process is: performing adjacent deduplication calculation on the data to be processed to obtain the adjacent-deduplicated data.
Step S43, receiving the adjacent-deduplicated data returned by the server.
After receiving the adjacent-deduplicated data returned by the server, the client may actively push the data to the user, or make it available for the user to retrieve.
Next, a data processing device provided in the present application is described. The data processing device described below and the data processing method described above may be referred to in correspondence with each other.
Referring to FIG. 5, the data processing device is applied to a server, where the server adds adjacent deduplication function source code to the ClickHouse source code, and the data processing device includes: a query request receiving module 11, an adjacent duplicate removal calculation module 12 and a return module 13.
The query request receiving module 11 is configured to receive an SQL query request sent by a client;
the adjacent duplicate removal calculation module 12 is configured to acquire from ClickHouse the data that the SQL query request asks to be processed, take the acquired data as data to be processed, call the adjacent deduplication function source code in the ClickHouse source code, and execute an adjacent deduplication process, where the adjacent deduplication process is: performing adjacent deduplication calculation on the data to be processed to obtain adjacent-deduplicated data;
and the return module 13 is configured to return the adjacent-deduplicated data to the client.
In this embodiment, the adjacent duplicate removal calculation module 12 may be specifically configured to perform:
grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays;
sorting the data in each array by data generation time to obtain sorted data;
constructing one list corresponding to each array;
for each array, selecting in order one item of the unselected data in the array as data to be written;
checking, for each list, whether the most recently written data in the list is the same as the data to be written;
if not, writing the data to be written into the list;
if so, discarding the data to be written;
determining, for each array, whether unselected data remains;
if so, returning to the step of selecting in order one item of the unselected data in the array as data to be written;
and if not, taking the data in the lists as the adjacent-deduplicated data.
In this embodiment, the adjacent duplicate removal calculation module 12 may alternatively be specifically configured to perform:
sorting the data to be processed by data generation time to obtain sorted data to be processed;
constructing a list;
selecting in order one item of the unselected data in the sorted data to be processed as data to be written;
checking whether the most recently written data in the list is the same as the data to be written;
if not, writing the data to be written into the list;
if so, discarding the data to be written;
determining whether unselected data remains in the sorted data to be processed;
if so, returning to the step of selecting in order one item of the unselected data in the sorted data to be processed as data to be written;
and if not, taking the data in the list as the adjacent-deduplicated data.
In another embodiment of the present application, there is provided a data processing apparatus applied to a client, the apparatus including:
the query request acquisition module is used for acquiring the SQL query request;
a sending module, configured to send the SQL query request to a server, so that the server receives the SQL query request sent by a client, obtains data requested to be processed by the SQL query request from the ClickHouse, uses the obtained data as data to be processed, calls an adjacent duplicate removal function source code in a source code of the ClickHouse, and executes an adjacent duplicate removal process, where the adjacent duplicate removal process is: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data, and returning the adjacent duplicate removed data to the client;
and the data receiving module is used for receiving the adjacent deduplicated data returned by the server.
In another embodiment of the present application, there is provided a server including:
a memory for storing a program;
the processor is configured to run the program, and when the processor runs the program, the processor implements the data processing method as described in embodiment 1.
In another embodiment of the present application, there is provided a client comprising:
a memory for storing a program;
the processor is configured to run the program, and when the processor runs the program, the processor implements the data processing method as described in embodiment 2.
It should be noted that the description of each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to each other. Since the device embodiments are basically similar to the method embodiments, they are described briefly, and the relevant points may be found in the corresponding description of the method embodiments.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The data processing method, the data processing device, the server and the client provided by the present application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present application, and the description of the above embodiments is intended only to help in understanding the method and core idea of the present application. Meanwhile, a person skilled in the art may, according to the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (6)

1. A data processing method is applied to a server, the server adds adjacent deduplication function source codes in a ClickHouse source code, and the method comprises the following steps:
receiving an SQL query request sent by a client;
acquiring the data requested to be processed by the SQL query request from the ClickHouse, taking the acquired data as the data to be processed, calling an adjacent duplicate removal function source code in the ClickHouse source code, and executing an adjacent duplicate removal process, wherein the adjacent duplicate removal process is as follows: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data;
returning the data after the adjacent duplicate removal to the client;
the performing adjacent duplicate removal calculation on the data to be processed to obtain the data after adjacent duplicate removal includes: grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays; sorting the data in each array according to the data generation time to obtain sorted data; respectively corresponding to each array, and constructing a list; selecting one of the unselected data sequenced in the arrays in sequence respectively as data to be written; respectively checking whether the latest written data in each list is the same as the data to be written; if not, writing the data to be written into the list; if the data are the same, discarding the data to be written; respectively judging whether unselected data exist in each array; if yes, returning to the step of executing the step of selecting one of the unselected data in each array in sequence as the data to be written; and if not, taking the data in the lists as the data after adjacent deduplication.
2. A data processing method applied to a client, the method comprising:
acquiring an SQL query request;
sending the SQL query request to a server, so that the server receives the SQL query request sent by the client, acquires the data requested to be processed by the SQL query request from ClickHouse, takes the acquired data as data to be processed, calls the adjacent duplicate removal function source code in the ClickHouse source code, and executes an adjacent duplicate removal process, wherein the adjacent duplicate removal process is as follows: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent deduplicated data, and returning the adjacent deduplicated data to the client;
receiving the adjacent deduplicated data returned by the server;
wherein performing adjacent duplicate removal calculation on the data to be processed to obtain the adjacent deduplicated data comprises: grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays; sorting the data in each array by data generation time to obtain sorted data; constructing a list corresponding to each array; for each array, selecting, in sorted order, one item of unselected data as the data to be written; checking whether the data most recently written to the corresponding list is the same as the data to be written; if not, writing the data to be written into the list; if so, discarding the data to be written; determining whether any unselected data remains in the array; if so, returning to the step of selecting, in sorted order, one item of unselected data as the data to be written; and if not, taking the data in the lists as the adjacent deduplicated data.
3. A data processing apparatus applied to a server that adds adjacent deduplication function source code to the ClickHouse source code, the apparatus comprising:
the query request receiving module is used for receiving an SQL query request sent by a client;
the adjacent duplicate removal calculation module is used for acquiring the data requested to be processed by the SQL query request from the ClickHouse, using the acquired data as the data to be processed, calling an adjacent duplicate removal function source code in the source code of the ClickHouse, and executing an adjacent duplicate removal process, wherein the adjacent duplicate removal process is as follows: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data;
a returning module, configured to return the data after the adjacent deduplication to the client;
wherein the adjacent duplicate removal calculation module is specifically configured to: group the data to be processed according to different grouping conditions to obtain a plurality of different arrays; sort the data in each array by data generation time to obtain sorted data; construct a list corresponding to each array; for each array, select, in sorted order, one item of unselected data as the data to be written; check whether the data most recently written to the corresponding list is the same as the data to be written; if not, write the data to be written into the list; if so, discard the data to be written; determine whether any unselected data remains in the array; if so, return to the step of selecting, in sorted order, one item of unselected data as the data to be written; and if not, take the data in the lists as the adjacent deduplicated data.
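The three modules of claim 3 can be pictured as three methods on one server object, as in the sketch below. Everything here is a hypothetical stand-in: `QueryRequest` replaces a real SQL query request, an in-memory dict of tables replaces ClickHouse, and the class and method names are invented for illustration. The calculation uses `itertools.groupby` to collapse runs of equal neighbours, which is a shortcut that gives the same result as the step-by-step list writing in the claim rather than a literal transcription of it.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Any, Dict, Hashable, List

Row = Dict[str, Any]


@dataclass
class QueryRequest:
    # Hypothetical stand-in for an SQL query request.
    table: str
    group_key: str
    time_key: str
    value_key: str


class AdjacentDedupServer:
    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        # In-memory tables stand in for data stored in ClickHouse.
        self.tables = tables

    def receive(self, request: QueryRequest) -> QueryRequest:
        # Query request receiving module: accept and validate the request.
        if request.table not in self.tables:
            raise KeyError(f"unknown table: {request.table}")
        return request

    def compute(self, request: QueryRequest) -> Dict[Hashable, List[Any]]:
        # Adjacent duplicate removal calculation module.
        rows = self.tables[request.table]
        grouped: Dict[Hashable, List[Any]] = {}
        # Sorting once by time keeps every group's values in time order.
        for row in sorted(rows, key=lambda r: r[request.time_key]):
            grouped.setdefault(row[request.group_key], []).append(
                row[request.value_key]
            )
        # groupby yields one key per run of equal neighbouring values.
        return {k: [v for v, _ in groupby(vals)] for k, vals in grouped.items()}

    def handle(self, request: QueryRequest) -> Dict[Hashable, List[Any]]:
        # Returning module: hand the deduplicated data back to the caller.
        return self.compute(self.receive(request))
```

A client in this sketch would simply build a `QueryRequest` and call `handle`, which mirrors the send and receive steps of claims 2 and 4.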
4. A data processing apparatus, applied to a client, the apparatus comprising:
the query request acquisition module is used for acquiring the SQL query request;
a sending module, configured to send the SQL query request to a server, so that the server receives the SQL query request sent by a client, obtains data requested to be processed by the SQL query request from the ClickHouse, uses the obtained data as data to be processed, calls an adjacent duplicate removal function source code in a source code of the ClickHouse, and executes an adjacent duplicate removal process, where the adjacent duplicate removal process is: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data, and returning the adjacent duplicate removed data to the client;
the data receiving module is used for receiving the adjacent deduplicated data returned by the server;
wherein performing adjacent duplicate removal calculation on the data to be processed to obtain the adjacent deduplicated data comprises: grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays; sorting the data in each array by data generation time to obtain sorted data; constructing a list corresponding to each array; for each array, selecting, in sorted order, one item of unselected data as the data to be written; checking whether the data most recently written to the corresponding list is the same as the data to be written; if not, writing the data to be written into the list; if so, discarding the data to be written; determining whether any unselected data remains in the array; if so, returning to the step of selecting, in sorted order, one item of unselected data as the data to be written; and if not, taking the data in the lists as the adjacent deduplicated data.
5. A server, comprising:
a memory for storing a program;
a processor for executing the program, wherein the processor implements the data processing method of claim 1 when executing the program.
6. A client, comprising:
a memory for storing a program;
a processor for executing the program, wherein the processor implements the data processing method of claim 2 when executing the program.
CN201911358949.8A 2019-12-25 2019-12-25 Data processing method and device, server and client Active CN111143340B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911358949.8A CN111143340B (en) 2019-12-25 2019-12-25 Data processing method and device, server and client

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911358949.8A CN111143340B (en) 2019-12-25 2019-12-25 Data processing method and device, server and client

Publications (2)

Publication Number Publication Date
CN111143340A CN111143340A (en) 2020-05-12
CN111143340B true CN111143340B (en) 2023-03-21

Family

ID=70520086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911358949.8A Active CN111143340B (en) 2019-12-25 2019-12-25 Data processing method and device, server and client

Country Status (1)

Country Link
CN (1) CN111143340B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239230A (en) * 2016-03-29 2017-10-10 三星电子株式会社 The many hash tables of hop-scotch of the optimization of duplicate removal application are embedded for efficient memory
CN110321346A (en) * 2019-05-28 2019-10-11 中国科学院计算技术研究所 A kind of character string hash table method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150213047A1 (en) * 2014-01-24 2015-07-30 Netapp Inc. Coalescing sequences for host side deduplication
US9626115B2 (en) * 2015-01-14 2017-04-18 International Business Machines Corporation Threshold based incremental flashcopy backup of a raid protected array

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239230A (en) * 2016-03-29 2017-10-10 三星电子株式会社 The many hash tables of hop-scotch of the optimization of duplicate removal application are embedded for efficient memory
CN110321346A (en) * 2019-05-28 2019-10-11 中国科学院计算技术研究所 A kind of character string hash table method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"构建clickhouse复杂数据模型" [Building a complex data model in ClickHouse];与AI零距离;《https://www.jianshu.com/p/7b2f17ef4ab7》;20190919;1-7 *

Also Published As

Publication number Publication date
CN111143340A (en) 2020-05-12

Similar Documents

Publication Publication Date Title
CN108304444B (en) Information query method and device
US11580168B2 (en) Method and system for providing context based query suggestions
KR100971863B1 (en) System and method for batched indexing of network documents
CN107103032B (en) Mass data paging query method for avoiding global sequencing in distributed environment
EP3602351A1 (en) Apparatus and method for distributed query processing utilizing dynamically generated in-memory term maps
CN102193917A (en) Method and device for processing and querying data
WO2010129063A1 (en) Method and system for search engine indexing and searching using the index
CN109766318B (en) File reading method and device
CN104636502A (en) Accelerated data query method of query system
US20140289268A1 (en) Systems and methods of rationing data assembly resources
CN111858760A (en) Data processing method and device for heterogeneous database
CN111046041A (en) Data processing method and device, storage medium and processor
US20170270149A1 (en) Database systems with re-ordered replicas and methods of accessing and backing up databases
CN108038253B (en) Log query processing method and device
CN112527824B (en) Paging query method, paging query device, electronic equipment and computer-readable storage medium
CN108897858A (en) The appraisal procedure and device, electronic equipment of distributed type assemblies index fragment
CN111143340B (en) Data processing method and device, server and client
CN106446080B (en) Data query method, query service equipment, client equipment and data system
CN109213972B (en) Method, device, equipment and computer storage medium for determining document similarity
CN113032436B (en) Searching method and device based on article content and title
CN112464049B (en) Method, device and equipment for downloading number detail list
CN108804502A (en) Big data inquiry system, method, computer equipment and storage medium
CN114139040A (en) Data storage and query method, device, equipment and readable storage medium
CN111639099A (en) Full-text indexing method and system
CN111858609A (en) Fuzzy query method and device for block chain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant