CN111143340B

CN111143340B - Data processing method and device, server and client

Info

Publication number: CN111143340B
Application number: CN201911358949.8A
Authority: CN
Inventors: 黄金; 王俊博
Original assignee: Beijing Knet Eqxiu Technology Co ltd
Current assignee: Beijing Knet Eqxiu Technology Co ltd
Priority date: 2019-12-25
Filing date: 2019-12-25
Publication date: 2023-03-21
Anticipated expiration: 2039-12-25
Also published as: CN111143340A

Abstract

The application provides a data processing method, a data processing device, a server and a client, and the method applied to the server comprises the following steps: receiving an SQL query request sent by a client; acquiring data requested to be processed by the SQL query request from the ClickHouse, taking the acquired data as data to be processed, calling an adjacent duplicate removal function source code in a ClickHouse source code, and executing an adjacent duplicate removal process, wherein the adjacent duplicate removal process is as follows: carrying out adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data; and returning the data after the adjacent deduplication to the client. In the present application, the efficiency of adjacent deduplication can be improved in the above manner.

Description

Data processing method and device, server and client

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a data processing method and apparatus, a server, and a client.

Background

In some services, there is a need for adjacent deduplication, for example, in the process of searching all pages accessed by a user in time sequence, deduplication calculation needs to be performed on the same page accessed in adjacent time.

However, how to perform efficient adjacent deduplication becomes a problem.
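To make the notion concrete, the following minimal sketch (an illustration, not part of the original disclosure) contrasts adjacent deduplication with ordinary global deduplication on a time-ordered list of page visits. It uses Python's standard `itertools.groupby`, which groups runs of consecutive equal items. The page names and function names are invented for illustration.

```python
from itertools import groupby

# Hypothetical pages visited by one user, already in time order.
visits = ["home", "home", "cart", "home", "home", "pay"]


def adjacent_dedup(items):
    """Collapse each run of consecutive equal items into a single item."""
    return [key for key, _run in groupby(items)]


def global_dedup(items):
    """Keep only the first occurrence of each item, preserving order."""
    return list(dict.fromkeys(items))


adjacent = adjacent_dedup(visits)  # repeated "home" survives when not adjacent
overall = global_dedup(visits)     # every later "home" is dropped
```

Adjacent deduplication keeps the later return to the same page because it only compares each item with its immediate predecessor, which is what a time-ordered access path requires.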

Disclosure of Invention

In order to solve the foregoing technical problems, embodiments of the present application provide a data processing method, an apparatus, a server, and a client, so as to achieve the purpose of improving the efficiency of adjacent deduplication, and the technical solution is as follows:

a data processing method is applied to a server, the server adds adjacent deduplication function source codes in a ClickHouse source code, and the method comprises the following steps:

receiving an SQL query request sent by a client;

acquiring the data requested to be processed by the SQL query request from the ClickHouse, taking the acquired data as the data to be processed, calling an adjacent duplicate removal function source code in the ClickHouse source code, and executing an adjacent duplicate removal process, wherein the adjacent duplicate removal process is as follows: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data;

and returning the data after the adjacent deduplication to the client.

Preferably, the performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data includes:

grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays;

sorting the data in each array according to the data generation time to obtain sorted data;

constructing a list corresponding to each array;

selecting one of the unselected data in each array in sequence as data to be written;

respectively checking whether the latest written data in each list is the same as the data to be written;

if not, writing the data to be written into the list;

if the data to be written are the same, discarding the data to be written;

respectively judging whether unselected data exist in each array;

if yes, returning to the step of selecting one of the unselected data in each array in sequence as the data to be written;

and if not, taking the data in the lists as the data after adjacent deduplication.

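Preferably, the performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data includes:

sorting the data to be processed according to the data generation time to obtain the sorted data to be processed;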

constructing a list;

selecting one of the unselected data in the sorted data to be processed in sequence as data to be written;

checking whether the latest written data in the list is the same as the data to be written;

if not, writing the data to be written into the list;

if the data to be written are the same, discarding the data to be written;

judging whether the sorted to-be-processed data has unselected data;

if yes, returning to the step of selecting one data from the unselected data in the sorted data to be processed in sequence as data to be written;

and if not, taking the data in the list as the data after the adjacent duplication elimination.

A data processing method is applied to a client, and comprises the following steps:

acquiring an SQL query request;

sending the SQL query request to a server, so that the server receives the SQL query request sent by a client, acquiring data requested to be processed by the SQL query request from the ClickHouse, taking the acquired data as data to be processed, calling an adjacent duplicate removal function source code in a source code of the ClickHouse, and executing an adjacent duplicate removal process, wherein the adjacent duplicate removal process is as follows: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data, and returning the adjacent duplicate removed data to the client;

and receiving the adjacent de-duplicated data returned by the server.

A data processing device is applied to a server, the server adds adjacent deduplication function source codes in the source codes of ClickHouse, and the device comprises:

the query request receiving module is used for receiving an SQL query request sent by a client;

the adjacent duplicate removal calculation module is configured to obtain data requested to be processed by the SQL query request from the ClickHouse, use the obtained data as data to be processed, call an adjacent duplicate removal function source code in the ClickHouse source code, and execute an adjacent duplicate removal process, where the adjacent duplicate removal process is: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data;

and the return module is used for returning the data after the adjacent duplication elimination to the client.

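Preferably, the adjacent duplicate removal calculation module is specifically configured to:

group the data to be processed according to different grouping conditions to obtain a plurality of different arrays;

sort the data in each array according to the data generation time to obtain sorted data;

construct a list corresponding to each array;

select one of the unselected data in each array in sequence as data to be written;

check, for each list, whether the latest written data in the list is the same as the data to be written;

if not, write the data to be written into the list;

if the same, discard the data to be written;

judge whether unselected data exist in each array;

if yes, return to the step of selecting one of the unselected data in each array in sequence as the data to be written;

and if not, take the data in the lists as the data after adjacent deduplication.

Preferably, the adjacent duplicate removal calculation module is specifically configured to:

sort the data to be processed according to the data generation time to obtain the sorted data to be processed;

construct a list;

select one of the unselected data in the sorted data to be processed in sequence as data to be written;

check whether the latest written data in the list is the same as the data to be written;

if not, write the data to be written into the list;

if the same, discard the data to be written;

judge whether the sorted data to be processed has unselected data;

if yes, return to the step of selecting one of the unselected data in the sorted data to be processed in sequence as data to be written;

and if not, take the data in the list as the data after adjacent deduplication.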

A data processing device is applied to a client, and comprises:

the query request acquisition module is used for acquiring the SQL query request;

a sending module, configured to send the SQL query request to a server, so that the server receives the SQL query request sent by a client, obtains data requested to be processed by the SQL query request from the ClickHouse, uses the obtained data as data to be processed, calls an adjacent duplicate removal function source code in a source code of the ClickHouse, and executes an adjacent duplicate removal process, where the adjacent duplicate removal process is: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data, and returning the adjacent duplicate removed data to the client;

and the data receiving module is used for receiving the adjacent deduplicated data returned by the server.

A server, comprising:

a memory for storing a program;

the processor is configured to run the program, and when the processor runs the program, the processor implements the data processing method according to any one of the above.

A client, comprising:

a memory for storing a program;

the processor is configured to run the program, and when the processor runs the program, the processor implements the data processing method as described above.

Compared with the prior art, the beneficial effect of this application is:

In the present application, the server adds adjacent deduplication function source code to the ClickHouse source code, giving ClickHouse an adjacent deduplication function. On this basis, the server receives an SQL query request sent by a client, acquires the data requested to be processed by the SQL query request from ClickHouse, takes the acquired data as the data to be processed, calls the adjacent deduplication function source code in the ClickHouse source code, and executes the adjacent deduplication process. Adjacent deduplication is thus performed within ClickHouse, which avoids transmitting the data between different devices, shortens the overall time of adjacent deduplication, and improves its efficiency; the high running speed of the server further ensures the efficiency of adjacent deduplication.

Drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a flowchart of embodiment 1 of a data processing method provided in the present application;

Fig. 2 is a flowchart of an adjacent deduplication process provided in the present application;

Fig. 3 is a flowchart of another adjacent deduplication process provided in the present application;

Fig. 4 is a flowchart of embodiment 2 of a data processing method provided in the present application;

Fig. 5 is a schematic logical structure diagram of a data processing apparatus provided in the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The embodiment of the application discloses a data processing method, which is applied to a server, wherein the server adds adjacent duplication removal function source codes in a ClickHouse source code, and the method comprises the following steps: receiving an SQL query request sent by a client; acquiring the data requested to be processed by the SQL query request from the ClickHouse, taking the acquired data as the data to be processed, calling an adjacent duplicate removal function source code in the ClickHouse source code, and executing an adjacent duplicate removal process, wherein the adjacent duplicate removal process is as follows: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data; and returning the data after the adjacent deduplication to the client. In the present application, the efficiency of adjacent deduplication can be improved.

Next, the data processing method disclosed in an embodiment of the present application is introduced. The method is applied to a server, and the server adds adjacent deduplication function source code to the ClickHouse source code. As shown in Fig. 1, a flowchart of embodiment 1 of the data processing method provided by the present application, the method may include the following steps:

Step S11: receiving an SQL query request sent by the client.

The SQL query request sent by the client may include an identifier (e.g., name or storage location) of the data requested to be processed and information indicating that the adjacent deduplication calculation is to be performed.

Step S12: acquiring the data requested to be processed by the SQL query request from ClickHouse, taking the acquired data as the data to be processed, calling the adjacent deduplication function source code in the ClickHouse source code, and executing the adjacent deduplication process, wherein the adjacent deduplication process is as follows: performing adjacent deduplication calculation on the data to be processed to obtain adjacent-deduplicated data.

ClickHouse can be understood as an OLAP-oriented distributed columnar database, open-sourced by Yandex (a company often called the "Russian Google"), that is capable of generating real-time data reports using SQL queries.

Specifically, the server may acquire the data requested to be processed from ClickHouse according to the identifier of that data in the SQL query request, take the acquired data as the data to be processed, and call the adjacent deduplication function source code in the ClickHouse source code according to the information in the SQL query request indicating that the adjacent deduplication calculation is to be performed, so as to execute the adjacent deduplication process.

The adjacent deduplication process may include: performing adjacent deduplication calculation on the data to be processed to obtain adjacent-deduplicated data.

Step S13: returning the adjacent-deduplicated data to the client.
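The server-side flow of steps S11 to S13 can be sketched roughly as follows. This is only an illustration under stated assumptions: the `QueryRequest` type, the `TABLES` dictionary (an in-memory stand-in for data stored in ClickHouse), and the function names are all hypothetical and do not come from the original disclosure or from the ClickHouse code base.

```python
from dataclasses import dataclass


@dataclass
class QueryRequest:
    table: str            # identifier of the data requested to be processed
    dedup_adjacent: bool  # information indicating adjacent deduplication


# Assumption: an in-memory stand-in for data stored in ClickHouse.
TABLES = {"page_views": ["home", "home", "cart", "cart", "home"]}


def adjacent_dedup(rows):
    """Keep a row only if it differs from the most recently kept row."""
    kept = []
    for row in rows:
        if not kept or kept[-1] != row:
            kept.append(row)
    return kept


def handle_request(request):
    # S11: the SQL query request has been received from the client.
    # S12: fetch the data to be processed and deduplicate it where it lives.
    rows = TABLES[request.table]
    result = adjacent_dedup(rows) if request.dedup_adjacent else list(rows)
    # S13: return the (adjacent-deduplicated) data to the client.
    return result


response = handle_request(QueryRequest("page_views", dedup_adjacent=True))
```

Because deduplication runs next to the stored data, only the reduced result crosses the network, which is the efficiency argument the description makes.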

In another embodiment of the present application, the above-mentioned adjacent deduplication process is described. As shown in Fig. 2, it may include the following steps:

Step S21: grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays.

Grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays can improve the efficiency of subsequent processing.

Step S22: sorting the data in each array according to the data generation time to obtain sorted data.

Step S23: constructing a list corresponding to each array.

If the arrays are A, B and C, list a is constructed for array A, list b for array B, and list c for array C.

Each list is used to store data.

Step S24: selecting, for each array, one of the unselected data in sequence as the data to be written.

In this embodiment, the unselected data in each array may be understood as the unselected data among the sorted data in each array.

Step S25: checking, for each list, whether the latest written data in the list is the same as the data to be written.

If not the same, step S26 is executed; if the same, step S27 is executed.

Step S26: writing the data to be written into the list.

Step S27: discarding the data to be written.

Step S28: judging, for each array, whether unselected data exist in the array.

If yes, the process returns to step S24; if not, step S29 is executed.

Step S29: taking the data in the lists as the adjacent-deduplicated data.
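Steps S21 to S29 can be sketched as follows. This is a minimal illustration, assuming each record is a dictionary and that the grouping condition is a user identifier; the field names `user`, `time` and `page` and the function name are hypothetical, since the original text does not specify the grouping conditions or the record layout.

```python
from collections import defaultdict


def grouped_adjacent_dedup(records, group_key, time_key, value_key):
    # S21: group the data to be processed into one array per group.
    arrays = defaultdict(list)
    for record in records:
        arrays[record[group_key]].append(record)

    lists = {}
    for group, array in arrays.items():
        # S22: sort the data in the array by generation time.
        ordered = sorted(array, key=lambda r: r[time_key])
        # S23: construct the list that corresponds to this array.
        written = []
        # S24 and S28: take the unselected data one by one until none remain.
        for record in ordered:
            value = record[value_key]
            # S25: compare with the most recently written data.
            if written and written[-1] == value:
                continue           # S27: same, so discard
            written.append(value)  # S26: different, so write
        lists[group] = written
    # S29: the lists hold the adjacent-deduplicated data.
    return lists


records = [
    {"user": "u1", "time": 3, "page": "cart"},
    {"user": "u1", "time": 1, "page": "home"},
    {"user": "u2", "time": 1, "page": "home"},
    {"user": "u1", "time": 2, "page": "home"},
    {"user": "u2", "time": 2, "page": "pay"},
    {"user": "u2", "time": 3, "page": "pay"},
    {"user": "u1", "time": 4, "page": "home"},
]
result = grouped_adjacent_dedup(records, "user", "time", "page")
```

Each group is processed independently, so in principle the per-group loops could run in parallel; the sketch runs them sequentially for clarity.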

In another embodiment of the present application, another adjacent deduplication process is presented. As shown in Fig. 3, it may include the following steps:

Step S31: sorting the data to be processed according to the data generation time to obtain the sorted data to be processed.

Step S32: constructing a list.

The list is used to store data.

Step S33: selecting one of the unselected data in the sorted data to be processed in sequence as the data to be written.

Step S34: checking whether the latest written data in the list is the same as the data to be written.

If not the same, step S35 is executed; if the same, step S36 is executed.

Step S35: writing the data to be written into the list.

Step S36: discarding the data to be written.

Step S37: judging whether the sorted data to be processed has unselected data.

If yes, the process returns to step S33; if not, step S38 is executed.

Step S38: taking the data in the list as the adjacent-deduplicated data.
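The ungrouped variant of steps S31 to S38 can be written as an explicit loop that mirrors the flowchart. The sketch below assumes each item is a `(generation_time, value)` tuple; that layout and the function name are illustrative assumptions rather than details from the original text.

```python
def adjacent_dedup_by_time(events):
    # S31: sort the data to be processed by generation time.
    ordered = sorted(events, key=lambda event: event[0])
    # S32: construct the list that will store the written data.
    written = []
    index = 0
    # S37: continue while unselected data remain.
    while index < len(ordered):
        # S33: select the next unselected item as the data to be written.
        _time, value = ordered[index]
        index += 1
        # S34: compare with the most recently written data.
        if written and written[-1] == value:
            continue           # S36: same, so discard
        written.append(value)  # S35: different, so write
    # S38: the list holds the adjacent-deduplicated data.
    return written


events = [(3, "cart"), (1, "home"), (2, "home"), (5, "cart"), (4, "cart"), (6, "home")]
deduped = adjacent_dedup_by_time(events)
```

Only the last written element is ever compared, so after sorting the pass over the data is linear in the number of items.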

In another embodiment of the present application, another data processing method is introduced, which is applied to a client. As shown in Fig. 4, a flowchart of embodiment 2 of the data processing method provided in the present application, the method may include the following steps:

Step S41: acquiring the SQL query request.

The client may receive an SQL query request input by the user, thereby acquiring the SQL query request.

Step S42: sending the SQL query request to a server, so that the server receives the SQL query request sent by the client, acquires the data requested to be processed by the SQL query request from ClickHouse, takes the acquired data as the data to be processed, calls the adjacent deduplication function source code in the ClickHouse source code, and executes the adjacent deduplication process, wherein the adjacent deduplication process is as follows: performing adjacent deduplication calculation on the data to be processed to obtain adjacent-deduplicated data, and returning the adjacent-deduplicated data to the client.

Step S43: receiving the adjacent-deduplicated data returned by the server.

After receiving the adjacent-deduplicated data returned by the server, the client may actively push the data to the user or make it available for the user to retrieve.
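The client-side steps S41 to S43 amount to a simple request and response exchange, sketched below. Everything here is hypothetical: `adjacent_dedup` in the query text is a placeholder name, not a function that stock ClickHouse is known to provide, and `stub_server` stands in for a real network transport.

```python
from typing import Callable, List

# Hypothetical query text; `adjacent_dedup` is a placeholder function name.
EXAMPLE_SQL = "SELECT adjacent_dedup(page) FROM page_views"


def run_client(sql: str, send: Callable[[str], List[str]]) -> List[str]:
    # S41: the SQL query request has been acquired (passed in as `sql`).
    # S42: send the request to the server.
    # S43: receive the adjacent-deduplicated data returned by the server.
    return send(sql)


received_queries: List[str] = []


def stub_server(sql: str) -> List[str]:
    """Stand-in transport; a real client would send `sql` over the network."""
    received_queries.append(sql)
    return ["home", "cart", "home"]  # canned, already-deduplicated result


rows = run_client(EXAMPLE_SQL, stub_server)
```

The client does no deduplication itself; it only forwards the query and hands the returned rows on to the user.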

Next, a data processing apparatus provided in the present application will be described, and the data processing apparatus described below and the data processing method described above may be referred to in correspondence with each other.

Referring to Fig. 5, the data processing apparatus is applied to a server, where the server adds adjacent deduplication function source code to the ClickHouse source code, and the data processing apparatus includes: a query request receiving module 11, an adjacent duplicate removal calculation module 12 and a return module 13.

The query request receiving module 11 is configured to receive an SQL query request sent by a client;

the adjacent duplicate removal calculation module 12 is configured to acquire, from ClickHouse, the data requested to be processed by the SQL query request, take the acquired data as the data to be processed, call the adjacent deduplication function source code in the ClickHouse source code, and execute an adjacent deduplication process, where the adjacent deduplication process is: performing adjacent deduplication calculation on the data to be processed to obtain adjacent-deduplicated data;

and the return module 13 is configured to return the adjacent-deduplicated data to the client.

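In this embodiment, the adjacent duplicate removal calculation module 12 may be specifically configured to:

group the data to be processed according to different grouping conditions to obtain a plurality of different arrays;

sort the data in each array according to the data generation time to obtain sorted data;

construct a list corresponding to each array;

select one of the unselected data in each array in sequence as data to be written;

check, for each list, whether the latest written data in the list is the same as the data to be written;

if not, write the data to be written into the list;

if the same, discard the data to be written;

judge whether unselected data exist in each array;

if yes, return to the step of selecting one of the unselected data in each array in sequence as the data to be written;

and if not, take the data in the lists as the adjacent-deduplicated data.

In this embodiment, the adjacent duplicate removal calculation module 12 may alternatively be specifically configured to:

sort the data to be processed according to the data generation time to obtain the sorted data to be processed;

construct a list;

select one of the unselected data in the sorted data to be processed in sequence as data to be written;

check whether the latest written data in the list is the same as the data to be written;

if not, write the data to be written into the list;

if the same, discard the data to be written;

judge whether the sorted data to be processed has unselected data;

if yes, return to the step of selecting one of the unselected data in the sorted data to be processed in sequence as data to be written;

and if not, take the data in the list as the adjacent-deduplicated data.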

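In another embodiment of the present application, there is provided a data processing apparatus applied to a client, the apparatus including: a query request acquisition module for acquiring the SQL query request, a sending module for sending the SQL query request to the server, and a data receiving module for receiving the adjacent-deduplicated data returned by the server.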

In another embodiment of the present application, there is provided a server including:

a memory for storing a program;

the processor is configured to run the program, and when the processor runs the program, the processor implements the data processing method as described in embodiment 1.

In another embodiment of the present application, there is provided a client comprising:

a memory for storing a program;

the processor is configured to run the program, and when the processor runs the program, the processor implements the data processing method as described in embodiment 2.

It should be noted that the description of each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to each other. Since the apparatus embodiments are basically similar to the method embodiments, they are described briefly; for relevant points, reference may be made to the corresponding description of the method embodiments.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.

For convenience of description, the above apparatus is described as being divided into various units by function, which are described separately. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.

From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions of the present application may be embodied, in whole or in part, in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and which includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.

The data processing method, the data processing device, the server and the client provided by the application are introduced in detail, a specific example is applied in the description to explain the principle and the implementation of the application, and the description of the above embodiment is only used to help understand the method and the core idea of the application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims

1. A data processing method is applied to a server, the server adds adjacent deduplication function source codes in a ClickHouse source code, and the method comprises the following steps:

receiving an SQL query request sent by a client;

returning the data after the adjacent duplicate removal to the client;

the performing adjacent duplicate removal calculation on the data to be processed to obtain the data after adjacent duplicate removal includes: grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays; sorting the data in each array according to the data generation time to obtain sorted data; respectively corresponding to each array, and constructing a list; selecting one of the unselected data sequenced in the arrays in sequence respectively as data to be written; respectively checking whether the latest written data in each list is the same as the data to be written; if not, writing the data to be written into the list; if the data are the same, discarding the data to be written; respectively judging whether unselected data exist in each array; if yes, returning to the step of executing the step of selecting one of the unselected data in each array in sequence as the data to be written; and if not, taking the data in the lists as the data after adjacent deduplication.

2. A data processing method is applied to a client, and comprises the following steps:

acquiring an SQL query request;

receiving the adjacent deduplicated data returned by the server;

wherein, the performing adjacent duplicate removal calculation on the data to be processed to obtain the data after adjacent duplicate removal includes: grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays; sorting the data in each array according to the data generation time to obtain sorted data; respectively corresponding to each array, and constructing a list; selecting one of the unselected data sequenced in the arrays in sequence respectively as data to be written; respectively checking whether the latest written data in each list is the same as the data to be written; if not, writing the data to be written into the list; if the data to be written are the same, discarding the data to be written; respectively judging whether unselected data exist in each array; if yes, returning to the step of executing the step of selecting one of the unselected data in each array in sequence as the data to be written; and if not, taking the data in the lists as the data after adjacent deduplication.

3. A data processing apparatus, applied to a server that adds an adjacent deduplication function source code to a ClickHouse source code, the apparatus comprising:

the adjacent duplicate removal calculation module is used for acquiring the data requested to be processed by the SQL query request from the ClickHouse, using the acquired data as the data to be processed, calling an adjacent duplicate removal function source code in the source code of the ClickHouse, and executing an adjacent duplicate removal process, wherein the adjacent duplicate removal process is as follows: performing adjacent duplicate removal calculation on the data to be processed to obtain adjacent duplicate removed data;

a returning module, configured to return the data after the adjacent deduplication to the client;

the adjacent duplicate removal calculation module is specifically configured to: grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays; sorting the data in each array according to the data generation time to obtain sorted data; respectively corresponding to each array, and constructing a list; selecting one of the unselected data in each array in sequence as data to be written; respectively checking whether the latest written data in each list is the same as the data to be written; if not, writing the data to be written into the list; if the data to be written are the same, discarding the data to be written; respectively judging whether unselected data exist in each array; if yes, returning to the step of executing the step of selecting one of the unselected data in each array in sequence as the data to be written; and if not, taking the data in the lists as the data after adjacent deduplication.

4. A data processing apparatus, applied to a client, the apparatus comprising:

the data receiving module is used for receiving the adjacent deduplicated data returned by the server;

the performing adjacent duplicate removal calculation on the data to be processed to obtain the data after adjacent duplicate removal includes: grouping the data to be processed according to different grouping conditions to obtain a plurality of different arrays; sorting the data in each array according to the data generation time to obtain sorted data; respectively corresponding to each array, and constructing a list; selecting one of the unselected data sequenced in the arrays in sequence respectively as data to be written; respectively checking whether the latest written data in each list is the same as the data to be written; if not, writing the data to be written into the list; if the data to be written are the same, discarding the data to be written; respectively judging whether unselected data exist in each array; if yes, returning to the step of executing the step of selecting one of the unselected data in each array in sequence as the data to be written; and if not, taking the data in the lists as the data after adjacent deduplication.

5. A server, comprising:

a memory for storing a program;

the processor for executing the program, the processor implementing the data processing method of claim 1 when the processor executes the program.

6. A client, comprising:

a memory for storing a program;

the processor for executing the program, the processor implementing the data processing method of claim 2 when the processor executes the program.