CN112436943B - Request deduplication method, device, equipment and storage medium based on big data - Google Patents

Publication number
CN112436943B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011186553.2A
Other languages
Chinese (zh)
Other versions
CN112436943A (en
Inventor
王玥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Liding Material Technology Co.,Ltd.
Original Assignee
Nanyang Institute of Technology
Application filed by Nanyang Institute of Technology
Priority to CN202011186553.2A
Publication of CN112436943A
Application granted
Publication of CN112436943B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3236 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions
    • H04L9/3239 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions involving non-keyed hash functions, e.g. modification detection codes [MDCs], MD5, SHA or RIPEMD
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0807 Network architectures or network communication protocols for network security for authentication of entities using tickets, e.g. Kerberos
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0643 Hash functions, e.g. MD5, SHA, HMAC or f9 MAC

Abstract

The embodiments of the application disclose a big data-based request deduplication method, apparatus, device and storage medium, which belong to the technical field of big data processing. The method comprises: acquiring the request addresses and write contents of different users; encrypting all the acquired request addresses, and sending the encrypted values to the kafka cluster; converting all the acquired write contents, and sending the conversion values to the kafka cluster; dividing, by the kafka cluster, different receiving tokens according to the different encrypted values; for a newly received request, writing the conversion value and timestamp corresponding to the newly received encrypted value into the receiving partition based on a judgment condition; and sending the conversion values partitioned by the kafka cluster to a deduplication model partition by partition, obtaining the latest conversion value based on a judgment condition, performing inverse-code (decoding) processing on the encrypted value and the conversion value respectively, and sending the processed content to a processing layer. The application helps relieve the load that duplicate requests place on the processor and, to some extent, prolongs the service life of the processor.

Description

Request deduplication method, device, equipment and storage medium based on big data
Technical Field
The present application relates to the field of big data processing technologies, and in particular, to a request deduplication method, apparatus, device, and storage medium based on big data.
Background
Currently, with the development of Internet technology, the number of users keeps growing, and programmers must take duplicate requests under big data conditions into account in order to avoid high-concurrency problems.
In practice, blindly applying rate limiting and throttling often means that requests are not processed promptly, which gives users a poor experience, while merely pursuing higher CPU performance cannot fundamentally solve the problem of high processor load caused by duplicate data requests. Therefore, request processing in the prior art suffers from untimely handling of duplicate data requests and excessive processor load.
Disclosure of Invention
An object of the embodiments of the present application is to provide a request deduplication method, apparatus, device and storage medium based on big data, so as to solve the problems that duplicate data request processing is not timely enough and a processor is under excessive pressure when performing request processing in the prior art.
In order to solve the above technical problem, an embodiment of the present application provides a request deduplication method based on big data, which adopts the following technical solutions:
a big data-based request deduplication method comprises the following steps:
acquiring concurrent write requests sent by a plurality of users, analyzing request contents, and acquiring request addresses and write contents of different users;
based on the MD5 encryption algorithm, encrypting all the acquired request addresses to acquire an encrypted value, and sending the encrypted value to the kafka cluster;
converting all the obtained different written contents based on a preset conversion model to obtain a conversion value, and sending the conversion value to the kafka cluster;
the kafka cluster distinguishes requests based on the received encrypted values, and divides different receiving tokens according to the different encrypted values;
for a newly received request, judging whether a historical encrypted value consistent with its encrypted value exists; if so, writing the conversion value and the timestamp corresponding to the newly received encrypted value into the receiving partition; otherwise, establishing a receiving partition based on the newly received encrypted value;
and sending the conversion values partitioned by the kafka cluster to a deduplication model partition unit by partition unit, judging whether identical conversion values exist in the same partition unit; if so, obtaining the latest conversion value based on the timestamp and deleting the other identical conversion values; performing inverse-code (decoding) processing on the encrypted value and the conversion value respectively based on a preset decoding model; and sending the processed content to a processing layer.
Further, in the big data-based request deduplication method, obtaining the conversion value includes:
converting the write content based on the ASCII encoding format, so that each character in the write content is converted into a character string in ASCII format;
converting each ASCII-format character string into binary; padding and splitting the binary value, namely adding leading zeros before the first bit until the value is 24 bits long and splitting the padded value into a 16-bit string and an 8-bit string; converting the 16-bit string into hexadecimal to obtain a first conversion value; and converting the 8-bit string into octal to obtain a second conversion value.
Further, in the big data-based request deduplication method, obtaining the timestamp includes:
acquiring the current time, preprocessing the current time, and converting it into a preset time format.
Further, in the big data-based request deduplication method, writing the conversion value and the timestamp corresponding to the newly received encrypted value into the receiving partition includes:
judging whether a conversion value in the receiving partition is the same as the newly written conversion value; if so, directly obtaining the historical conversion value; if not, writing the newly obtained conversion value into the receiving partition.
Further, in the big data-based request deduplication method, judging whether the conversion value in the receiving partition is the same as the newly written conversion value includes:
judging whether the first conversion values are the same; if so, judging whether the second conversion values are the same; if they are also the same, determining that the conversion values are the same; otherwise, determining that the conversion values are different.
Further, in the big data-based request deduplication method, obtaining the latest conversion value based on the timestamp includes:
comparing the timestamps corresponding to identical conversion values, and obtaining the first conversion value and the second conversion value whose timestamp is the largest.
Further, in the big data-based request deduplication method, performing inverse-code (decoding) processing on the encrypted value and the conversion value respectively based on a preset decoding model includes:
performing MD5 decryption processing on the encrypted value to obtain the request address;
and converting the first conversion value and the second conversion value into binary respectively, converting the resulting binary values into ASCII codes, and converting the ASCII codes into character strings to obtain the processing result.
In order to solve the above technical problem, an embodiment of the present application further provides a request deduplication device based on big data, which adopts the following technical solution:
a big-data based request deduplication appliance, comprising:
the request acquisition module is used for acquiring concurrent write requests sent by a plurality of users, analyzing request contents and acquiring request addresses and write contents of different users;
the address encryption module is used for encrypting all the acquired request addresses based on an MD5 encryption algorithm to acquire an encrypted value and sending the encrypted value to the kafka cluster;
the content conversion module is used for converting all the obtained different written contents based on a preset conversion model to obtain a conversion value and sending the conversion value to the kafka cluster;
the receiving token generating module is used for the Kafka cluster to distinguish based on the received encrypted values and divide different receiving tokens according to different encrypted values;
the receiving partition determining module is used for judging whether the historical encrypted value is consistent with the encrypted value of the newly received request, if so, writing a converted value and a time stamp corresponding to the newly received encrypted value into the receiving partition, otherwise, establishing the receiving partition based on the newly received encrypted value;
and the request deduplication module is used for sending the conversion values partitioned by the kafka cluster to the deduplication model partition unit by partition unit, judging whether identical conversion values exist in the same partition unit; if so, obtaining the latest conversion value based on the timestamp and deleting the other identical conversion values; performing inverse-code (decoding) processing on the encrypted value and the conversion value respectively based on a preset decoding model; and sending the processed content to the processing layer.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
a computer device includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of a big data based request deduplication method proposed in the embodiment of the present application when executing the computer program.
In order to solve the above technical problem, an embodiment of the present application further provides a nonvolatile computer-readable storage medium, which adopts the following technical solutions:
a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a big data based request deduplication method as set forth in an embodiment of the present application.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
the embodiments of the application disclose a big data-based request deduplication method, apparatus, device and storage medium, in which the request addresses and write contents of different users are acquired; all the acquired request addresses are encrypted, and the encrypted values are sent to the kafka cluster; all the acquired write contents are converted, and the conversion values are sent to the kafka cluster; the kafka cluster divides different receiving tokens according to the different encrypted values; for a newly received request, the conversion value and timestamp corresponding to the newly received encrypted value are written into the receiving partition based on a judgment condition; and the conversion values partitioned by the kafka cluster are sent to a deduplication model partition unit by partition unit, the latest conversion value is obtained based on a judgment condition, inverse-code (decoding) processing is performed on the encrypted value and the conversion value respectively based on a preset decoding model, and the processed content is sent to a processing layer. In this way, identical request content is found among the continuously generated requests and only the latest request is retained after deduplication, so that the processor is spared from processing duplicate requests.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is a diagram of an exemplary system architecture to which embodiments of the present application may be applied;
FIG. 2 is a flowchart of an embodiment of a big data based request deduplication method described in the embodiments of the present application;
FIG. 3 is a diagram illustrating request content parsing in an embodiment of the present application;
FIG. 4 is a schematic diagram of binary conversion and complement of an ASCII format string in an embodiment of the present application;
FIG. 5 is a schematic diagram of splitting a padded binary value and performing the preset base conversions in the embodiment of the present application;
FIG. 6 is a flow chart of determining and de-duplicating repeat requests in an embodiment of the present application;
FIG. 7 is a block diagram illustrating an embodiment of a big data based request deduplication apparatus according to an embodiment of the present application;
FIG. 8 is a block diagram illustrating a structure of a request acquisition module according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a content transformation module in an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an embodiment of a computer device in the embodiment of the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the big data based request deduplication method provided in the embodiment of the present application is generally executed by a server/terminal device, and accordingly, a big data based request deduplication apparatus is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.
With continued reference to FIG. 2, a flowchart of one embodiment of a big data based request deduplication method of the present application is shown, the big data based request deduplication method comprising the following steps:
Step 201: obtaining concurrent write requests sent by a plurality of users, parsing the request content, and obtaining the request addresses and write contents of different users. In this embodiment, parsing the request content includes: splitting the request content into two parts, a request address part and a write content part, based on a preset parsing field. For example, the request content is "http://www.lytu.edu.cn:80/chpage/index.html?str=abc#a1". With "str=" as the identification field, the part before the first "str=", namely "http://www.lytu.edu.cn:80/chpage/index.html?", is the request address, and the part after the first "str=", namely "abc#a1", is the write content.
Specifically referring to fig. 3, fig. 3 is a schematic diagram of request content parsing in the embodiment of the present application, where 301 shows request content of a user, 302 shows a character string according to which the request content is parsed, 303 shows a request address portion obtained after the request content is parsed, and 304 shows a write content portion obtained after the request content is parsed.
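The parsing in step 201 can be sketched as follows. This is a minimal illustration, not the patented implementation; the function name `parse_request` and the choice to raise an error when the identification field is missing are assumptions made here.

```python
from typing import Tuple


def parse_request(request: str, marker: str = "str=") -> Tuple[str, str]:
    """Split a raw request at the first occurrence of `marker`.

    Returns (request_address, write_content); the marker itself is dropped.
    """
    address, found, content = request.partition(marker)
    if not found:
        # Assumption: a request without the identification field is rejected.
        raise ValueError(f"identification field {marker!r} not found")
    return address, content


address, content = parse_request(
    "http://www.lytu.edu.cn:80/chpage/index.html?str=abc#a1"
)
print(address)  # http://www.lytu.edu.cn:80/chpage/index.html?
print(content)  # abc#a1
```

Because `str.partition` splits only at the first match, a later "str=" inside the write content stays part of the write content.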
Step 202: encrypting all the acquired request addresses based on the MD5 encryption algorithm to obtain encrypted values, and sending the encrypted values to the kafka cluster.
In this embodiment, all the acquired request addresses are encrypted based on the MD5 encryption algorithm. For example, the request address part is "http://www.lytu.edu.cn:80/chpage/index.html?"; encrypting it yields the 16-character lowercase encrypted value 7bd9f4d7a83541b0.
In this embodiment, the encrypted value is then sent to the kafka cluster; for example, the encrypted value 063dc83e6e14423a is sent to the kafka cluster.
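A hedged sketch of step 202 using Python's standard `hashlib` module is given below. The text does not say how the 16-character value is derived from the 32-character MD5 digest; taking the middle 16 hexadecimal characters is a common convention and is an assumption here, as is the function name `encrypt_address`. Note that MD5 is a one-way hash function, so the value serves as a fixed-length key for the address rather than as reversible ciphertext.

```python
import hashlib


def encrypt_address(address: str) -> str:
    """Return a 16-character lowercase hex key for a request address."""
    digest = hashlib.md5(address.encode("utf-8")).hexdigest()  # 32 hex chars
    # Assumption: the 16-character form is the middle of the full digest.
    return digest[8:24]


key = encrypt_address("http://www.lytu.edu.cn:80/chpage/index.html?")
print(len(key))  # 16
```

Identical request addresses always map to the same key, which is what allows the later steps to group requests by encrypted value.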
Step 203: converting all the acquired write contents based on a preset conversion model to obtain conversion values, and sending the conversion values to the kafka cluster.
In this embodiment, obtaining the conversion value includes: converting the write content based on the ASCII encoding format, so that each character in the write content is converted into a character string in ASCII format; converting each ASCII-format string into binary; padding and splitting the binary value, namely adding leading zeros before the first bit until the value is 24 bits long and then splitting the padded value into a 16-bit string and an 8-bit string; converting the 16-bit string into hexadecimal to obtain the first conversion value; and converting the 8-bit string into octal to obtain the second conversion value.
The write content is converted based on the ASCII encoding format, with each character converted into a string in ASCII format. For example, the single Chinese character "王" corresponds to the code 29579, the single Chinese character "雨" corresponds to the code 38632, and the digit "1" corresponds to the ASCII code 49 (the two Chinese characters lie outside the ASCII range, so their values are Unicode code points). For a write request, the request content is converted into a string of codes in this way. For example, if the request content is "https://www.baidu.com", the whole request is converted into the ASCII encoding format and the corresponding ASCII values are obtained, namely: "104", "116", "112", "115", "58", "47", "119", "46", "98", "97", "105", "100", "117", "46", "99", "111", "109".
The ASCII-format strings are then converted into binary. For example, for the values "104" and "116" above, "104" is converted into the binary "1101000" and "116" into the binary "1110100"; likewise, "38632" is converted into the binary "1001011011101000".
The binary value is then padded. For example, the binary corresponding to "104" is "1101000", which padded to 24 bits is "000000000000000001101000"; the binary corresponding to "116" is "1110100", which padded to 24 bits is "000000000000000001110100"; the binary corresponding to "38632" is "1001011011101000", which padded to 24 bits is "000000001001011011101000".
Referring to fig. 4 in particular, fig. 4 is a schematic diagram of binary conversion and complement performed on a character string in ASCII format in the embodiment of the present application, where 401 shows the character string in ASCII format, 402 shows binary values after binary conversion, and 403 shows binary values after complement.
The padded binary value is then split. For example, the binary corresponding to "104" is "1101000", which padded to 24 bits is "000000000000000001101000"; splitting it into 16 bits and 8 bits gives the two strings "0000000000000000" and "01101000". The binary corresponding to "116" is "1110100", which padded to 24 bits is "000000000000000001110100"; splitting it into 16 bits and 8 bits gives the two strings "0000000000000000" and "01110100". The binary corresponding to "38632" is "1001011011101000", which padded to 24 bits is "000000001001011011101000"; splitting it into 16 bits and 8 bits gives the two strings "0000000010010110" and "11101000".
The 16-bit string is converted into hexadecimal to obtain the first conversion value, and the 8-bit string is converted into octal to obtain the second conversion value. For example, for "0000000000000000" and "01101000" above, hexadecimal conversion of "0000000000000000" gives the first conversion value "0", and octal conversion of "01101000" gives the second conversion value "150"; for "0000000000000000" and "01110100", hexadecimal conversion of "0000000000000000" gives the first conversion value "0", and octal conversion of "01110100" gives the second conversion value "164"; for "0000000010010110" and "11101000", hexadecimal conversion of "0000000010010110" gives the first conversion value "96", and octal conversion of "11101000" gives the second conversion value "350".
Referring to fig. 5 in particular, fig. 5 is a schematic diagram of splitting and performing preset binary conversion on a complemented binary value in the embodiment of the present application, where 501 shows the complemented binary value, 502 shows a 16-bit binary value after splitting the binary value, 503 shows a 8-bit binary value after splitting, 504 shows a first converted value after hexadecimal conversion, and 505 shows a second converted value after octal conversion.
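The conversion pipeline of step 203 (character code, binary, padding to 24 bits, 16/8 split, hexadecimal and octal) can be sketched as below. The function names are hypothetical, and treating the conversion value of a whole write content as a list of per-character pairs is an assumption; the text only gives per-character examples.

```python
from typing import List, Tuple


def convert_char(ch: str) -> Tuple[str, str]:
    """Return (first conversion value, second conversion value) for one character."""
    bits = format(ord(ch), "b")          # character code as a binary string
    if len(bits) > 24:
        raise ValueError("character code does not fit in 24 bits")
    bits = bits.zfill(24)                # left-pad with zeros to 24 bits
    high, low = bits[:16], bits[16:]     # split into 16-bit and 8-bit strings
    first = format(int(high, 2), "x")    # 16-bit part in hexadecimal
    second = format(int(low, 2), "o")    # 8-bit part in octal
    return first, second


def convert_content(content: str) -> List[Tuple[str, str]]:
    """Convert every character of the write content (assumed per-character)."""
    return [convert_char(ch) for ch in content]


print(convert_char("h"))   # ('0', '150')
print(convert_char("t"))   # ('0', '164')
print(convert_char("雨"))  # ('96', '350')
```

The three printed pairs match the worked examples for the codes 104, 116 and 38632 in the text.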
Step 204: the kafka cluster distinguishes requests based on the received encrypted values, and divides different receiving tokens according to the different encrypted values.
In this embodiment, when the kafka cluster processes concurrent requests, the requests can be written into different processing partitions. In the present application, different receiving tokens are divided according to the encrypted values; that is, requests with the same encrypted value are treated as similar requests and are grouped into one request class.
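The grouping in step 204 can be illustrated with a plain in-memory dictionary standing in for the kafka cluster. This is only a sketch of the grouping idea under that assumption; it uses no Kafka API, and the name `group_by_encrypted_value` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_encrypted_value(
    messages: Iterable[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Group (encrypted_value, payload) messages into one class per encrypted value."""
    classes: Dict[str, List[str]] = defaultdict(list)
    for encrypted_value, payload in messages:
        classes[encrypted_value].append(payload)
    return dict(classes)


groups = group_by_encrypted_value([
    ("7bd9f4d7a83541b0", "request A"),
    ("063dc83e6e14423a", "request B"),
    ("7bd9f4d7a83541b0", "request C"),
])
print(groups["7bd9f4d7a83541b0"])  # ['request A', 'request C']
```

In an actual Kafka deployment a comparable effect is usually obtained by using the encrypted value as the record key, since the default partitioner sends records with the same key to the same partition.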
Step 205, for the newly received request, it is determined whether there is a historical encrypted value that is consistent with the encrypted value, if so, the transformed value and the timestamp corresponding to the newly received encrypted value are written into the receiving partition, otherwise, the receiving partition is established based on the newly received encrypted value.
In this embodiment, obtaining the timestamp includes: acquiring the current time, preprocessing the current time, and converting it into a preset time format.
For example, the current time is obtained as "12:01:56". Assuming the preset time format is "hhmmss", preprocessing removes the ":" separators, and the current time becomes "120156".
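The timestamp preprocessing can be sketched with the standard `datetime` module. The function name `make_timestamp` and the optional `now` parameter (added so the example is reproducible) are assumptions.

```python
from datetime import datetime
from typing import Optional


def make_timestamp(now: Optional[datetime] = None) -> str:
    """Format a time as "hhmmss", i.e. hours, minutes, seconds without separators."""
    if now is None:
        now = datetime.now()
    return now.strftime("%H%M%S")


# The date below is arbitrary; only the time of day 12:01:56 matters.
print(make_timestamp(datetime(2020, 1, 1, 12, 1, 56)))  # 120156
```

Because the format has a fixed width, two such timestamps from the same day can be compared as strings or as integers with the same result.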
In this embodiment, writing the conversion value and the timestamp corresponding to the newly received encrypted value into the receiving partition includes: judging whether a conversion value in the receiving partition is the same as the newly written conversion value; if so, directly obtaining the historical conversion value; otherwise, writing the newly obtained conversion value into the receiving partition.
Explanation: if a conversion value in the receiving partition is the same as the newly written conversion value, the newly written request and the historical request are the same request, and the conversion value corresponding to the historical request is obtained directly; if no conversion value in the receiving partition is the same as the newly written conversion value, the newly obtained conversion value is written into the receiving partition.
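The write logic of step 205 can be sketched as follows, with a dictionary of lists standing in for the receiving partitions. The data layout, the function name `write_request`, and the choice to store the first record when a partition is created are assumptions; the text only says that a new partition is established for an unseen encrypted value.

```python
from typing import Dict, List, Tuple

# One conversion value: a (first, second) pair per character of the write content.
Conversion = Tuple[Tuple[str, str], ...]
Record = Tuple[Conversion, str]  # (conversion value, "hhmmss" timestamp)


def write_request(
    partitions: Dict[str, List[Record]],
    encrypted_value: str,
    conversion: Conversion,
    timestamp: str,
) -> bool:
    """Write a record into its receiving partition; return True if it was written."""
    # Establish the receiving partition if no historical encrypted value matches.
    partition = partitions.setdefault(encrypted_value, [])
    for stored_conversion, _ in partition:
        if stored_conversion == conversion:
            return False  # same request already recorded; reuse the historical value
    partition.append((conversion, timestamp))
    return True


partitions: Dict[str, List[Record]] = {}
h = (("0", "150"),)
print(write_request(partitions, "7bd9f4d7a83541b0", h, "120156"))  # True
print(write_request(partitions, "7bd9f4d7a83541b0", h, "120159"))  # False
```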
Step 206: send the conversion values partitioned by the Kafka cluster to a deduplication model partition unit by partition unit; determine whether identical conversion values exist within the same partition unit; if so, obtain the latest conversion value based on the timestamp and delete the other identical conversion values; perform inverse-code processing on the encrypted value and the conversion value respectively based on a preset decoding model; and send the processed content to a processing layer.
In this embodiment, if identical requests arrive within a similar time period, the latest request is identified directly by comparing timestamps and is sent to the processing layer, which reduces the number of repeated requests.
In this embodiment, determining whether a conversion value in the receiving partition is the same as the newly written conversion value includes: first determining whether conversion value one is the same; if so, determining whether conversion value two is the same; if conversion value two is also the same, the conversion values are the same; otherwise, the conversion values are different.
Explanation: when determining whether the conversion values are the same, conversion value one is compared first; if it is the same, conversion value two is compared; if both conversion value one and conversion value two are the same, the conversion values are the same.
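The two-stage comparison can be written out explicitly. This is a sketch only; the function name `same_conversion_value` and the pair layout `(value_one, value_two)` are assumed for illustration.

```python
def same_conversion_value(a, b):
    """Compare two conversion values stage by stage.

    a, b: (value_one, value_two) pairs of strings (assumed layout).
    Conversion value one is checked first; conversion value two is only
    examined when conversion value one already matches.
    """
    value_one_a, value_two_a = a
    value_one_b, value_two_b = b
    if value_one_a != value_one_b:
        return False  # stage one fails: no need to look at value two
    return value_two_a == value_two_b


match = same_conversion_value(("96", "350"), ("96", "350"))
differs_in_two = same_conversion_value(("96", "350"), ("96", "351"))
differs_in_one = same_conversion_value(("97", "350"), ("96", "350"))
```

The early return on conversion value one means most non-matching requests are rejected after a single comparison.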
In this embodiment, obtaining the conversion value based on the timestamp includes: comparing the timestamps corresponding to the same conversion value, and obtaining conversion value one and conversion value two for which the timestamp is the largest.
Explanation: if the conversion values are the same, the request times are compared based on the timestamps, which have already been converted to the "hhmmss" format. For example, if the conversion values at 120156 and 120159 are the same, conversion value one and conversion value two corresponding to the timestamp 120159 are obtained.
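Selecting the record with the largest timestamp per conversion value can be sketched as below. It assumes all timestamps are zero-padded "hhmmss" strings from the same day, so that string comparison matches chronological order; the name `latest_records` and the record layout are hypothetical.

```python
def latest_records(records):
    """Keep only the latest timestamp for each distinct conversion value.

    records: iterable of (conversion_value, timestamp) with timestamps as
    fixed-width 'hhmmss' strings (assumed), so '120159' > '120156' holds.
    """
    latest = {}
    for conversion_value, timestamp in records:
        if conversion_value not in latest or timestamp > latest[conversion_value]:
            latest[conversion_value] = timestamp
    return latest


records = [
    (("96", "350"), "120156"),
    (("96", "350"), "120159"),
    (("96", "351"), "115959"),
]
deduplicated = latest_records(records)
```

For the duplicated conversion value only the entry stamped 120159 survives, while the unrelated value is kept unchanged.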
In this embodiment, performing inverse-code processing on the encrypted value and the conversion value respectively based on the preset decoding model includes: performing MD5 decryption processing on the encrypted value to obtain the request address; and performing binary conversion on conversion value one and conversion value two respectively, converting the resulting binary values into an ASCII code, and converting the ASCII code into a character string to obtain the processing result.
MD5 decryption is performed on the encrypted value to obtain the request address. For example: the 16-character lowercase encrypted value is 7bd9f4d7a83541b0; decrypting it yields the request address part: http://www.lytu.edu.cn:80/chpage/index.html?
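The text does not say how the 16-character value is derived or how the address is recovered from it. The sketch below therefore rests on two assumptions: the 16-character form is taken as the middle 16 hex characters of the 32-character MD5 digest (a common convention), and, since MD5 is a one-way hash, the "decryption" step is modelled as a lookup in a table filled at encryption time. All function and variable names are hypothetical.

```python
import hashlib

# Assumed reverse-lookup table: 16-character digest -> original address.
address_by_digest = {}


def md5_16(address: str) -> str:
    """Lowercase 16-character MD5 form (middle of the 32-char hex digest)."""
    return hashlib.md5(address.encode("utf-8")).hexdigest()[8:24]


def encrypt_address(address: str) -> str:
    """Hash the request address and remember it for later recovery."""
    digest = md5_16(address)
    address_by_digest[digest] = address
    return digest


def recover_address(digest: str):
    """Return the address recorded for this digest, or None if unknown."""
    return address_by_digest.get(digest)


address = "http://www.lytu.edu.cn:80/chpage/index.html?"
digest = encrypt_address(address)
recovered = recover_address(digest)
```

The digest this sketch produces for the example address is not claimed to equal the value quoted above, because the exact input and digest convention used in the application are not specified.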
Binary conversion is performed on conversion value one and conversion value two respectively, the resulting binary values are converted into an ASCII code, and the ASCII code is then converted into a character string to obtain the processing result. For example: if the hexadecimal conversion value one is '96' and the octal conversion value two is '350', binary conversion of the two values yields '0000000010010110' and '11101000' respectively. The two binary strings are concatenated to form '000000001001011011101000', which is converted to decimal to obtain the ASCII code '38632'. Chinese character conversion is then performed on '38632', giving the content actually requested by the user: '雨' (rain).
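The worked example above can be reproduced with a few lines of Python. This is a sketch under the assumption that the "ASCII code" in the text corresponds to what Python's `chr` treats as a Unicode code point; the function names are hypothetical.

```python
def to_code(value_one: str, value_two: str) -> int:
    """Rebuild the character code from the two conversion values.

    value_one is hexadecimal and expands to 16 bits; value_two is octal
    and expands to 8 bits. The bit strings are concatenated to 24 bits.
    """
    high = format(int(value_one, 16), "016b")  # '96'  -> '0000000010010110'
    low = format(int(value_two, 8), "08b")     # '350' -> '11101000'
    return int(high + low, 2)


def decode_conversion_value(value_one: str, value_two: str) -> str:
    """Convert the rebuilt code back into a single character."""
    return chr(to_code(value_one, value_two))


code = to_code("96", "350")
character = decode_conversion_value("96", "350")
```

With the values from the example, `code` is 38632 and `character` is the single character U+96E8.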
Referring specifically to fig. 6, fig. 6 is a flowchart of determining and deduplicating repeated requests in the embodiment of the present application. As the flowchart shows, the request partition is determined by the encrypted value. If a partition with the same encrypted value already exists, that is, the request addresses are the same, it is determined directly whether conversion value one of the new request is the same as some conversion value one in the partition for that encrypted value; if conversion value one is the same, conversion value two is compared; if conversion value two is also the same, inverse-code processing is performed on the conversion value, the inverse-code result with the latest time is obtained based on the timestamp, and the decoded result, that is, the request content originally sent by the user, is sent to the processing layer.
The big-data-based request deduplication method in the embodiment of the present application acquires the request addresses and written contents of different users; encrypts all the acquired request addresses and sends the encrypted values to the Kafka cluster; converts all the acquired written contents and sends the conversion values to the Kafka cluster; has the Kafka cluster divide different receiving tokens according to different encrypted values; for each newly received request, writes the conversion value and the timestamp corresponding to the newly received encrypted value into the receiving partition based on the judgment condition; and sends the conversion values partitioned by the Kafka cluster to the deduplication model partition unit by partition unit, acquires the latest conversion value based on the judgment condition, performs inverse-code processing on the encrypted value and the conversion value respectively based on the preset decoding model, and sends the processed content to the processing layer. In this way, identical request content is found among the continuously generated requests and only the latest request is retained after deduplication, so that the processor avoids processing repeated requests.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or it may be a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not limited to the exact order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
With further reference to fig. 7, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a big data based request deduplication device, where the embodiment of the device corresponds to the embodiment of the method shown in fig. 2, and the device may be applied to various electronic devices.
As shown in fig. 7, the big-data-based request deduplication device 7 according to the present embodiment includes: a request acquisition module 701, an address encryption module 702, a content conversion module 703, a reception token generation module 704, a reception partition determination module 705, and a request deduplication module 706. Wherein:
a request obtaining module 701, configured to obtain concurrent write requests sent by multiple users, analyze request contents, and obtain request addresses and write contents of different users;
the address encryption module 702 is configured to encrypt all the acquired request addresses based on an MD5 encryption algorithm to acquire an encrypted value, and send the encrypted value to the kafka cluster;
the content conversion module 703 is configured to convert all the obtained different written contents based on a preset conversion model to obtain a conversion value, and send the conversion value to the kafka cluster;
the receiving token generating module 704 is used for the Kafka cluster to distinguish based on the received encrypted values and divide different receiving tokens according to different encrypted values;
a receiving partition determining module 705, configured to determine, for a newly received request, whether a historical encrypted value is consistent with the encrypted value of the request, if so, write a transformed value and a timestamp corresponding to the newly received encrypted value into a receiving partition, otherwise, establish the receiving partition based on the newly received encrypted value;
and the request deduplication module 706 is configured to send the transformation value partitioned by the kafka cluster to the deduplication model according to the partition unit, determine whether content with the same transformation value exists in the same partition unit, obtain the latest transformation value based on the timestamp if the content with the same transformation value exists in the same partition unit, delete other same transformation values, perform inverse code processing on the encryption value and the transformation value based on a preset decoding model, and send the processed content to the processing layer.
In some embodiments of the present application, as shown in fig. 8, fig. 8 is a schematic structural diagram of a request obtaining module in an embodiment of the present application, where the request obtaining module 701 includes a request address obtaining unit 701a and a write content obtaining unit 701b.
In some embodiments of the present application, the request address obtaining unit 701a is configured to split the request content based on a preset splitting character string, so as to split a request address portion of the request content.
In some embodiments of the present application, the written content obtaining unit 701b is configured to split the request content based on a preset splitting character string, so as to split a written content portion of the request content.
In some embodiments of the present application, the address encryption module 702 is configured to perform an encryption operation on the request address after the request content is split based on an MD5 encryption algorithm, and convert the request address into an MD5 value in a preset format.
In some embodiments of the present application, as shown in fig. 9, fig. 9 is a schematic structural diagram of a content conversion module in the embodiments of the present application, where the content conversion module 703 includes an ASCII code conversion unit 703a, a conversion value one conversion unit 703b, and a conversion value two conversion unit 703c.
In some embodiments of the present application, the ASCII code conversion unit 703a is configured to perform ASCII conversion on each character in the written content based on a preset ASCII conversion algorithm, that is, convert the written content based on an ASCII encoding format, and convert different characters in the written content into character strings in ASCII format, respectively.
In some embodiments of the present application, the conversion value one conversion unit 703b is configured to perform binary conversion on the character string converted into ASCII format, pad and split the binary value by adding leading 0s before the first bit until the value is 24 bits long and dividing the padded value into a 16-bit string and an 8-bit string, and perform hexadecimal conversion on the 16-bit string to obtain conversion value one.
In some embodiments of the present application, the conversion value two conversion unit 703c is configured to perform octal processing on the segmented 8-bit string to obtain a conversion value two.
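The padding, splitting, and base conversion performed by units 703b and 703c can be sketched for a single character as follows. The sketch assumes that the character code is what Python's `ord` returns and that leading zeros are dropped from the hexadecimal and octal results; the function name `encode_character` is hypothetical.

```python
def encode_character(ch: str):
    """Convert one character into (conversion value one, conversion value two).

    The character code is written in binary, left-padded with 0s to
    24 bits, and split into a 16-bit part and an 8-bit part. The 16-bit
    part is rendered in hexadecimal and the 8-bit part in octal.
    """
    bits = format(ord(ch), "b").zfill(24)
    high, low = bits[:16], bits[16:]
    value_one = format(int(high, 2), "x")  # hexadecimal, no leading zeros
    value_two = format(int(low, 2), "o")   # octal, no leading zeros
    return value_one, value_two


# U+96E8 has code 38632, i.e. binary 1001011011101000 before padding.
pair = encode_character("\u96e8")
```

For this character the pair is `("96", "350")`. How multi-character written content is combined is not described here, so the sketch handles one character at a time.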
In some embodiments of the present application, when the user requests are sent to the Kafka distribution cluster, the receiving token generation module 704 uses the encrypted value of the request address as the distribution partition in the Kafka cluster, where each partition is a topic and each topic corresponds to one class of similar requests.
In some embodiments of the present application, the receiving partition determining module 705 is configured to obtain current time when identifying the order of the request, pre-process the current time, convert the current time into a preset time format, obtain a timestamp, and identify the order of the request according to the timestamp.
In some embodiments of the present application, when writing the translation value and the timestamp corresponding to the newly received encryption value into the receiving partition, the receiving partition determining module 705 is further configured to determine whether the translation value in the receiving partition is the same as the newly written translation value, and if so, directly obtain the historical translation value, otherwise, write the newly obtained translation value into the receiving partition.
In some embodiments of the present application, when determining whether a conversion value in the receiving partition is the same as the newly written conversion value, the receiving partition determining module 705 is further configured to first determine whether conversion value one is the same; if so, to determine whether conversion value two is the same; if conversion value two is also the same, the conversion values are the same; otherwise, the conversion values are different.
In some embodiments of the subject application, the request deduplication module 706, when obtaining the transformation value based on the timestamp, is configured to compare timestamps corresponding to the same transformation value, and obtain a transformation value one and a transformation value two when the timestamps are the largest.
In some embodiments of the present application, when performing inverse-code processing on the encrypted value and the conversion value respectively based on a preset decoding model, the request deduplication module 706 is configured to perform MD5 decryption processing on the encrypted value to obtain the request address, and is further configured to perform binary conversion on conversion value one and conversion value two respectively, convert the resulting binary values into an ASCII code, and convert the ASCII code into a character string to obtain the processing result.
The big-data-based request deduplication device acquires the request addresses and written contents of different users; encrypts all the acquired request addresses and sends the encrypted values to the Kafka cluster; converts all the acquired written contents and sends the conversion values to the Kafka cluster; has the Kafka cluster divide different receiving tokens according to different encrypted values; for each newly received request, writes the conversion value and the timestamp corresponding to the newly received encrypted value into the receiving partition based on the judgment condition; and sends the conversion values partitioned by the Kafka cluster to the deduplication model partition unit by partition unit, acquires the latest conversion value based on the judgment condition, performs inverse-code processing on the encrypted value and the conversion value respectively based on the preset decoding model, and sends the processed content to the processing layer. In this way, identical request content is found among the continuously generated requests and only the latest request is retained after deduplication, so that the processor avoids processing repeated requests.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 10, fig. 10 is a block diagram of a basic structure of a computer device according to the embodiment.
The computer device 10 includes a memory 10a, a processor 10b, and a network interface 10c, which are communicatively connected to each other via a system bus. It should be noted that only a computer device 10 having components 10a-10c is shown, but it should be understood that not all of the shown components are required, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 10a includes at least one type of readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disks, optical disks, etc. In some embodiments, the memory 10a may be an internal storage unit of the computer device 10, such as a hard disk or a memory of the computer device 10. In other embodiments, the memory 10a may also be an external storage device of the computer device 10, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the computer device 10. Of course, the memory 10a may also include both an internal storage unit and an external storage device of the computer device 10. In this embodiment, the memory 10a is generally used for storing the operating system installed in the computer device 10 and various types of application software, such as the program code of the big-data-based request deduplication method. Further, the memory 10a may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 10b may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 10b is generally used to control the overall operation of the computer device 10. In this embodiment, the processor 10b is configured to execute the program code stored in the memory 10a or to process data, for example, to execute the program code of the big-data-based request deduplication method.
The network interface 10c may comprise a wireless network interface or a wired network interface, and the network interface 10c is generally used for establishing communication connections between the computer device 10 and other electronic devices.
The present application further provides another embodiment, which is to provide a non-transitory computer-readable storage medium storing a big-data-based request deduplication program, which is executable by at least one processor to cause the at least one processor to perform the steps of the big-data-based request deduplication method as described above.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It should be understood that the above-described embodiments are only some, not all, of the embodiments of the present application, and that the drawings illustrate preferred embodiments without limiting the scope of the claims. The present application may be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be understood thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their features. All equivalent structures made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (9)

1. A big data-based request deduplication method is characterized by comprising the following steps:
acquiring concurrent write requests sent by a plurality of users, analyzing request contents, and acquiring request addresses and write contents of different users;
based on the MD5 encryption algorithm, encrypting all the acquired request addresses to acquire an encrypted value, and sending the encrypted value to the kafka cluster;
converting all the obtained different written contents based on a preset conversion model to obtain a conversion value, and sending the conversion value to the kafka cluster; the conversion values include: converting the written content based on an ASCII coding format, and respectively converting different characters in the written content into character strings in the ASCII format; binary conversion is carried out on the character string converted into the ASCII format, the binary value is subjected to complement and division, 0 complement is added to 24 bits before the first bit, the value after complement is divided into two character strings of 16 bits and 8 bits, the divided 16 bit character string is subjected to hexadecimal processing to obtain a first conversion value, and the divided 8 bit character string is subjected to octal processing to obtain a second conversion value;
the Kafka cluster distinguishes based on the received encryption value, and different receiving tokens are divided according to different encryption values;
judging whether a history encrypted value is consistent with the encrypted value of the newly received request, if so, writing a converted value and a timestamp corresponding to the newly received encrypted value into a receiving partition, otherwise, establishing the receiving partition based on the newly received encrypted value;
and sending the transformation value partitioned by the kafka cluster to a deduplication model according to a partition unit, judging whether the same content of the transformation value exists in the same partition unit, if so, obtaining the latest transformation value based on the timestamp, deleting other same transformation values, performing anti-code processing on the encryption value and the latest transformation value based on a preset decoding model, and sending the processed content to a processing layer.
2. The big-data-based request deduplication method of claim 1, wherein the timestamp comprises:
and acquiring the current time, preprocessing the current time, and converting the current time into a preset time format.
3. The big-data-based request deduplication method of claim 2, wherein writing the translation value and the timestamp corresponding to the newly received cryptographic value into the receiving partition comprises:
and judging whether the conversion value in the receiving partition is the same as the newly written conversion value, if so, directly obtaining the historical conversion value, and if not, writing the newly obtained conversion value into the receiving partition.
4. The big-data based request deduplication method as claimed in claim 3, wherein the determining whether the translation value in the receiving partition is the same as the newly written translation value comprises:
first judging whether the first conversion value is the same; if so, judging whether the second conversion value is the same; if the second conversion value is also the same, the conversion values are the same; otherwise, the conversion values are different.
5. The big-data-based request deduplication method of claim 4, wherein the obtaining a conversion value based on a timestamp comprises:
and comparing the timestamps corresponding to the same conversion value to obtain a conversion value I and a conversion value II when the timestamps are the maximum.
6. The big-data-based request deduplication method of claim 5, wherein the respectively performing the anti-code processing on the encrypted value and the transformed value based on a preset decoding model comprises:
performing md5 decryption processing on the encrypted value to obtain a request address;
and respectively carrying out binary processing on the first conversion value and the second conversion value, carrying out ASCII code conversion processing on the generated binary values to obtain ASCII codes, and carrying out character string conversion processing on the ASCII codes to obtain processing results.
7. A big data based request deduplication apparatus, comprising:
the request acquisition module is used for acquiring concurrent write requests sent by a plurality of users, analyzing request contents and acquiring request addresses and write contents of different users;
the address encryption module is used for encrypting all the acquired request addresses based on an MD5 encryption algorithm to acquire an encrypted value and sending the encrypted value to the kafka cluster;
the content conversion module is used for converting all the obtained different written contents based on a preset conversion model to obtain a conversion value and sending the conversion value to the kafka cluster;
the receiving token generating module is used for the Kafka cluster to distinguish based on the received encrypted values and divide different receiving tokens according to different encrypted values;
the receiving partition determining module is used for judging whether the historical encrypted value is consistent with the encrypted value of the newly received request, if so, writing a converted value and a time stamp corresponding to the newly received encrypted value into the receiving partition, otherwise, establishing the receiving partition based on the newly received encrypted value;
and the request deduplication module is used for sending the conversion value partitioned by the kafka cluster to the deduplication model according to the partition unit, judging whether the same content with the same conversion value exists in the same partition unit, if so, obtaining the latest conversion value based on the timestamp, deleting other same conversion values, performing anti-code processing on the encryption value and the conversion value based on a preset decoding model, and sending the processed content to the processing layer.
8. A computer device comprising a memory having a computer program stored therein and a processor that when executed implements the steps of the big-data based request deduplication method of any one of claims 1 through 6.
9. A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the big data based request deduplication method as recited in any one of claims 1 through 6.
CN202011186553.2A 2020-10-29 2020-10-29 Request deduplication method, device, equipment and storage medium based on big data Active CN112436943B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011186553.2A CN112436943B (en) 2020-10-29 2020-10-29 Request deduplication method, device, equipment and storage medium based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011186553.2A CN112436943B (en) 2020-10-29 2020-10-29 Request deduplication method, device, equipment and storage medium based on big data

Publications (2)

Publication Number Publication Date
CN112436943A CN112436943A (en) 2021-03-02
CN112436943B true CN112436943B (en) 2022-11-08

Family

ID=74694736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011186553.2A Active CN112436943B (en) 2020-10-29 2020-10-29 Request deduplication method, device, equipment and storage medium based on big data

Country Status (1)

Country Link
CN (1) CN112436943B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113206875B (en) * 2021-04-27 2022-09-02 深圳市晨北科技有限公司 Data transmission method, device and storage medium
CN114422259B (en) * 2022-01-26 2022-10-28 宋舒涵 Internet resource monitoring and distributing method facing high concurrent data request

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017076193A1 (en) * 2015-11-05 2017-05-11 北京奇虎科技有限公司 Method and apparatus for processing request from client
WO2020014954A1 (en) * 2018-07-20 2020-01-23 威富通科技有限公司 Data control method and terminal device
CN110795499A (en) * 2019-09-17 2020-02-14 中国平安人寿保险股份有限公司 Cluster data synchronization method, device and equipment based on big data and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10387044B2 (en) * 2017-04-05 2019-08-20 Kaminario Technologies Ltd. Deduplication in a distributed storage system
CN111143720A (en) * 2018-11-06 2020-05-12 顺丰科技有限公司 URL duplicate removal method, device and storage medium
CN110334086A (en) * 2019-05-30 2019-10-15 平安科技(深圳)有限公司 Data duplicate removal method, device, computer equipment and storage medium
CN110457305B (en) * 2019-08-13 2021-11-26 腾讯科技(深圳)有限公司 Data deduplication method, device, equipment and medium
CN111258966A (en) * 2020-01-14 2020-06-09 软通动力信息技术有限公司 Data deduplication method, device, equipment and storage medium
CN111259282B (en) * 2020-02-13 2023-08-29 深圳市腾讯计算机系统有限公司 URL (Uniform resource locator) duplication removing method, device, electronic equipment and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
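Research and Optimization of Deduplication Processing Technology for Massive Relational Data; Huang Qipeng et al.; Computer & Digital Engineering (《计算机与数字工程》); 20181020 (No. 10); full text *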

Also Published As

Publication number Publication date
CN112436943A (en) 2021-03-02

Similar Documents

Publication number Publication date Title
CN109254733B (en) Method, device and system for storing data
CN112527816B (en) Data blood relationship analysis method, system, computer equipment and storage medium
US20180248879A1 (en) Method and apparatus for setting access privilege, server and storage medium
CN112436943B (en) Request deduplication method, device, equipment and storage medium based on big data
CN111191255B (en) Information encryption processing method, server, terminal, device and storage medium
CN115017107A (en) Data retrieval method and device based on privacy protection, computer equipment and medium
CN115757495A (en) Cache data processing method and device, computer equipment and storage medium
CN107844488B (en) Data query method and device
CN111680477A (en) Method and device for exporting spreadsheet file, computer equipment and storage medium
CN110618999A (en) Data query method and device, computer storage medium and electronic equipment
CN114996675A (en) Data query method and device, computer equipment and storage medium
CN113268453A (en) Log information compression storage method and device
CN116842012A (en) Method, device, equipment and storage medium for storing Redis cluster in fragments
CN112182108A (en) Block chain based distributed data storage updating method and electronic equipment
CN111368693A (en) Identification method and device for identity card information
CN111311374A (en) University student-based idle commodity exchange method, device, equipment and storage medium
CN111046010A (en) Log storage method, device, system, electronic equipment and computer readable medium
CN114912003A (en) Document searching method and device, computer equipment and storage medium
CN112182603B (en) Anti-crawler method and device
CN115203672A (en) Information access control method and device, computer equipment and medium
CN112416875B (en) Log management method, device, computer equipment and storage medium
CN115374455A (en) Audio file processing method and device, computer equipment and storage medium
CN114626352A (en) Report automatic generation method and device, computer equipment and storage medium
CN113791735A (en) Video data storage method and device, computer equipment and storage medium
CN112632054A (en) Data set duplication removing method based on attribute encryption, storage medium and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No. 80, Changjiang Road, Wancheng District, Nanyang City, Henan Province 473003

Applicant after: NANYANG INSTITUTE OF TECHNOLOGY

Address before: 473000 Qiyi Road, Nanyang City, Henan Province

Applicant before: NANYANG INSTITUTE OF TECHNOLOGY

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240328

Address after: 473000 Liding Industrial Park, No. 2497 Songshan Road, Lihe, Wancheng District, Nanyang City, Henan Province

Patentee after: Henan Liding Material Technology Co.,Ltd.

Country or region after: China

Address before: No. 80, Changjiang Road, Wancheng District, Nanyang City, Henan Province 473003

Patentee before: NANYANG INSTITUTE OF TECHNOLOGY

Country or region before: China