WO2016184316A1 - Data flow limiting method and device - Google Patents

Data flow limiting method and device

Info

Publication number
WO2016184316A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
value
piece
sensitive hash
hash value
Prior art date
Application number
PCT/CN2016/081216
Other languages
French (fr)
Chinese (zh)
Inventor
胡四海
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2016184316A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control

Definitions

  • The present application relates to the field of Internet technologies, and in particular to a data flow limiting method and device.
  • Existing flow limiting schemes generally fall into two types: random flow limiting schemes and hash schemes.
  • A random flow limiting scheme is usually purely random: which data is removed and which is retained is entirely random, so the diversity of the data that remains after flow limiting cannot be guaranteed.
  • A hash scheme computes hash values to determine whether two pieces of data are identical and preferentially removes identical data, but it cannot tell when two pieces of data are merely similar.
  • The purpose of the present application is to solve, at least to some extent, one of the technical problems in the related art.
  • To this end, a first object of the present application is to propose a data flow limiting method.
  • The method can remove data according to the degree of similarity and difference between pieces of data and can preferentially remove identical data, thereby maximizing the diversity of the data after flow limiting.
  • A second object of the present application is to propose a data flow limiting device.
  • The data flow limiting method of the embodiment of the first aspect of the present application includes: calculating a locality-sensitive hash value of received data; calculating a similarity value between the data and at least one piece of saved data according to the locality-sensitive hash value of the data and the locality-sensitive hash value of the at least one piece of saved data; and determining whether to save the data according to the similarity value.
  • In the data flow limiting method of the embodiment of the present application, a locality-sensitive hash value of the received data is calculated; a similarity value between the data and at least one piece of saved data is then calculated according to the locality-sensitive hash value of the data and that of the at least one piece of saved data; and finally whether to save the data is determined according to the similarity value. Data can thus be removed according to the degree of similarity and difference between pieces of data, with identical data removed preferentially, which maximizes the diversity of the data after flow limiting.
  • The data flow limiting device of the embodiment of the second aspect of the present application includes: a calculation module, configured to calculate a locality-sensitive hash value of received data and to calculate a similarity value between the data and at least one piece of saved data according to the locality-sensitive hash value of the data and the locality-sensitive hash value of the at least one piece of saved data; and a determining module, configured to determine whether to save the data according to the similarity value calculated by the calculation module.
  • In the data flow limiting device of the embodiment of the present application, the calculation module calculates a locality-sensitive hash value of the received data and calculates a similarity value between the data and at least one piece of saved data according to the locality-sensitive hash value of the data and that of the at least one piece of saved data; the determining module then determines whether to save the data according to the similarity value calculated by the calculation module. Data can thus be removed according to the degree of similarity and difference between pieces of data, with identical data removed preferentially, which maximizes the diversity of the data after flow limiting.
  • FIG. 1 is a flowchart of an embodiment of the data flow limiting method of the present application.
  • FIG. 2 is a flowchart of another embodiment of the data flow limiting method of the present application.
  • FIG. 3 is a schematic structural diagram of an embodiment of the data flow limiting device of the present application.
  • FIG. 4 is a schematic structural diagram of another embodiment of the data flow limiting device of the present application.
  • The data flow limiting method in this embodiment may be implemented by a data flow limiting device, which may be disposed between an upstream server and a downstream server. Specifically, the data flow limiting device may be integrated in the upstream server or the downstream server to limit the flow of data sent by the upstream server to the downstream server. Alternatively, the data flow limiting device may be disposed in an independent server, or may itself be an independent server, located between the upstream server and the downstream server, to limit the flow of data sent by the upstream server to the downstream server.
  • The data flow limiting method may include:
  • Step 101: Calculate a locality-sensitive hashing (LSH) value of the received data.
  • The received data is the data sent by the upstream server. After receiving this data, the data flow limiting device applies flow limiting to it and then sends it to the downstream server.
  • Step 102: Calculate a similarity value between the data and at least one piece of saved data according to the LSH value of the data and the LSH value of the at least one piece of saved data.
  • The at least one piece of saved data may be at least one piece of data already saved in a cache, the cache being allocated in the data flow limiting device or in a server that includes the data flow limiting device.
  • Calculating the similarity value between the data and the at least one piece of data according to the LSH value of the data and the LSH value of the at least one piece of saved data may be: calculating a difference value between the LSH value of the data and the LSH value of the at least one piece of data, and calculating the similarity value between the data and the at least one piece of data according to the difference value.
  • To calculate the similarity value from the difference value, the data flow limiting device may apply formula (1).
  • D_i is the difference value between the LSH value of the data and the LSH value of the at least one piece of data.
  • S_i is the similarity value between the data and the at least one piece of data; i is an integer, and i ≥ 1.
  • The difference value between the LSH value of the data and the LSH value of the at least one piece of data may be the Hamming distance (HD) between the two LSH values.
  • Step 103: Determine whether to save the data according to the similarity value.
  • The data flow limiting device may determine whether to save the data as follows: it calculates a pass probability of the data according to the maximum of the similarity values and a predetermined sampling rate; if the pass probability is greater than or equal to a preset threshold, the data is saved; if the pass probability is less than the preset threshold, the data is not saved.
  • The preset threshold may be set in a specific implementation according to implementation requirements and/or system performance, and its size is not limited in this embodiment. For example, the preset threshold may be 50%.
  • Saving the data may be: storing the data in the cache. After the data is saved, the data flow limiting device may further send the data saved in the cache to the downstream server, so that the data sent by the upstream server is flow limited before it is sent to the downstream server.
  • To calculate the pass probability of the data from the maximum of the similarity values and the predetermined sampling rate, the data flow limiting device may apply formula (2).
  • L is the predetermined sampling rate, for example 75%; S_i is the similarity value between the data and the at least one piece of data, where i is an integer and i ≥ 1; Max(S_i) is the maximum of the similarity values.
  • The data flow limiting device calculates an LSH value of the received data, then calculates a similarity value between the data and at least one piece of saved data according to the LSH value of the data and the LSH value of the at least one piece of saved data, and finally determines whether to save the data according to the similarity value. Data can thus be removed according to the degree of similarity and difference between pieces of data, with identical data removed preferentially, which maximizes the diversity of the data after flow limiting.
  • The following describes the data flow limiting method provided by the present application, taking transaction data of an e-commerce platform as an example.
  • The predetermined sampling rate is 75%; that is, flow limiting is required to remove 25% of the traffic.
  • The No. 1 and No. 4 data are exactly the same.
  • With a predetermined sampling rate of 75% (that is, 25% of the traffic is removed), the two pieces of data expected to be removed are those with the smallest difference from other data: No. 4 (no difference from No. 1) and No. 2 (small difference from No. 1). In other words, data No. 1, No. 3, No. 5, No. 6, No. 7, and No. 8 are retained.
  • The present application uses LSH to keep the sampled data as diverse as possible and to retain sufficient differences between pieces of data. It addresses the data loss problem of the random flow limiting scheme, and it also addresses the limitations of the hash scheme, which can judge only whether data is identical, not whether it is similar, and which cannot preserve differences in the original content after hashing.
  • the data limiting method may include:
  • Step 201: Allocate a cache space.
  • The cache space is allocated in the data flow limiting device or in the server that includes the data flow limiting device, and is used to cache the LSH values of the N most recent pieces of data sent by the upstream server.
  • N can be configured according to the actual situation. The recommended value is the full volume of data within 5 minutes, with an upper limit of 1024 so that memory use stays within a few kilobytes.
  • The data flow limiting device first calculates and caches LSH values for the traffic data in sequence number order. After the No. 1 data flows into the data flow limiting device, the cache is as shown in Table 3.
  • Step 202: When new traffic data arrives, calculate the LSH value of the data to be cached, and calculate a difference value between this LSH value and the LSH value of at least one piece of data in the cache.
  • The difference value is expressed as the Hamming distance (HD).
  • The data flow limiting device calculates the LSH value of the No. 2 data, which may be as shown in Table 4.
  • The data flow limiting device then calculates the difference value between this LSH value and the LSH value of the No. 1 data in the cache. The HD may be calculated as the number of corresponding bits at which the two LSH values differ; that is, the LSH value of the No. 2 data is compared with that of the No. 1 data, and the HD equals the number of bits that differ.
  • In a computer, this can be computed quickly with an XOR operation.
  • The comparison between the LSH value of the No. 2 data and the LSH value of the No. 1 data may be as shown in Table 5.
  • Step 203: Calculate a similarity value between the data to be cached and the at least one piece of data in the cache according to the difference value.
  • The data flow limiting device can then calculate the similarity value according to formula (1).
  • Step 204: Calculate a pass probability of the data to be cached according to the maximum of the similarity values and the predetermined sampling rate.
  • The pass probability may be calculated according to formula (2); applying formula (2), the data flow limiting device obtains a pass probability of 5.83% for the No. 2 data.
  • Step 205: Determine whether the pass probability is greater than or equal to a preset threshold. If so, step 206 is performed; if the pass probability is less than the preset threshold, step 207 is performed.
  • The preset threshold may be set in a specific implementation according to implementation requirements and/or system performance, and its size is not limited in this embodiment. In this embodiment, a preset threshold of 50% is used as an example.
  • Step 206: Store the data to be cached in the cache, and end the process.
  • Step 207: Do not store the data to be cached in the cache, and end the process.
  • Steps 202 to 207 may be repeated to apply flow limiting to the No. 3 to No. 8 data. Since the No. 2 data is not stored in the cache, the data in the cache is as shown in Table 6.
  • The HD between the LSH value of the No. 3 data and the LSH value of the No. 1 data is 10, so the pass probability obtained for the No. 3 data is 55.6%, which is greater than 50%. The No. 3 data is therefore stored in the cache, and the data in the cache is as shown in Table 7.
  • The HD between the LSH value of the No. 4 data and the LSH value of the No. 1 data is 0, and the HD between the LSH value of the No. 4 data and the LSH value of the No. 3 data is 10. The pass probability obtained for the No. 4 data is therefore 0, so the No. 4 data is not stored in the cache, and the data in the cache is still as shown in Table 7.
  • The HD between the LSH value of the No. 5 data and the LSH value of the No. 1 data is 9, and the HD between the LSH value of the No. 5 data and the LSH value of the No. 3 data is 11. The pass probability obtained for the No. 5 data is therefore 50%, so the No. 5 data is stored in the cache, and the data in the cache is as shown in Table 8.
  • The data flow limiting device can limit data flow according to differences in data similarity and preferentially remove identical data, thereby maximizing the diversity of the data after flow limiting.
  • FIG. 3 is a schematic structural diagram of an embodiment of the data flow limiting device of the present application.
  • The data flow limiting device in this embodiment can implement the process of the embodiment shown in FIG. 1 of the present application.
  • The data flow limiting device may include a calculation module 31 and a determining module 32.
  • The calculation module 31 is configured to calculate an LSH value of the received data, and to calculate a similarity value between the data and at least one piece of saved data according to the LSH value of the data and the LSH value of the at least one piece of saved data. The calculation module 31 is specifically configured to calculate a difference value between the LSH value of the data and the LSH value of the at least one piece of saved data, and to calculate the similarity value between the data and the at least one piece of data according to the difference value.
  • The difference value calculated by the calculation module 31 may be the Hamming distance between the LSH value of the data and the LSH value of the at least one piece of data. Specifically, the calculation module 31 may calculate the similarity value between the data and the at least one piece of data according to formula (1).
  • The determining module 32 is configured to determine whether to save the data according to the similarity value calculated by the calculation module 31.
  • The data flow limiting device may be disposed between the upstream server and the downstream server. Specifically, the data flow limiting device may be integrated in the upstream server or the downstream server to limit the flow of data sent by the upstream server to the downstream server. Alternatively, the data flow limiting device may be disposed in an independent server, or may itself be an independent server, located between the upstream server and the downstream server, to limit the flow of data sent by the upstream server to the downstream server. The received data is the data sent by the upstream server. After receiving this data, the data flow limiting device applies flow limiting to it and then sends it to the downstream server.
  • The at least one piece of saved data may be at least one piece of data already saved in a cache, the cache being allocated in the data flow limiting device or in a server that includes the data flow limiting device.
  • The calculation module 31 calculates the LSH value of the received data and calculates a similarity value between the data and at least one piece of saved data according to the LSH value of the data and the LSH value of the at least one piece of saved data; the determining module 32 then determines whether to save the data according to the similarity value calculated by the calculation module 31. Data can thus be removed according to the degree of similarity and difference between pieces of data, with identical data removed preferentially, which maximizes the diversity of the data after flow limiting.
  • The determining module 32 may include a probability calculation sub-module 321 and a storage sub-module 322.
  • The probability calculation sub-module 321 is configured to calculate a pass probability of the data according to the maximum of the similarity values calculated by the calculation module 31 and a predetermined sampling rate. Specifically, the probability calculation sub-module 321 may calculate the pass probability of the data according to formula (2).
  • The storage sub-module 322 is configured to save the data when the pass probability calculated by the probability calculation sub-module 321 is greater than or equal to a preset threshold.
  • The preset threshold may be set in a specific implementation according to implementation requirements and/or system performance, and its size is not limited in this embodiment. For example, the preset threshold may be 50%.
  • Saving the data may be: the storage sub-module 322 storing the data in the cache. After the data is saved, the data flow limiting device may further send the data saved in the cache to the downstream server, so that the data sent by the upstream server is flow limited before it is sent to the downstream server.
  • The data flow limiting device can limit data flow according to differences in data similarity and preferentially remove identical data, thereby maximizing the diversity of the data after flow limiting.
  • Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a particular logical function or process. The scope of the preferred embodiments of the present application includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in reverse order depending on the functionality involved, as should be understood by those skilled in the art to which the embodiments of the present application pertain.
  • Portions of the present application can be implemented in hardware, software, firmware, or a combination thereof.
  • Multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques well known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit (ASIC) having suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and so on.
  • each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist physically separately, or two or more modules may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the integrated modules, if implemented in the form of software functional modules and sold or used as stand-alone products, may also be stored in a computer readable storage medium.
  • the above mentioned storage medium may be a read only memory, a magnetic disk or an optical disk or the like.
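
The cache described in step 201 above (the LSH values of the N most recent pieces of data, with N capped at 1024) can be illustrated with a bounded queue. This is only a minimal sketch: the names `MAX_CACHE_ENTRIES` and `make_lsh_cache` are hypothetical, and the use of Python's `collections.deque` is an assumption, since the publication does not describe the cache's data structure.

```python
from collections import deque

# Upper limit on cached LSH values, as stated in the description of step 201.
MAX_CACHE_ENTRIES = 1024


def make_lsh_cache(n: int) -> deque:
    """Create a cache for the LSH values of at most the n most recent pieces of data.

    The requested size is clamped to MAX_CACHE_ENTRIES. Once the cache is full,
    appending a new value silently evicts the oldest one.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return deque(maxlen=min(n, MAX_CACHE_ENTRIES))


# Illustration with dummy integers standing in for LSH values.
cache = make_lsh_cache(5000)   # the request exceeds the cap, so 1024 is used
for lsh_value in range(1030):
    cache.append(lsh_value)
```

After the loop the cache holds the 1024 most recent values, and the six oldest have been evicted. Note that Python integers carry per-object overhead, so the "few kilobytes" figure in the description would only hold for a packed representation such as fixed-width integers in an array.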

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Complex Calculations (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The present invention provides a data flow limiting method and device. The data flow limiting method comprises: computing a locality sensitive Hash value of received data; computing, according to the locality sensitive Hash value of the data and a locality sensitive Hash value of at least one piece of saved data, a similarity value of the data and the at least one piece of data; and determining, according to the similarity value, whether to save the data or not. The present invention can remove data according to the similarity degree and the difference of the data, and can preferentially remove same data, thereby maximizing diversity of the data subjected to flow limiting.

Description

Data flow limiting method and device
This application claims priority to Chinese Patent Application No. 201510250007.3, filed on May 15, 2015 and entitled "Data flow limiting method and device", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of Internet technologies, and in particular to a data flow limiting method and device.
Background
Calls between computer systems often need to be flow limited for various reasons (insufficient resources, heavy system load, and so on). Existing flow limiting schemes generally fall into two types: random flow limiting schemes and hash schemes. A random flow limiting scheme is usually purely random: which data is removed and which is retained is entirely random, so the diversity of the data that remains after flow limiting cannot be guaranteed. A hash scheme computes hash values to determine whether two pieces of data are identical and preferentially removes identical data, but it cannot tell when two pieces of data are merely similar.
Summary of the invention
The purpose of the present application is to solve, at least to some extent, one of the technical problems in the related art.
To this end, a first object of the present application is to propose a data flow limiting method. The method can remove data according to the degree of similarity and difference between pieces of data and can preferentially remove identical data, thereby maximizing the diversity of the data after flow limiting.
A second object of the present application is to propose a data flow limiting device.
To achieve the above objects, the data flow limiting method of the embodiment of the first aspect of the present application includes: calculating a locality-sensitive hash value of received data; calculating a similarity value between the data and at least one piece of saved data according to the locality-sensitive hash value of the data and the locality-sensitive hash value of the at least one piece of saved data; and determining whether to save the data according to the similarity value.
In the data flow limiting method of the embodiment of the present application, a locality-sensitive hash value of the received data is calculated; a similarity value between the data and at least one piece of saved data is then calculated according to the locality-sensitive hash value of the data and that of the at least one piece of saved data; and finally whether to save the data is determined according to the similarity value. Data can thus be removed according to the degree of similarity and difference between pieces of data, with identical data removed preferentially, which maximizes the diversity of the data after flow limiting.
To achieve the above objects, the data flow limiting device of the embodiment of the second aspect of the present application includes: a calculation module, configured to calculate a locality-sensitive hash value of received data and to calculate a similarity value between the data and at least one piece of saved data according to the locality-sensitive hash value of the data and the locality-sensitive hash value of the at least one piece of saved data; and a determining module, configured to determine whether to save the data according to the similarity value calculated by the calculation module.
In the data flow limiting device of the embodiment of the present application, the calculation module calculates a locality-sensitive hash value of the received data and calculates a similarity value between the data and at least one piece of saved data according to the locality-sensitive hash value of the data and that of the at least one piece of saved data; the determining module then determines whether to save the data according to the similarity value calculated by the calculation module. Data can thus be removed according to the degree of similarity and difference between pieces of data, with identical data removed preferentially, which maximizes the diversity of the data after flow limiting.
Additional aspects and advantages of the present application will be set forth in part in the description below, and in part will become apparent from that description or be learned through practice of the application.
Brief description of the drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart of an embodiment of the data flow limiting method of the present application;
FIG. 2 is a flowchart of another embodiment of the data flow limiting method of the present application;
FIG. 3 is a schematic structural diagram of an embodiment of the data flow limiting device of the present application;
FIG. 4 is a schematic structural diagram of another embodiment of the data flow limiting device of the present application.
Detailed description
Embodiments of the present application are described in detail below. Examples of the embodiments are illustrated in the accompanying drawings, in which identical or similar reference numerals denote identical or similar elements, or elements having identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present application, and are not to be construed as limiting it. On the contrary, the embodiments of the present application include all changes, modifications, and equivalents falling within the spirit and scope of the appended claims.
FIG. 1 is a flowchart of an embodiment of the data flow limiting method of the present application. The data flow limiting method in this embodiment may be implemented by a data flow limiting device, which may be disposed between an upstream server and a downstream server. Specifically, the data flow limiting device may be integrated in the upstream server or the downstream server to limit the flow of data sent by the upstream server to the downstream server. Alternatively, the data flow limiting device may be disposed in an independent server, or may itself be an independent server, located between the upstream server and the downstream server, to limit the flow of data sent by the upstream server to the downstream server.
As shown in FIG. 1, the data flow limiting method may include:
Step 101: Calculate a locality-sensitive hashing (LSH) value of the received data.
Specifically, the received data is the data sent by the upstream server. After receiving this data, the data flow limiting device applies flow limiting to it and then sends it to the downstream server.
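
The text does not say which LSH function is used in step 101. Purely as an illustration of what a locality-sensitive hash over a piece of data can look like, the sketch below uses a SimHash-style fingerprint over whitespace-separated tokens. The choice of SimHash, the 64-bit width, the tokenization, and the function name `simhash` are all assumptions and are not taken from the publication.

```python
import hashlib


def simhash(text: str) -> int:
    """Return a 64-bit SimHash-style fingerprint of whitespace-separated tokens.

    Each token is hashed to 64 bits. For every bit position the tokens vote
    +1 (bit set) or -1 (bit clear), and the fingerprint has a 1 wherever the
    vote total is positive. Inputs that share most tokens tend to produce
    fingerprints that differ in only a few bits.
    """
    bits = 64
    votes = [0] * bits
    for token in text.split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")  # first 64 bits of the digest
        for i in range(bits):
            votes[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, total in enumerate(votes):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint
```

The fingerprint is deterministic and, in this sketch, independent of token order, so two records with the same fields in a different order map to the same value. Any hash with the locality-sensitive property could be substituted.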
Step 102: Calculate a similarity value between the data and at least one piece of saved data according to the LSH value of the data and the LSH value of the at least one piece of saved data.
The at least one piece of saved data may be at least one piece of data already saved in a cache, the cache being allocated in the data flow limiting device or in a server that includes the data flow limiting device.
Specifically, calculating the similarity value between the data and the at least one piece of data according to the LSH value of the data and the LSH value of the at least one piece of saved data may be: calculating a difference value between the LSH value of the data and the LSH value of the at least one piece of data, and calculating the similarity value between the data and the at least one piece of data according to the difference value.
To calculate the similarity value from the difference value, the data flow limiting device may apply formula (1).
[Formula (1): original image PCTCN2016081216-appb-000001, not reproduced in this text]
where Di is the difference value between the LSH value of the data and the LSH value of the at least one piece of data; Si is the similarity value between the data and the at least one piece of data; and i is an integer, i ≥ 1.
In this embodiment, the difference value between the LSH value of the data and the LSH value of the at least one piece of data may be the Hamming distance (HD) between the two LSH values.
Step 103: determine, based on the similarity value, whether to save the data.
Specifically, the data flow limiting device may make this determination as follows: it calculates the pass probability of the data from the maximum of the similarity values and a predetermined sampling rate. If the pass probability is greater than or equal to a preset threshold, the data is saved; if the pass probability is less than the preset threshold, the data is not saved. The preset threshold may be set at implementation time according to implementation requirements and/or system performance, and this embodiment does not limit its value. For example, the preset threshold may be 50%.
Specifically, saving the data may mean storing it in the cache. After saving the data, the data flow limiting device may further send the data saved in the cache to the downstream server, so that the data sent by the upstream server is flow-limited before it reaches the downstream server.
The data flow limiting device may calculate the pass probability of the data from the maximum of the similarity values and the predetermined sampling rate according to formula (2).
[Formula (2): original image PCTCN2016081216-appb-000002, not reproduced in this text]
where P is the pass probability of the data; L is the predetermined sampling rate, for example 75%; Si is the similarity value between the data and the at least one piece of data, with i an integer, i ≥ 1; and Max(Si) is the maximum of the similarity values.
In the above embodiment, the data flow limiting device calculates the LSH value of the received data, then calculates the similarity value between the data and at least one piece of saved data from their LSH values, and finally determines from the similarity value whether to save the data. Data is thus removed according to its degree of similarity and difference, with identical data removed first, which maximizes the diversity of the data that remains after flow limiting.
The data flow limiting method provided by the present application is illustrated below using transaction data from an e-commerce platform. Suppose a system needs to sample and inspect transaction data in real time while preserving as much diversity in the sampled data as possible. The predetermined sampling rate is 75%, that is, flow limiting must remove 25% of the traffic.
Suppose the transaction data, in order of serial number, is as shown in Table 1.
Table 1
[Table 1: original images PCTCN2016081216-appb-000003 and PCTCN2016081216-appb-000004, not reproduced in this text]
As Table 1 shows, records No. 1 and No. 4 are identical. With 8 records and a predetermined sampling rate of 75% (that is, 25% flow limiting), the 2 records that should ideally be removed are those that differ least from the rest: No. 4 (no difference from No. 1) and No. 2 (differs from No. 1 only in purchase quantity). Records No. 1, 3, 5, 6, 7, and 8 are retained.
The present application uses LSH so that the sampled data is as diverse as possible and retains sufficient variety. This solves the loss of data diversity in the random flow limiting scheme. It also solves the problem that the hash scheme can only determine whether data are identical and cannot determine whether they are similar, that is, that a conventional hash does not preserve the degree of difference between the original contents.
There are many ways to compute LSH, such as Jaccard, SimHash, or MinHash. The present application takes a 64-bit SimHash implementation as an example. The SimHash values corresponding to the records in Table 1 may be as shown in Table 2 (each 0/1 is one bit, and one SimHash value can be stored in 64 bits).
Table 2
[Table 2: original image PCTCN2016081216-appb-000005, not reproduced in this text]
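The text names 64-bit SimHash but does not describe how a record is tokenized or how each token is hashed. The following is a minimal sketch of the general SimHash technique under stated assumptions: whitespace tokenization, equal token weights, and MD5 truncated to 64 bits as the per-token hash are all illustrative choices, and `simhash64` is a hypothetical name. Its outputs are therefore not expected to match the values in Table 2.

```python
import hashlib


def simhash64(text: str) -> int:
    """Return a 64-bit SimHash fingerprint of whitespace-separated tokens."""
    counts = [0] * 64
    for token in text.split():
        # Assumption: MD5 truncated to its first 8 bytes as the token hash.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")
        # Each token votes +1 for every set bit and -1 for every clear bit.
        for bit in range(64):
            counts[bit] += 1 if (token_hash >> bit) & 1 else -1
    fingerprint = 0
    for bit in range(64):
        if counts[bit] > 0:  # a positive tally sets the fingerprint bit
            fingerprint |= 1 << bit
    return fingerprint


# Render the fingerprint as a 64-character bit string, as in the tables.
bits = format(simhash64("buyer-42 item-7 quantity-1"), "064b")
```

Because each bit is decided by a majority vote over tokens, records that share most tokens tend to produce fingerprints that differ in only a few bits, which is the property the method relies on.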
FIG. 2 is a flowchart of another embodiment of the data flow limiting method of the present application. As shown in FIG. 2, the data flow limiting method may include:
Step 201: allocate cache space.
The cache space is allocated in the data flow limiting device or in the server that contains the device, and is used to cache the LSH values of the N most recent pieces of data sent by the upstream server. N can be configured according to actual conditions. The recommended value is the total number of records received within 5 minutes, capped at 1024, which keeps memory use to a few kilobytes.
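The sizing rule above can be sketched with a bounded queue. This is an illustration only: `make_lsh_cache` and `records_per_5_min` are hypothetical names, and using `collections.deque` with `maxlen` (which discards the oldest entry when full) is an assumption about how "the N most recent" values might be kept.

```python
from collections import deque

MAX_CACHE_ENTRIES = 1024  # upper limit on N stated in the text


def make_lsh_cache(records_per_5_min: int) -> deque:
    """Create a cache holding the N most recent LSH values.

    N is the 5-minute record volume, capped at MAX_CACHE_ENTRIES.
    """
    n = max(1, min(records_per_5_min, MAX_CACHE_ENTRIES))
    return deque(maxlen=n)


cache = make_lsh_cache(3)  # N=3, as in the worked example
for lsh_value in (0b0001, 0b0010, 0b0100, 0b1000):
    cache.append(lsh_value)  # the fourth append evicts the oldest value
```

At the cap, 1024 values of 64 bits each amount to 8 KB of raw fingerprint data, which is consistent with the "few kilobytes" figure in the text.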
In this embodiment, because the example data set is small, assume N=3. The data flow limiting device processes the traffic data in order of serial number, calculating and caching LSH values. After record No. 1 flows into the device, the cache is as shown in Table 3.
Table 3
Cache
1010111101001111111100101101100011010100100100110011011101000010
Step 202: when new traffic data arrives, calculate the LSH value of the data to be stored in the cache, and calculate the difference value between that LSH value and the LSH value of at least one piece of data in the cache.
In this embodiment, the difference value is expressed as the Hamming distance (HD).
After record No. 2 flows in, the data flow limiting device calculates its LSH value, which may be as shown in Table 4.
Table 4
1010111101001111111101101101100011010100100100110011011101000010
The device then calculates the difference value between this LSH value and the LSH value of record No. 1 in the cache. The HD is the number of corresponding bit positions at which the two LSH values differ: the LSH value of record No. 2 is compared with that of record No. 1 bit by bit, and the HD equals the number of differing bits. Preferably, the HD is calculated quickly using exclusive OR (XOR).
In this embodiment, the comparison between the LSH value of record No. 2 and that of record No. 1 may be as shown in Table 5.
Table 5
[Table 5: original image PCTCN2016081216-appb-000006, not reproduced in this text]
As Table 5 shows, the LSH values of record No. 2 and record No. 1 differ in only 1 bit, so HD=1.
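The XOR shortcut can be checked directly against the values in Tables 3 and 4. In this short sketch, the function name `hamming_distance` is illustrative; the two bit strings are copied from the tables.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count differing bits: XOR leaves a 1 wherever the inputs differ."""
    return bin(a ^ b).count("1")


# LSH values of records No. 1 (Table 3) and No. 2 (Table 4).
lsh_1 = int("1010111101001111111100101101100011010100100100110011011101000010", 2)
lsh_2 = int("1010111101001111111101101101100011010100100100110011011101000010", 2)

print(hamming_distance(lsh_1, lsh_2))  # 1
```

Storing each fingerprint as a single integer means one XOR and one bit count replace a 64-step bit-by-bit comparison.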
Step 203: calculate, from the difference value, the similarity value between the data to be stored in the cache and the at least one piece of data in the cache.
The larger the HD, the lower the similarity value. The correspondence between HD and similarity value is not fixed across scenarios. In the 64-bit SimHash scenario, testing showed that when HD=1 the accuracy of judging two records similar is close to 85%, whereas when HD=10 it is below 30%.
The data flow limiting device can therefore calculate the similarity value according to formula (1).
According to formula (1), the similarity between the LSH value of record No. 2 and that of record No. 1 is S=0.93.
Step 204: calculate the pass probability of the data to be stored in the cache from the maximum of the similarity values and the predetermined sampling rate.
The pass probability may be calculated according to formula (2). Using formula (2), the data flow limiting device obtains a pass probability of 5.83% for record No. 2.
Step 205: determine whether the pass probability is greater than or equal to the preset threshold. If so, perform step 206; if the pass probability is less than the preset threshold, perform step 207.
The preset threshold may be set at implementation time according to implementation requirements and/or system performance, and this embodiment does not limit its value. In this embodiment, however, a preset threshold of 50% is used as an example.
Step 206: store the data in the cache. This round of the process ends.
Step 207: do not store the data in the cache. This round of the process ends.
Because the pass probability of record No. 2 is 5.83%, far below 50%, record No. 2 is not stored in the cache, and this round of the process ends.
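Steps 202 to 207 can be sketched as a single admission function. Formulas (1) and (2) are not reproduced in this text, so the sketch takes them as caller-supplied functions (`similarity` and `pass_probability`, both hypothetical parameter names) instead of guessing their form. Admitting the first record into an empty cache is also an assumption, chosen to match record No. 1 being cached directly in the example.

```python
from typing import Callable, MutableSequence


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two LSH values."""
    return bin(a ^ b).count("1")


def admit(
    lsh: int,
    cache: MutableSequence[int],
    similarity: Callable[[int], float],               # stands in for formula (1)
    pass_probability: Callable[[float, float], float],  # stands in for formula (2)
    sampling_rate: float = 0.75,
    threshold: float = 0.5,
) -> bool:
    """Decide whether a record passes, caching its LSH value if it does."""
    if cache:
        # Steps 202-203: compare against every cached value, keep the maximum.
        max_similarity = max(
            similarity(hamming_distance(lsh, cached)) for cached in cache
        )
        # Steps 204-205: compare the pass probability with the threshold.
        if pass_probability(max_similarity, sampling_rate) < threshold:
            return False  # step 207: do not store
    cache.append(lsh)  # step 206: store
    return True
```

Passing the two formulas in as parameters keeps the control flow testable on its own, and a bounded container can be supplied as `cache` if only the N most recent values should be retained.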
Next, steps 202 to 207 can be repeated to flow-limit records No. 3 to No. 8. Because record No. 2 was not stored in the cache, the cache contents are as shown in Table 6.
Table 6
Cache after No. 2
1010111101001111111100101101100011010100100100110011011101000010
When record No. 3 enters the data flow limiting device, the HD between its LSH value and that of record No. 1 is 10, which gives a pass probability of 55.6%. This is greater than 50%, so record No. 3 is stored in the cache, whose contents are then as shown in Table 7.
Table 7
Cache after No. 3
1010111101001111111100101101100011010100100100110011011101000010
1011111111000110111110101000100011010000100100100011011101100010
When record No. 4 enters the data flow limiting device, the HD between its LSH value and that of record No. 1 is 0, and the HD between its LSH value and that of record No. 3 is 10. The resulting pass probability for record No. 4 is 0, so record No. 4 is not stored in the cache, and the cache contents remain as shown in Table 7.
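The distances quoted for record No. 4 can be reproduced from the Table 7 values. The sketch below assumes that record No. 4, being identical to record No. 1, has the same LSH value. Because similarity falls as HD grows, the cached entry with the smallest HD is the one that supplies the maximum similarity.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two LSH values."""
    return bin(a ^ b).count("1")


# Cache contents after record No. 3 (Table 7).
record_1 = int("1010111101001111111100101101100011010100100100110011011101000010", 2)
record_3 = int("1011111111000110111110101000100011010000100100100011011101100010", 2)
cache = [record_1, record_3]

# Assumption: identical records hash to identical LSH values.
record_4 = record_1

distances = [hamming_distance(record_4, cached) for cached in cache]
print(distances)       # [0, 10]
print(min(distances))  # 0
```

A minimum distance of 0 means an identical record is already cached, which is why record No. 4 receives a pass probability of 0.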
When record No. 5 enters the data flow limiting device, the HD between its LSH value and that of record No. 1 is 9, and the HD between its LSH value and that of record No. 3 is 11. The resulting pass probability for record No. 5 is 50%, which meets the 50% threshold, so record No. 5 is stored in the cache, whose contents are then as shown in Table 8.
Table 8
Cache after No. 5
1010111101001111111100101101100011010100100100110011011101000010
1011111111000110111110101000100011010000100100100011011101100010
1010111111000110111010100101100011000100101100100011001101100010
Records No. 6, 7, and 8 are processed in the same way and are not described further here.
In the above data flow limiting method, the data flow limiting device limits data flow according to differences in the degree of data similarity and removes identical data first, which maximizes the diversity of the data that remains after flow limiting.
FIG. 3 is a schematic structural diagram of one embodiment of the data flow limiting device of the present application. The device of this embodiment can implement the process of the embodiment shown in FIG. 1. As shown in FIG. 3, the data flow limiting device may include a calculation module 31 and a determination module 32.
The calculation module 31 is configured to calculate the LSH value of the received data and, based on the LSH value of the data and the LSH value of at least one piece of saved data, to calculate the similarity value between the data and the at least one piece of data. Specifically, the calculation module 31 is configured to calculate the difference value between the LSH value of the data and the LSH value of the at least one piece of saved data, and to calculate the similarity value from that difference value. The difference value calculated by the calculation module 31 may be the Hamming distance between the LSH value of the data and the LSH value of the at least one piece of data. Specifically, the calculation module 31 may calculate the similarity value according to formula (1).
The determination module 32 is configured to determine, based on the similarity value calculated by the calculation module 31, whether to save the data.
The data flow limiting device may be arranged between the upstream server and the downstream server. Specifically, it may be integrated into the upstream server or the downstream server to limit the flow of data sent from the upstream server to the downstream server. Alternatively, it may be deployed in, or itself serve as, an independent server located between the upstream server and the downstream server, where it performs the same function. The received data is the data sent by the upstream server. After receiving it, the data flow limiting device limits its flow and then sends it on to the downstream server.
Here, the at least one piece of saved data may be at least one piece of data saved in a cache, the cache being allocated in the data flow limiting device or in the server that contains the device.
In the above embodiment, the calculation module 31 calculates the LSH value of the received data and, from that value and the LSH value of at least one piece of saved data, calculates the similarity value between the data and the at least one piece of data. The determination module 32 then determines, based on the similarity value calculated by the calculation module 31, whether to save the data. Data is thus removed according to its degree of similarity and difference, with identical data removed first, which maximizes the diversity of the data that remains after flow limiting.
FIG. 4 is a schematic structural diagram of another embodiment of the data flow limiting device of the present application. It differs from the device shown in FIG. 3 in that the determination module 32 may include a probability calculation submodule 321 and a storing submodule 322.
The probability calculation submodule 321 is configured to calculate the pass probability of the data from the maximum of the similarity values calculated by the calculation module 31 and the predetermined sampling rate. Specifically, the probability calculation submodule 321 may calculate the pass probability according to formula (2).
The storing submodule 322 is configured to save the data when the pass probability calculated by the probability calculation submodule 321 is greater than or equal to the preset threshold. The preset threshold may be set at implementation time according to implementation requirements and/or system performance, and this embodiment does not limit its value. For example, the preset threshold may be 50%.
Specifically, saving the data may mean that the storing submodule 322 stores the data in the cache. After the data is saved, the data flow limiting device may further send the data saved in the cache to the downstream server, so that the data sent by the upstream server is flow-limited before it reaches the downstream server.
The above data flow limiting device limits data flow according to differences in the degree of data similarity and removes identical data first, which maximizes the diversity of the data that remains after flow limiting.
It should be noted that, in the description of the present application, the terms "first", "second", and the like are used for descriptive purposes only and are not to be understood as indicating or implying relative importance. In addition, in the description of the present application, "a plurality of" means two or more unless otherwise stated.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the present application includes other implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.
It should be understood that each part of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques well known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, may each exist physically on their own, or may be integrated two or more to a module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as a standalone product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present application. In this specification, such references do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it should be understood that these embodiments are exemplary and are not to be construed as limiting the present application. Those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.

Claims (8)

  1. A data flow limiting method, characterized by comprising:
    calculating a locality-sensitive hash value of received data;
    calculating, based on the locality-sensitive hash value of the data and the locality-sensitive hash value of at least one piece of saved data, a similarity value between the data and the at least one piece of data; and
    determining, based on the similarity value, whether to save the data.
  2. The method according to claim 1, characterized in that calculating the similarity value between the data and the at least one piece of data, based on the locality-sensitive hash value of the data and the locality-sensitive hash value of the at least one piece of saved data, comprises:
    calculating a difference value between the locality-sensitive hash value of the data and the locality-sensitive hash value of the at least one piece of data; and
    calculating the similarity value between the data and the at least one piece of data from the difference value.
  3. The method according to claim 1 or 2, characterized in that determining, based on the similarity value, whether to save the data comprises:
    calculating a pass probability of the data from the maximum of the similarity values and a predetermined sampling rate; and
    saving the data if the pass probability is greater than or equal to a preset threshold.
  4. The method according to claim 2, characterized in that the difference value between the locality-sensitive hash value of the data and the locality-sensitive hash value of the at least one piece of data comprises the Hamming distance between the locality-sensitive hash value of the data and the locality-sensitive hash value of the at least one piece of data.
  5. A data flow limiting device, characterized by comprising:
    a calculation module, configured to calculate a locality-sensitive hash value of received data and, based on the locality-sensitive hash value of the data and the locality-sensitive hash value of at least one piece of saved data, to calculate a similarity value between the data and the at least one piece of data; and
    a determination module, configured to determine, based on the similarity value calculated by the calculation module, whether to save the data.
  6. The device according to claim 5, characterized in that
    the calculation module is specifically configured to calculate a difference value between the locality-sensitive hash value of the data and the locality-sensitive hash value of the at least one piece of saved data, and to calculate the similarity value between the data and the at least one piece of data from the difference value.
  7. The device according to claim 5 or 6, characterized in that the determination module comprises:
    a probability calculation submodule, configured to calculate a pass probability of the data from the maximum of the similarity values calculated by the calculation module and a predetermined sampling rate; and
    a storing submodule, configured to save the data when the pass probability calculated by the probability calculation submodule is greater than or equal to a preset threshold.
  8. The device according to claim 6, characterized in that
    the difference value calculated by the calculation module comprises the Hamming distance between the locality-sensitive hash value of the data and the locality-sensitive hash value of the at least one piece of data.
PCT/CN2016/081216 2015-05-15 2016-05-06 Data flow limiting method and device WO2016184316A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510250007.3A CN106302202B (en) 2015-05-15 2015-05-15 Data current limiting method and device
CN201510250007.3 2015-05-15

Publications (1)

Publication Number Publication Date
WO2016184316A1 true WO2016184316A1 (en) 2016-11-24

Family

ID=57319444

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/081216 WO2016184316A1 (en) 2015-05-15 2016-05-06 Data flow limiting method and device

Country Status (2)

Country Link
CN (1) CN106302202B (en)
WO (1) WO2016184316A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101158967A (en) * 2007-11-16 2008-04-09 北京交通大学 Quick-speed audio advertisement recognition method based on layered matching
CN102722554A (en) * 2012-05-28 2012-10-10 中国人民解放军信息工程大学 Randomness weakening method of location-sensitive hash
CN102929891A (en) * 2011-08-11 2013-02-13 阿里巴巴集团控股有限公司 Text processing method and device
EP2685404A2 (en) * 2012-07-10 2014-01-15 Facebook, Inc. Method and system for determining image similarity
CN103530812A (en) * 2013-07-25 2014-01-22 国家电网公司 Power grid state similarity quantitative analyzing method based on locality sensitive hashing

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8050251B2 (en) * 2009-04-10 2011-11-01 Barracuda Networks, Inc. VPN optimization by defragmentation and deduplication apparatus and method
CN102622366B (en) * 2011-01-28 2014-07-30 阿里巴巴集团控股有限公司 Similar picture identification method and similar picture identification device
CN102323958A (en) * 2011-10-27 2012-01-18 上海文广互动电视有限公司 Data de-duplication method
CN103916421B (en) * 2012-12-31 2017-08-25 中国移动通信集团公司 Cloud storage data service device, data transmission system, server and method
US9690711B2 (en) * 2013-03-13 2017-06-27 International Business Machines Corporation Scheduler training for multi-module byte caching
CN103258005B (en) * 2013-04-12 2017-02-08 百度在线网络技术(北京)有限公司 Processing method and device for search results
CN103559259A (en) * 2013-11-04 2014-02-05 同济大学 Method for eliminating similar-duplicate webpage on the basis of cloud platform
CN103744964A (en) * 2014-01-06 2014-04-23 同济大学 Webpage classification method based on locality sensitive Hash function
CN103984753B (en) * 2014-05-28 2018-02-09 北京京东尚科信息技术有限公司 A kind of web crawlers goes the extracting method and device of multiplex eigenvalue

Also Published As

Publication number Publication date
CN106302202A (en) 2017-01-04
CN106302202B (en) 2020-07-28

Similar Documents

Publication Publication Date Title
US10097464B1 (en) Sampling based on large flow detection for network visibility monitoring
US9979624B1 (en) Large flow detection for network visibility monitoring
WO2019096122A1 (en) Data processing method and device
US10536360B1 (en) Counters for large flow detection
US20140101761A1 (en) Systems and methods for capturing, replaying, or analyzing time-series data
CN109525500B (en) Information processing method and information processing device capable of automatically adjusting threshold
US10171423B1 (en) Services offloading for application layer services
CN110162270B (en) Data storage method, storage node and medium based on distributed storage system
US20160352598A1 (en) Message aggregation, combining and compression for efficient data communications in gpu-based clusters
US10003515B1 (en) Network visibility monitoring
US9276879B2 (en) Memory transfer optimization of network adapter data placement when performing header-data split operations
WO2017107793A1 (en) Data processing method and device
US9832125B2 (en) Congestion notification system
CN108073527B (en) Cache replacement method and equipment
WO2020134620A1 (en) Method for accepting blockchain evidence storage transaction and system
US20190014016A1 (en) Data acquisition device, data acquisition method and storage medium
US20140169517A1 (en) Tracking a relative arrival order of events being stored in multiple queues using a counter
US8036217B2 (en) Method and apparatus to count MAC moves at line rate
US8830714B2 (en) High speed large scale dictionary matching
US20140215611A1 (en) Apparatus and method for detecting attack of network system
WO2015192668A1 (en) Evaluation processing method and device for voice service
US10069929B2 (en) Estimating cache size for cache routers in information centric networks
WO2017157164A1 (en) Data aggregation method and device
US9146741B2 (en) Eliminating redundant masking operations instruction processing circuits, and related processor systems, methods, and computer-readable media
WO2016184316A1 (en) Data flow limiting method and device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 16795803; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase. Ref country code: DE

122 EP: PCT application non-entry in European phase. Ref document number: 16795803; Country of ref document: EP; Kind code of ref document: A1