CN113836440A - Processing method and device for neighbor calculation - Google Patents

Processing method and device for neighbor calculation

Info

Publication number
CN113836440A
CN113836440A (application CN202111125713.7A)
Authority
CN
China
Prior art keywords
neighbor
calculation
approximate
computation
recall rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111125713.7A
Other languages
Chinese (zh)
Inventor
欧阳利萍
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111125713.7A
Publication of CN113836440A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 Classification techniques relating to the classification model based on distances to training or reference patterns
    • G06F 18/24147 Distances to closest patterns, e.g. nearest neighbour classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a processing method for nearest-neighbor computation, which relates to the fields of artificial intelligence and big data, in particular to intelligent search, intelligent recommendation, advertisement recommendation, knowledge graphs, and user understanding, and can be used in scenarios such as evaluating the recall rate of approximate nearest-neighbor computation. The implementation scheme includes the following steps: sampling the access traffic while approximate nearest-neighbor computation is performed, to obtain at least one traffic sample; performing true nearest-neighbor computation on the at least one traffic sample asynchronously and in real time, to obtain at least one corresponding true nearest-neighbor result; and determining a recall rate for the approximate nearest-neighbor computation based on the at least one true result and the corresponding at least one approximate result, wherein the at least one approximate result is obtained by performing the approximate nearest-neighbor computation on the at least one traffic sample.

Description

Processing method and device for neighbor calculation
Technical Field
The present disclosure relates to the fields of artificial intelligence and big data, and more particularly to intelligent search, intelligent recommendation, advertisement recommendation, knowledge graphs, and user understanding; it can be used in scenarios such as evaluating the recall rate of approximate nearest-neighbor computation.
Background
Nearest-neighbor computation is often a global service, so the stability of the system is critical. Moreover, the result of a nearest-neighbor computation is not ordinary numerical data but relational data describing neighbor relationships, which makes the accuracy of the computation difficult to track and monitor with simple methods.
Disclosure of Invention
The present disclosure provides a processing method, apparatus, device, storage medium, and computer program product for neighbor computation.
According to an aspect of the present disclosure, there is provided a processing method for nearest-neighbor computation, including: sampling the access traffic while approximate nearest-neighbor computation is performed, to obtain at least one traffic sample; performing true nearest-neighbor computation on the at least one traffic sample asynchronously and in real time, to obtain at least one corresponding true nearest-neighbor result; and determining a recall rate for the approximate computation based on the at least one true result and the corresponding at least one approximate result, wherein the at least one approximate result is obtained by performing the approximate nearest-neighbor computation on the at least one traffic sample.
According to another aspect of the present disclosure, there is provided a processing apparatus for nearest-neighbor computation, comprising: a sampling module for sampling the access traffic while approximate nearest-neighbor computation is performed, to obtain at least one traffic sample; a true-neighbor computation module for performing true nearest-neighbor computation on the at least one traffic sample asynchronously and in real time, to obtain at least one corresponding true nearest-neighbor result; and a determination module for determining a recall rate of the approximate computation based on the at least one true result and the corresponding at least one approximate result, wherein the at least one approximate result is obtained by performing the approximate nearest-neighbor computation on the at least one traffic sample.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to the embodiments of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method according to embodiments of the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 illustrates a system architecture suitable for embodiments of the present disclosure;
FIG. 2 illustrates a flow diagram of a processing method for neighbor computation according to an embodiment of the present disclosure;
FIG. 3 illustrates a schematic diagram of true neighbor computation according to an embodiment of the present disclosure;
FIG. 4 illustrates a schematic diagram of approximate neighbor computation according to an embodiment of the present disclosure;
FIG. 5 illustrates a block diagram of a processing apparatus for neighbor computation according to an embodiment of the present disclosure; and
FIG. 6 illustrates a block diagram of an electronic device used to implement the methods and apparatus of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be understood that ANN stands for approximate nearest neighbor. KNN stands for k-nearest neighbors, i.e., computing neighbors and taking the k nearest ones. In practical applications, nearest neighbors can be found with true (exact) nearest-neighbor computation, or found faster with approximate nearest-neighbor computation. Algorithms such as KNN and ANN are widely used wherever distances between vectors must be computed and in related applications, for example in internet recommendation systems, advertising systems, and search systems.
It should also be appreciated that the nearest-neighbor implementations available to industry involve fairly complex problems, any of which can cause serious bias or even errors in the approximate results, bringing serious systemic risk. For example, vector computation is often a complex process, and vector representations carry no human-readable semantics, so vector computations are hard to inspect and an abnormal result can cause a global error. As another example, the approximate algorithm (also called a fast or accelerated nearest-neighbor algorithm) directly shapes how the index is built; every update of the algorithm requires an update of the index, and if the two updates are out of sync, the computation results become abnormal.
In addition, the index built to accelerate vector computation must be computed offline and then synchronized for online use, and this update process carries risks such as update failure and abnormal updates.
Meanwhile, the result produced by an ANN algorithm is not an ordinary numerical result but a ranking of other objects or vectors by spatial distance, so its observability is poor.
In conclusion, ANN algorithms are widely used, operate at large scale, have important impact, and can accelerate vector computation, but their results lack good observability.
Therefore, the embodiments of the present disclosure provide a processing method for nearest-neighbor computation that can efficiently monitor, in real time, the recall rate of the approximate nearest-neighbor computation used in industrial scenarios involving large-scale vector computation.
The present disclosure will be described in detail below with reference to the drawings and specific embodiments.
A system architecture of a processing method and apparatus for neighbor computation suitable for embodiments of the present disclosure is introduced as follows.
FIG. 1 illustrates a system architecture suitable for embodiments of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, but does not mean that the embodiments of the present disclosure may not be used in other environments or scenarios.
As shown in FIG. 1, the system architecture 100 in the embodiment of the present disclosure may include a server 101 and a server 102. The server 101 may be configured to provide the online approximate nearest-neighbor computation service. The server 102 may provide the true nearest-neighbor computation service, i.e., it may asynchronously perform true nearest-neighbor computation on samples of the online access traffic. The server 102 may be a single server or a distributed server cluster.
An application scenario of the processing method and apparatus for neighbor computation suitable for the embodiments of the present disclosure is described as follows.
The processing method for nearest-neighbor computation provided by the embodiments of the disclosure can be applied in fields such as intelligent recommendation, intelligent search, advertisement recommendation, knowledge graphs, and user understanding, and has key application value especially in core fields such as intelligent recommendation, intelligent search, and advertisement recommendation. For example, the technical scheme can be used in neighbor applications backed by big-data social-network relationship data.
According to an embodiment of the present disclosure, a processing method for neighbor computation is provided.
FIG. 2 illustrates a flow diagram of a processing method for neighbor computation according to an embodiment of the present disclosure.
As shown in FIG. 2, the processing method 200 for neighbor computation may include operations S210 to S230.
In operation S210, while the approximate nearest-neighbor computation is performed, the access traffic is sampled to obtain at least one traffic sample.
In operation S220, true nearest-neighbor computation is performed asynchronously and in real time on the at least one traffic sample to obtain at least one corresponding true nearest-neighbor result.
In operation S230, a recall rate of the approximate computation is determined based on the at least one true nearest-neighbor result and the corresponding at least one approximate result, wherein the at least one approximate result is obtained by performing the approximate nearest-neighbor computation on the at least one traffic sample.
It should be understood that nearest-neighbor computation means finding, for one vector (the digitized abstraction of an object), the closest other vectors (which may be called candidate vectors) by computing spatial distances, i.e., the nearest neighbors (e.g., the top N).
In an embodiment of the present disclosure, true nearest-neighbor computation may include finding the vector closest to a given vector by computing the Euclidean or cosine distance from that vector to the other vectors. In other words, true nearest-neighbor computation is neighbor search implemented with exact Euclidean or cosine distance computation.
For example, to find the nearest neighbors of a vector A with true nearest-neighbor computation, the Euclidean or cosine distances between A and all candidate vectors must be computed one by one, and the candidates are then ranked by these exact distances. As shown in FIG. 3, taking candidate vector B as an example, the true method directly computes the Euclidean or cosine distance between vector A and vector B.
Thus, the true method yields the true distance between vector A and vector B. The result of true nearest-neighbor computation can therefore serve as the reference value for judging the recall rate of the approximate computation.
However, true nearest-neighbor computation has very high complexity: a (Euclidean or cosine) spatial distance must be computed for every candidate object or vector, which is time-consuming, costly, and hard to apply at industrial scale. In practical scenarios there may be tens of millions, hundreds of millions, or even more candidates, in which case the true method suffers from high computation cost and slow response.
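The brute-force search described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name `exact_knn` is ours.

```python
import numpy as np

def exact_knn(query, candidates, k=10, metric="euclidean"):
    """True (exact) nearest-neighbor search by brute force.

    Computes the distance from `query` to every candidate vector,
    then returns the indices of the k closest candidates.
    """
    if metric == "euclidean":
        dists = np.linalg.norm(candidates - query, axis=1)
    elif metric == "cosine":
        # cosine distance = 1 - cosine similarity
        q = query / np.linalg.norm(query)
        c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
        dists = 1.0 - c @ q
    else:
        raise ValueError(f"unknown metric: {metric}")
    return np.argsort(dists)[:k]
```

The cost is O(n·d) distance work per query for n candidates of dimension d, which is exactly why this exact search is reserved for the sampled traffic rather than the full online load.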
Therefore, for industrial application scenarios involving large-scale or very-large-scale vector computation, approximate nearest-neighbor schemes may be used. For example, approximate computation may be implemented with the Annoy algorithm (Approximate Nearest Neighbors Oh Yeah), the HNSW algorithm (Hierarchical Navigable Small World), the Faiss PQ algorithm (product quantization), and the like.
It will be appreciated that the goal of the Annoy algorithm is to build a data structure that can find the points closest to any query point in a short time, trading some accuracy for speed far beyond brute-force search. The HNSW algorithm is a hierarchical optimization of the NSW (Navigable Small World) graph algorithm that improves query performance by starting the search in a sparse upper-layer graph and gradually descending into the dense bottom-layer graph. The Faiss PQ algorithm is a product-quantization algorithm, where the product is a Cartesian product: the original vector space is decomposed into the Cartesian product of several low-dimensional subspaces, each subspace is quantized separately, and each vector can then be represented by a combination of quantized codes from those subspaces.
In embodiments of the present disclosure, approximate nearest-neighbor computation (also known as accelerated or fast nearest-neighbor computation) may include finding the nearest neighbors of a vector by computing spatial distances through a clustering method. In other words, approximate nearest-neighbor computation is neighbor search implemented on top of clustering.
For example, to find the nearest neighbors of a vector A with approximate computation, there is no need to compute the Euclidean or cosine distance between A and every candidate one by one. Instead, the clusters covering the full set of candidate vectors are determined first; the spatial distance from A to each cluster center is computed; the distance from each candidate to its cluster center is obtained by table lookup; the distance from A to a candidate is then fitted from these two distances; and finally the candidates are ranked by the fitted results. As shown in FIG. 4, taking candidate vector B as an example, the approximate method first computes the distance L1 between A and the cluster center C of B, then looks up the distance L2 between B and C, and fits the two to estimate the distance between A and B. For example, L1 + L2 may be taken as the distance from vector A to vector B.
Because the number of cluster centers is much smaller than the number of candidate vectors, and the distance from each candidate to its cluster center can be precomputed and looked up at query time, the cost of approximate computation drops sharply and overall efficiency improves greatly. Its accuracy, however, may decrease. In other words, approximate computation simplifies the process and achieves high performance and fast response, but at the price of a recall rate that is no longer 100%.
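The cluster-and-lookup scheme above can be sketched with a toy k-means index. This is a simplified illustration under our own assumptions (a plain L1 + L2 fit, helper names `build_index` and `approx_distances` invented here); production systems such as Faiss use far more refined quantization.

```python
import numpy as np

def build_index(candidates, n_clusters=4, iters=20, seed=0):
    """Toy clustered index: k-means over the candidates, plus a lookup
    table of each candidate's distance to its cluster center."""
    rng = np.random.default_rng(seed)
    centroids = candidates[rng.choice(len(candidates), n_clusters, replace=False)]
    for _ in range(iters):
        assign = np.argmin(np.linalg.norm(candidates[:, None] - centroids[None], axis=2), axis=1)
        for c in range(n_clusters):
            pts = candidates[assign == c]
            if len(pts):
                centroids[c] = pts.mean(axis=0)
    # final assignment and precomputed candidate-to-center table (the "lookup table")
    assign = np.argmin(np.linalg.norm(candidates[:, None] - centroids[None], axis=2), axis=1)
    to_centroid = np.linalg.norm(candidates - centroids[assign], axis=1)
    return centroids, assign, to_centroid

def approx_distances(query, centroids, assign, to_centroid):
    """Approximate query-to-candidate distance as
    L1 (query to cluster center) + L2 (candidate to center, from the table)."""
    l1 = np.linalg.norm(centroids - query, axis=1)  # one distance per center, not per candidate
    return l1[assign] + to_centroid
```

By the triangle inequality the fitted value L1 + L2 upper-bounds the true distance, which is why the approximate ranking can disagree with the exact one and the recall falls below 100%.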
Further, the accuracy of each single distance computation ultimately affects the overall recall of the neighbor computation. In the embodiments of the present disclosure, the recall rate means the similarity between the nearest-neighbor result obtained by approximate computation and the one obtained by true computation.
Also, the approximate implementations available to industry generally involve complex problems, any of which can cause the approximate result to deviate severely or even be seriously wrong, bringing serious systemic risk.
Meanwhile, the result of a neighbor computation is not ordinary numerical data but a ranking of other objects or vectors by spatial distance, so its observability is poor.
Therefore, there is a need for a robust, real-time, industrially usable way to monitor the recall rate of approximate nearest-neighbor computation.
Therefore, in the embodiments of the disclosure, for large-scale industrial scenarios, approximate computation is used online to obtain the nearest neighbors, simplifying the vector computation and achieving high performance and fast response. Meanwhile, while the approximate computation runs, the access traffic may be sampled to obtain at least one traffic sample. For each sample, true nearest-neighbor computation can be performed asynchronously in real time to obtain the corresponding true result. The recall rate of the approximate computation can then be determined by comparing the true result with the corresponding approximate result, either per individual sample or over all samples as a whole.
It should be understood that in the embodiments of the present disclosure, sampling reduces the overall computation scale, and because access traffic in industrial applications is usually large, the samples still represent the actual online situation well. In addition, asynchronous computation decouples the true computation from the online high-performance approximate computation, so it does not affect the response or performance of the online service.
In addition, a recall rate estimated in advance cannot reflect the real online situation, and offline evaluation has poor timeliness and cannot detect online service anomalies immediately. By contrast, in the embodiments of the disclosure, the access traffic is sampled, true computation is performed on the samples asynchronously in real time, and the recall rate of the approximate computation is evaluated by comparing the results of the two computations.
As an alternative embodiment, the approximate computation is performed by a first task and the true computation by a second task, wherein the first task and the second task are independent of each other.
In the embodiments of the disclosure, using different tasks for the approximate and true computations ensures the two run asynchronously, and ensures the true computation does not affect the performance or response of the online service, i.e., the online approximate computation.
In the embodiments of the disclosure, the first task may use the HNSW algorithm for the online approximate computation, simplifying the computation and improving timeliness.
In addition, while the approximate computation runs online, the online access traffic may be sampled, for example at a fixed ratio (e.g., 1%) using random sampling or any other sampling method.
Further, the sampled access traffic (i.e., the traffic samples) may undergo true computation asynchronously in the second task, so the original online flow is unaffected. Moreover, after the online service finishes computing each sample, it can hand the corresponding nearest-neighbor result to the second task, so the average response time of the online service is unaffected.
Furthermore, in the embodiments of the present disclosure, asynchronous means that the true computation is independent of the online high-performance approximate computation. The true computation in the second task may run asynchronously using multiple threads or multiple processes, or alternatively in independent processes. In none of these approaches does the computation process or its results directly affect the first task.
It should be noted that in the multithreaded approach, each thread may be an independent thread. The first task process (i.e., the online high-performance approximate computation process) hands the traffic sample and the first task's result to the asynchronously running true computation (e.g., via a queue) and need not wait for the result to return.
It should be noted that the multi-process or independent-process approach performs true computation with a single process or multiple processes, which may run on different machines. The first task process (i.e., the online high-performance approximate computation process) hands the traffic sample and the approximate result to the true computation process and need not wait for the result to return.
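The queue-based hand-off described above can be sketched with a worker thread. This is a minimal sketch under our own assumptions (names such as `online_service` and `checker_worker` are invented here; a real deployment would use separate processes or machines, as the text notes).

```python
import queue
import threading
import numpy as np

sample_q = queue.Queue()   # hand-off between the online path and the checker
recalls = []               # instantaneous recall per sampled request

def online_service(query, ann_result, sample_rate=0.01, rng=np.random.default_rng(0)):
    """Online path: after producing the ANN result, optionally enqueue the
    (query, ANN result) pair for asynchronous exact recomputation.
    The online path never waits on the checker, so latency is unaffected."""
    if rng.random() < sample_rate:
        sample_q.put((query, ann_result))

def checker_worker(candidates, k):
    """Second task: drain sampled traffic, run exact search, record recall."""
    while True:
        item = sample_q.get()
        if item is None:            # sentinel: shut down
            break
        query, ann_result = item
        true_nn = set(np.argsort(np.linalg.norm(candidates - query, axis=1))[:k])
        m = len(true_nn & set(ann_result))
        recalls.append(m / k)
        sample_q.task_done()
```

The queue is the only coupling point: the online task pushes and returns immediately, matching the "does not wait for the result" behavior described above.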
As an alternative embodiment, determining the recall rate of the approximate computation includes at least one of the following.
Determining at least one instantaneous recall rate of the approximate computation.
Determining the average recall rate of the approximate computation over a first preset time period.
Determining the recall rate of the approximate computation in a single scenario.
Determining the recall rate of the approximate computation across multiple scenarios.
In one embodiment, a recall rate may be computed for each traffic sample; this is the instantaneous recall rate. Illustratively, the instantaneous recall rate may be computed by Equation 1 below.
Equation 1: Recall = M / N
where Recall is the recall rate; M is the number of vectors in the approximate (accelerated) nearest-neighbor result that also belong to the true nearest-neighbor result; and N is the number of vectors in the true nearest-neighbor result (e.g., Top N).
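Equation 1 translates directly into code. A minimal sketch (the function name is ours):

```python
def instantaneous_recall(ann_result, true_result):
    """Recall = M / N, where N is the number of true nearest neighbors
    and M is how many ANN-returned neighbors are among the true ones."""
    true_set = set(true_result)
    m = sum(1 for x in ann_result if x in true_set)
    return m / len(true_result)
```

For example, if the ANN returns neighbors {1, 2, 3} and the true top-3 is {1, 2, 4}, then M = 2, N = 3, and the instantaneous recall is 2/3.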
In another embodiment, for multiple traffic samples within a period of time, a recall rate may be computed for each sample, and all the recall rates may then be averaged (possibly with weights) to obtain the corresponding average recall rate.
In the embodiments of the present disclosure, comparing the neighbor results from the online high-performance approximate computation with the true nearest-neighbor results yields a corresponding set of recall rates. For example, the recall rate may be computed for a single module (i.e., a single scenario) or globally (i.e., multiple scenarios); both instantaneous recall rates and average recall rates over different time spans can be obtained, making the application more flexible.
Through the embodiments of the disclosure, the recall rate of the approximate computation can be evaluated from multiple angles and applied flexibly.
As an alternative embodiment, the method further comprises: recording the recall rate of the approximate computation to obtain record information about the recall rate.
As an alternative embodiment, the method further comprises: counting the recall rates that meet a preset condition, based on the record information, to obtain statistics about the recall rate.
As an alternative embodiment, the method further comprises: sending corresponding prompt information in response to the recall-rate statistics exceeding a preset value within a second preset time period.
In the embodiments of the present disclosure, each time a recall rate is computed, a piece of record information about it may be written, i.e., a log entry may be generated. In other words, recall data may be recorded online using methods such as logging. For example, the online recall data over one or more time periods (e.g., 1 second, 1 minute) may be aggregated, counted, and recorded for a single online service or several.
Example 1: the recall rates computed from all traffic samples on all machines within 1 second may be averaged to obtain an average recall rate. In this example, the values may be accumulated and averaged directly, or a threshold may be set: each recall rate is first mapped to 0 or 1 (0 for failure, 1 for success), and averaging the success and failure counts then gives the average recall rate.
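Both aggregation variants in Example 1 can be sketched in one helper. This is our own illustrative sketch (the name `average_recall` and the `threshold` parameter are assumptions, not from the patent):

```python
def average_recall(recalls, threshold=None):
    """Aggregate per-sample recall over a window (e.g. 1 second).

    With no threshold: plain arithmetic mean of the recall values.
    With a threshold: map each recall to pass (1.0) / fail (0.0) first,
    then average, which yields a success rate instead of a mean recall.
    """
    if not recalls:
        return None
    if threshold is not None:
        recalls = [1.0 if r >= threshold else 0.0 for r in recalls]
    return sum(recalls) / len(recalls)
```

For instance, over the window [0.9, 0.7, 1.0] the plain mean is about 0.867, while with a 0.8 threshold the success rate is 2/3.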
Example 2, the calculated recall rate based on all sample flows for all machines in 10 seconds can be counted as being less than 80% of the recall rate.
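A minimal sketch of the two aggregation variants in Example 1 and the counting in Example 2, assuming the per-request recall rates have already been collected as floats in [0, 1] (all names, thresholds, and values below are illustrative):

```python
def average_recall(recalls):
    """Example 1, first variant: direct accumulation and averaging."""
    return sum(recalls) / len(recalls)

def thresholded_recall(recalls, threshold=0.8):
    """Example 1, second variant: map each recall rate to 0/1 against a
    threshold (0 = failure, 1 = success), then average the indicators."""
    hits = [1 if r >= threshold else 0 for r in recalls]
    return sum(hits) / len(hits)

def count_below(recalls, threshold=0.8):
    """Example 2: number of recall rates below the threshold."""
    return sum(1 for r in recalls if r < threshold)

recalls = [0.95, 0.85, 0.75, 0.90]   # hypothetical 1-second window
print(average_recall(recalls))       # 0.8625
print(thresholded_recall(recalls))   # 0.75
print(count_below(recalls))          # 1
```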
Through the embodiments of the present disclosure, recording the recall rate makes it convenient for the user to review and analyze it, and also facilitates subsequent statistics. Counting the recall rates, in particular computing and analyzing the overall recall rate, yields a recall rate from a global perspective.
Furthermore, related services can be triggered based on the record information and statistical information about the recall rate. For example, alarm logic can be triggered to send prompt information to the user, so that the user can detect abnormal conditions of the online service as soon as possible. For example, if the recall rate of a single-machine service is lower than 70% and this persists for 1 minute, the alarm logic can be triggered to send an alert SMS message.
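The alarm condition above ("recall below 70% lasting 1 minute") could be tracked with a small stateful checker; this is a hypothetical sketch under those assumed thresholds, not the disclosed implementation:

```python
import time

class RecallAlarm:
    """Trigger an alert when the observed recall rate stays below
    `threshold` for at least `duration_s` consecutive seconds."""

    def __init__(self, threshold=0.7, duration_s=60):
        self.threshold = threshold
        self.duration_s = duration_s
        self.low_since = None  # timestamp when recall first dropped below threshold

    def observe(self, recall, now=None):
        """Record one recall observation; return True if the alarm fires."""
        now = time.time() if now is None else now
        if recall >= self.threshold:
            self.low_since = None  # recovered; reset the low-recall window
            return False
        if self.low_since is None:
            self.low_since = now
        return now - self.low_since >= self.duration_s

alarm = RecallAlarm()
print(alarm.observe(0.65, now=0))   # False (recall just dropped)
print(alarm.observe(0.60, now=61))  # True  (below threshold for >= 60 s)
```

A real deployment would feed this from the per-second statistics and route the firing event to an SMS or paging service.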
In addition, in the embodiments of the present disclosure, the recall rate record information and statistical information can be displayed graphically during recall rate monitoring, so that they can be conveniently reviewed by the user.
Through the embodiments of the present disclosure, the recall rate of the online approximate neighbor computation can be monitored in real time, with high performance and at low cost, without affecting the online service (which performs the online approximate neighbor computation), and the recall rate calculation results are highly reliable.
According to an embodiment of the present disclosure, the present disclosure also provides a processing apparatus for neighbor computation.
Figure 5 illustrates a block diagram of a processing device for neighbor computation according to an embodiment of the present disclosure.
As shown in fig. 5, the processing apparatus 500 for neighbor computation may include: a sampling module 510, a true neighbor calculation module 520, and a determination module 530.
A sampling module 510, configured to sample the access traffic during the approximate neighbor computation to obtain at least one sampled traffic.
A real neighbor calculation module 520, configured to perform asynchronous real neighbor calculation on the at least one sampling traffic in real time to obtain a corresponding at least one real neighbor calculation result.
A determining module 530 configured to determine a recall rate of the approximate neighbor computation based on the at least one real neighbor computation result and a corresponding at least one approximate neighbor computation result, wherein the at least one approximate neighbor computation result is obtained by performing the approximate neighbor computation on the at least one sample traffic.
As an alternative embodiment, the approximate neighbor computation is performed by a first task; and performing the true neighbor computation by a second task, wherein the first task and the second task are independent of each other.
As an alternative embodiment, the determining module comprises at least one of: a first determining unit for determining at least one instantaneous recall rate of the approximate neighbor calculation; a second determining unit, configured to determine an average recall rate of the approximate neighbor calculation within a first preset time period; a third determining unit, configured to determine a recall rate of the approximate neighbor computation in a single scene; a fourth determining unit, configured to determine recall rates of the approximate neighbor computation in multiple scenarios.
As an alternative embodiment, the apparatus further comprises: a recording module configured to record the recall rate of the approximate neighbor computation to obtain record information about the recall rate.
As an alternative embodiment, the apparatus further comprises: a counting module configured to count, based on the record information, the recall rates that meet a preset condition to obtain statistical data about the recall rate.
As an alternative embodiment, the apparatus further comprises: a sending module configured to send corresponding prompt information in response to the statistical data of the recall rate exceeding a preset value within a second preset time period.
As an alternative embodiment, the true neighbor computation includes: neighbor computation implemented based on Euclidean distance calculation or cosine distance calculation; and/or the approximate neighbor computation includes: neighbor computation implemented based on a clustering method.
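As a sketch of what a brute-force true neighbor computation based on Euclidean or cosine distance might look like (function names and sample vectors are illustrative; a production system would use vectorized math rather than pure Python):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def true_neighbors(query, corpus, k=2, dist=euclidean):
    """Exact (brute-force) nearest neighbors: compare the query against
    every corpus vector and keep the k closest; no index, no approximation."""
    ranked = sorted(range(len(corpus)), key=lambda i: dist(query, corpus[i]))
    return ranked[:k]

corpus = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
print(true_neighbors((0.9, 0.1), corpus))  # [1, 0]
```

Because this exhaustive scan is expensive, it is run asynchronously on sampled traffic only, while the clustering-based approximate computation serves the online requests.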
It should be understood that the embodiments of the apparatus part of the present disclosure are the same as or similar to the embodiments of the method part of the present disclosure, and the technical problems to be solved and the technical effects to be achieved are also the same as or similar to each other, and the detailed description of the present disclosure is omitted.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the electronic device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 can also store various programs and data necessary for the operation of the electronic device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Various components in the electronic device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 601 executes the respective methods and processes described above, such as the processing method for neighbor calculation. For example, in some embodiments, the processing method for neighbor computation may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the processing method for neighbor computation described above may be performed. Alternatively, in other embodiments, the calculation unit 601 may be configured by any other suitable means (e.g. by means of firmware) to perform the processing method for the neighbor calculation.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and addresses the drawbacks of difficult management and weak service scalability in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
In the technical solution of the present disclosure, the recording, storage, and application of the traffic data involved all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A processing method for neighbor computation, comprising:
sampling the access flow in the process of performing approximate neighbor calculation to obtain at least one sampling flow;
performing asynchronous real neighbor calculation on the at least one sampling flow in real time to obtain at least one corresponding real neighbor calculation result; and
determining a recall rate for the approximate neighbor computation based on the at least one real neighbor computation result and a corresponding at least one approximate neighbor computation result,
wherein the at least one approximate neighbor computation result is obtained by performing the approximate neighbor computation on the at least one sample traffic.
2. The method of claim 1, wherein:
performing the approximate neighbor computation by a first task; and
performing the true neighbor computation by a second task, wherein the first task and the second task are independent of each other.
3. The method of claim 1, wherein determining a recall rate for the approximate neighbor computation comprises at least one of:
determining at least one instantaneous recall rate of the approximate neighbor calculation;
determining an average recall rate of the approximate neighbor calculation over a first preset time period;
determining a recall rate of the approximate neighbor computation in a single scenario;
determining recall of the approximate neighbor computation under a plurality of scenarios.
4. The method of claim 3, further comprising:
recording the recall rate of the approximate neighbor calculation to obtain record information about the recall rate.
5. The method of claim 4, further comprising:
counting, based on the record information, the recall rates that meet a preset condition to obtain statistical data about the recall rate.
6. The method of claim 5, further comprising:
in response to the statistical data of the recall rate exceeding a preset value within a second preset time period, sending corresponding prompt information.
7. The method of any of claims 1-6, wherein:
the true neighbor computation comprises: neighbor computation implemented directly based on Euclidean distance calculation or cosine distance calculation; and/or
the approximate neighbor computation comprises: neighbor computation implemented based on a clustering method.
8. A processing apparatus for neighbor computation, comprising:
the sampling module is used for sampling the access flow in the process of performing approximate neighbor calculation to obtain at least one sampling flow;
the real neighbor calculation module is used for performing asynchronous real neighbor calculation on the at least one sampling flow in real time to obtain at least one corresponding real neighbor calculation result; and
a determination module to determine a recall rate for the approximate neighbor computation based on the at least one real neighbor computation result and a corresponding at least one approximate neighbor computation result,
wherein the at least one approximate neighbor computation result is obtained by performing the approximate neighbor computation on the at least one sample traffic.
9. The apparatus of claim 8, wherein:
performing the approximate neighbor computation by a first task; and
performing the true neighbor computation by a second task, wherein the first task and the second task are independent of each other.
10. The apparatus of claim 8, wherein the means for determining comprises at least one of:
a first determining unit for determining at least one instantaneous recall rate of the approximate neighbor calculation;
a second determining unit, configured to determine an average recall rate of the approximate neighbor calculation within a first preset time period;
a third determining unit, configured to determine a recall rate of the approximate neighbor computation in a single scene;
a fourth determining unit, configured to determine recall rates of the approximate neighbor computation in multiple scenarios.
11. The apparatus of claim 10, further comprising:
a recording module, configured to record the recall rate of the approximate neighbor calculation to obtain record information about the recall rate.
12. The apparatus of claim 11, further comprising:
a counting module, configured to count, based on the record information, the recall rates that meet a preset condition to obtain statistical data about the recall rate.
13. The apparatus of claim 12, further comprising:
a sending module, configured to send corresponding prompt information in response to the statistical data of the recall rate exceeding a preset value within a second preset time period.
14. The apparatus of any one of claims 8 to 13, wherein:
the true neighbor computation comprises: neighbor computation implemented directly based on Euclidean distance calculation or cosine distance calculation; and/or
the approximate neighbor computation comprises: neighbor computation implemented based on a clustering method.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
CN202111125713.7A 2021-09-24 2021-09-24 Processing method and device for neighbor calculation Pending CN113836440A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111125713.7A CN113836440A (en) 2021-09-24 2021-09-24 Processing method and device for neighbor calculation


Publications (1)

Publication Number Publication Date
CN113836440A true CN113836440A (en) 2021-12-24

Family

ID=78970079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111125713.7A Pending CN113836440A (en) 2021-09-24 2021-09-24 Processing method and device for neighbor calculation

Country Status (1)

Country Link
CN (1) CN113836440A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination