CN113535824A

CN113535824A - Data searching method and device, electronic equipment and storage medium

Info

Publication number: CN113535824A
Application number: CN202110850414.3A
Authority: CN
Inventors: 陈畅怀
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2021-07-27
Filing date: 2021-07-27
Publication date: 2021-10-22
Anticipated expiration: 2041-07-27

Abstract

The embodiments of the application provide a data search method and apparatus, an electronic device, and a storage medium. The method includes: acquiring target data to be queried that is submitted by a data querying party; calculating the similarity between the target data and each sample data, determining a plurality of similarity intervals, and allocating each sample data to the corresponding similarity interval according to its similarity to the target data; selecting the sample data in designated similarity intervals according to a preset interval selection rule, and sorting the selected sample data by similarity to obtain a first sorting result; and sending the first sorting result to the data querying party. Because only the sample data in the designated similarity intervals, rather than all the sample data, are sorted before the first sorting result is sent, the efficiency of data search is increased.

Description

Data searching method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a data search method and apparatus, an electronic device, and a storage medium.

Background

With the rapid development of information search and artificial intelligence technologies, information search has been applied across all industries, for example in image search, video search, document search, web search, and other kinds of multimedia information search.

In the related art, after the data to be retrieved is acquired, the similarity between that data and every sample data in the database is calculated, all sample data are sorted from highest to lowest similarity, the sorted queue of all sample data is cached, and the corresponding sample data are then selected from that sorted queue and returned to the data querying party.

However, this approach must sort all the sample data. The response time grows rapidly as the data scale increases and a large amount of computing resources is consumed, which seriously reduces the efficiency of data search.

Disclosure of Invention

An object of the embodiments of the present application is to provide a data search method, an apparatus, an electronic device, and a storage medium, so as to increase efficiency of data search. The specific technical scheme is as follows:

In a first aspect, an embodiment of the present application provides a data search method, the method including:

acquiring target data to be queried that is submitted by a data querying party;

calculating the similarity between the target data and each sample data, determining a plurality of similarity intervals, and allocating each sample data to the corresponding similarity interval according to the similarity between that sample data and the target data;

selecting the sample data in designated similarity intervals according to a preset interval selection rule, and sorting the selected sample data by similarity to obtain a first sorting result;

and sending the first sorting result to the data querying party.

In a possible implementation, calculating the similarity between the target data and each sample data, determining a plurality of similarity intervals, and allocating each sample data to the corresponding similarity interval according to the similarity between that sample data and the target data includes:

calculating the similarity between the target data and each sample data in a preset order, and adjusting the similarity range of each similarity interval according to the upper and lower bounds of the similarities obtained so far;

and allocating each sample data to the corresponding similarity interval according to the similarity between that sample data and the target data.

In a possible implementation, after each sample data is allocated to the corresponding similarity interval according to the similarity between that sample data and the target data, the method further includes:

for any similarity interval, when the number of sample data in the interval exceeds a preset number threshold, re-dividing that interval into a plurality of similarity intervals and reassigning its sample data to the resulting intervals accordingly.

In a possible implementation, selecting the sample data in designated similarity intervals according to a preset interval selection rule and sorting the selected sample data by similarity to obtain a first sorting result includes:

obtaining the maximum number of sample data that the data querying party can display on a single page, as a first value (O);

selecting, in order of similarity from high to low, the first N similarity intervals as the designated similarity intervals, where N is a second value such that the total number of sample data in the first N intervals is not less than O, while the total number of sample data in the first N-1 intervals (N-1 being a third value) is less than O;

and sorting the sample data in the designated similarity intervals from high to low similarity to obtain a first sequence, and taking the first O sample data of the first sequence as the first sorting result.

In one possible implementation, after the first sorting result is sent to the data querying party, the method further includes:

upon receiving a query message in which the data querying party requests more query results, determining the number of sample data in the designated similarity intervals other than those in the first sorting result, as a fourth value (M);

calculating, from the fourth value and the first value, the number of sample data that need to be selected, as a fifth value (A);

selecting, in order of similarity from high to low, the first B similarity intervals among the similarity intervals other than the designated ones as the currently designated similarity intervals, where B is a sixth value such that the total number of sample data in these B intervals is not less than A, while the total number of sample data in the first B-1 intervals (B-1 being a seventh value) is less than A;

sorting the sample data in the currently designated similarity intervals from high to low similarity to obtain a second sequence, and taking the last M sample data of the first sequence together with the first A sample data of the second sequence as a second sorting result;

and sending the second sorting result to the data querying party.

In one possible implementation, after the first sorting result is sent to the data querying party, the method further includes:

for the similarity intervals whose sample data have not yet been sorted, selecting, in order of similarity from high to low, the sample data in the first Q similarity intervals and sorting them to obtain a third sorting result, where Q is an eighth value that is either a preset number of intervals, or is such that the total number of sample data in the first Q unsorted intervals is not less than a preset sample value (R) while the total number of sample data in the first Q-1 unsorted intervals (Q-1 being a ninth value) is less than R;

and sending the third sorting result to the data querying party.

In one possible implementation, after the first sorting result is sent to the data querying party, the method further includes:

upon receiving a query message in which the data querying party requests display of the query results on page K (K being a tenth value), selecting, in order of similarity from high to low, the U-th through V-th similarity intervals as target similarity intervals (U and V being an eleventh and a twelfth value), based on the first value O (the maximum number of sample data that the data querying party can display on a single page) and the number of sample data in each similarity interval, where: the total number of sample data in the first U-1 similarity intervals is not greater than a thirteenth value, the thirteenth value being equal to O × K - O; the total number of sample data in the first U similarity intervals is greater than the thirteenth value; the total number of sample data in the first V-1 similarity intervals is less than a fourteenth value, the fourteenth value being equal to O × K; and the total number of sample data in the first V similarity intervals is not less than the fourteenth value;

sorting the sample data in the target similarity intervals from high to low similarity to obtain a third sequence;

selecting the sample data ranked from a fifteenth value through a sixteenth value in the third sequence as a fourth sorting result, where the fifteenth value equals the thirteenth value minus a seventeenth value plus 1, the sixteenth value equals the fourteenth value minus the seventeenth value, and the seventeenth value is the total number of sample data in the first U-1 similarity intervals;

and sending the fourth sorting result to the data querying party.
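The page-jump rule above can be restated compactly. The following LaTeX sketch uses symbols introduced only for this restatement: n_m is the number of sample data in the m-th similarity interval (ordered from high to low similarity), C_m is the running total, and s, e and c stand for the thirteenth, fourteenth and seventeenth values.

```latex
\begin{aligned}
C_m &= \sum_{l=1}^{m} n_l, \qquad C_0 = 0,\\
s &= O K - O, \qquad e = O K,\\
U &= \min\{\, m : C_m > s \,\}, \qquad V = \min\{\, m : C_m \ge e \,\},\\
c &= C_{U-1},\\
\text{fourth sorting result} &= \text{ranks } s - c + 1 \text{ through } e - c \text{ of the sorted intervals } U, \dots, V.
\end{aligned}
```

For instance, with O = 2, K = 3 and interval sizes n = (2, 3, 2), one gets s = 4, e = 6 and C = (2, 5, 7), so U = 2, V = 3 and c = 2; ranks 3 through 4 of the sorted intervals 2 and 3 are returned, which are the 5th and 6th results overall.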

In a second aspect, an embodiment of the present application provides a data search apparatus, the apparatus including:

a target data acquisition module, configured to acquire target data to be queried that is submitted by a data querying party;

a sample data allocation module, configured to calculate the similarity between the target data and each sample data, determine a plurality of similarity intervals, and allocate each sample data to the corresponding similarity interval according to the similarity between that sample data and the target data;

a sample data sorting module, configured to select the sample data in designated similarity intervals according to a preset interval selection rule, and sort the selected sample data by similarity to obtain a first sorting result;

and a sorting result sending module, configured to send the first sorting result to the data querying party.

In a possible implementation, the sample data allocation module is specifically configured to: calculate the similarity between the target data and each sample data in a preset order, and adjust the similarity range of each similarity interval according to the upper and lower bounds of the similarities obtained so far; and allocate each sample data to the corresponding similarity interval according to the similarity between that sample data and the target data.

In a possible implementation, the sample data allocation module is further configured to: for any similarity interval, when the number of sample data in the interval exceeds a preset number threshold, re-divide that interval into a plurality of similarity intervals and reassign its sample data to the resulting intervals accordingly.

In a possible implementation, the sample data sorting module includes:

a sample data quantity obtaining submodule, configured to obtain the maximum number of sample data that the data querying party can display on a single page, as a first value (O);

a similarity interval selection submodule, configured to select, in order of similarity from high to low, the first N similarity intervals as the designated similarity intervals, where N is a second value such that the total number of sample data in the first N intervals is not less than O, while the total number of sample data in the first N-1 intervals (N-1 being a third value) is less than O;

and a sample data selection submodule, configured to sort the sample data in the designated similarity intervals from high to low similarity to obtain a first sequence, and take the first O sample data of the first sequence as the first sorting result.

In a possible implementation, the apparatus further includes a delayed sorting module configured to: upon receiving a query message in which the data querying party requests more query results, determine the number of sample data in the designated similarity intervals other than those in the first sorting result, as a fourth value (M); calculate, from the fourth value and the first value, the number of sample data that need to be selected, as a fifth value (A); select, in order of similarity from high to low, the first B similarity intervals among the similarity intervals other than the designated ones as the currently designated similarity intervals, where B is a sixth value such that the total number of sample data in these B intervals is not less than A, while the total number of sample data in the first B-1 intervals (B-1 being a seventh value) is less than A; sort the sample data in the currently designated similarity intervals from high to low similarity to obtain a second sequence, and take the last M sample data of the first sequence together with the first A sample data of the second sequence as a second sorting result; and send the second sorting result to the data querying party.

In a possible implementation, the apparatus further includes a delayed sorting module configured to: for the similarity intervals whose sample data have not yet been sorted, select, in order of similarity from high to low, the sample data in the first Q similarity intervals and sort them to obtain a third sorting result, where Q is an eighth value that is either a preset number of intervals, or is such that the total number of sample data in the first Q unsorted intervals is not less than a preset sample value (R) while the total number of sample data in the first Q-1 unsorted intervals (Q-1 being a ninth value) is less than R; and send the third sorting result to the data querying party.

In a possible implementation, the apparatus further includes a delayed sorting module configured to: upon receiving a query message in which the data querying party requests display of the query results on page K (K being a tenth value), select, in order of similarity from high to low, the U-th through V-th similarity intervals as target similarity intervals (U and V being an eleventh and a twelfth value), based on the first value O (the maximum number of sample data that the data querying party can display on a single page) and the number of sample data in each similarity interval, where the total number of sample data in the first U-1 similarity intervals is not greater than a thirteenth value equal to O × K - O, the total number of sample data in the first U similarity intervals is greater than the thirteenth value, the total number of sample data in the first V-1 similarity intervals is less than a fourteenth value equal to O × K, and the total number of sample data in the first V similarity intervals is not less than the fourteenth value; sort the sample data in the target similarity intervals from high to low similarity to obtain a third sequence; select the sample data ranked from a fifteenth value through a sixteenth value in the third sequence as a fourth sorting result, where the fifteenth value equals the thirteenth value minus a seventeenth value plus 1, the sixteenth value equals the fourteenth value minus the seventeenth value, and the seventeenth value is the total number of sample data in the first U-1 similarity intervals; and send the fourth sorting result to the data querying party.

In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory;

the memory is used for storing a computer program;

the processor is configured to implement any of the data search methods of the present application when executing the program stored in the memory.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the data search methods of the present application.

The embodiment of the application has the following beneficial effects:

According to the data search method and apparatus, electronic device, and storage medium provided by the embodiments of the present application, target data to be queried that is submitted by a data querying party is acquired; the similarity between the target data and each sample data is calculated, a plurality of similarity intervals are determined, and each sample data is allocated to the corresponding similarity interval according to its similarity to the target data; the sample data in designated similarity intervals are selected according to a preset interval selection rule and sorted by similarity to obtain a first sorting result; and the first sorting result is sent to the data querying party. Because only the sample data in the designated similarity intervals, rather than all the sample data, are sorted before the first sorting result is sent, the efficiency of data search is increased. Of course, any single product or method implementing the present application need not achieve all of the above advantages at the same time.

Drawings

To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can derive other drawings from them.

FIG. 1 is a schematic diagram of a data search method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a possible implementation of step S102 in an embodiment of the present application;

FIG. 3 is a schematic diagram of a possible implementation of step S103 in an embodiment of the present application;

FIG. 4a is a schematic diagram of a data search method according to an embodiment of the present application;

FIG. 4b is a schematic diagram of a first possible implementation of step S105 in an embodiment of the present application;

FIG. 5 is a schematic diagram of a second possible implementation of step S105 in an embodiment of the present application;

FIG. 6 is a schematic diagram of a third possible implementation of step S105 in an embodiment of the present application;

FIG. 7 is a schematic diagram of a data search apparatus according to an embodiment of the present application;

FIG. 8 is a schematic diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the description herein are intended to be within the scope of the present disclosure.

In order to increase the efficiency of data search, an embodiment of the present application provides a data search method. Referring to FIG. 1, the method includes:

S101, acquiring the target data to be queried that is submitted by the data querying party.

The data search method of the embodiments of the present application can be implemented by an electronic device, such as a personal computer, a digital video recorder (DVR), a database server, or a retrieval server. The target data is the data to be queried that is submitted by the data querying party; it may take the form of text, images, video, sound, tables, and so on, all of which fall within the protection scope of the present application.

S102, calculating the similarity between the target data and each sample data, determining a plurality of similarity intervals, and allocating each sample data to the corresponding similarity interval according to the similarity between that sample data and the target data.

The similarities between the target data and the sample data may be calculated in parallel by multiple threads, and each sample data is allocated to a similarity interval according to its similarity. The similarity interval corresponding to a sample data is the interval whose similarity range contains the similarity between that sample data and the target data. For example, if the similarity between sample data X and the target data is 60%, and the similarity ranges of intervals a, b, and c are 80%-100%, 50%-80%, and 30%-50% respectively, then sample data X is allocated to similarity interval b.
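The allocation in the example above can be sketched as follows. This is a minimal illustration, not the application's implementation: the interval names and ranges come from the example, but treating each range as half-open [low, high) (with 1.0 included in the top range), the `similarity_fn` scoring function, and the helper names are all assumptions. In CPython a thread pool only speeds up scoring if `similarity_fn` releases the GIL (for example, native or vectorized code).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical ranges taken from the example above, listed from high to low similarity.
# Each range is treated as half-open [low, high); the top range also includes 1.0.
INTERVALS = [("a", 0.80, 1.00), ("b", 0.50, 0.80), ("c", 0.30, 0.50)]


def compute_similarities(target, samples, similarity_fn, workers=4):
    """Score every sample against the target on a thread pool.

    `samples` maps sample id -> feature; `similarity_fn` is an assumed scoring function.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = pool.map(lambda feature: similarity_fn(target, feature), samples.values())
        return dict(zip(samples.keys(), scores))


def interval_of(similarity, intervals=INTERVALS):
    """Return the name of the interval whose range contains `similarity`, or None."""
    for name, low, high in intervals:
        if low <= similarity < high or similarity == high == 1.0:
            return name
    return None


def allocate(similarities, intervals=INTERVALS):
    """Group sample ids by interval; `similarities` maps sample id -> similarity."""
    buckets = {name: [] for name, _, _ in intervals}
    for sample_id, similarity in similarities.items():
        name = interval_of(similarity, intervals)
        if name is not None:  # samples below every range are left unallocated here
            buckets[name].append(sample_id)
    return buckets


print(interval_of(0.60))  # sample X at 60% -> b
```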

The similarity intervals may be divided in advance or in real time, and may be of equal or unequal width; all of these fall within the protection scope of the present application. In one example, the upper and lower bounds of the similarity distribution may be obtained and the range between them divided into a specified number of similarity intervals. In one possible embodiment, referring to FIG. 2, calculating the similarity between the target data and each sample data, determining a plurality of similarity intervals, and allocating each sample data to the corresponding similarity interval according to the similarity between that sample data and the target data includes:

S1021, calculating the similarity between the target data and each sample data in a preset order, and adjusting the similarity range of each similarity interval according to the upper and lower bounds of the similarities obtained so far.

S1022, allocating each sample data to the corresponding similarity interval according to the similarity between that sample data and the target data.
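One way to realize the equal-width division mentioned above (split the range between the observed lower and upper bounds into a specified number k of intervals) is sketched below. It is an illustration only: equal widths, the function name, and the convention that the maximum similarity falls into the top interval are assumptions, and the text equally allows unequal widths.

```python
def equal_width_intervals(similarities, k):
    """Split [min, max] of the observed similarities into k equal-width intervals.

    `similarities` maps sample id -> similarity (must be non-empty).
    Returns k lists of sample ids, ordered from the highest-similarity interval
    to the lowest.
    """
    if k < 1:
        raise ValueError("k must be positive")
    lo, hi = min(similarities.values()), max(similarities.values())
    width = (hi - lo) / k
    buckets = [[] for _ in range(k)]  # built from the low end upward
    for sample_id, similarity in similarities.items():
        # The maximum value (and the all-equal case) is clamped into the top interval.
        index = k - 1 if width == 0 else min(int((similarity - lo) / width), k - 1)
        buckets[index].append(sample_id)
    buckets.reverse()  # highest similarity first
    return buckets
```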

Different threads may be used to calculate the similarities between the target data and the sample data; meanwhile, the similarity range of each similarity interval can be adjusted dynamically according to the upper and lower bounds of the similarities obtained so far. For example, one similarity interval may be subdivided into several intervals, or several intervals may be merged into one.
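The dynamic adjustment in S1021 could look like the following sketch, assuming a fixed number k of equal-width intervals that are stretched whenever a newly computed similarity falls outside the bounds seen so far. The class name and the choice to redistribute every already-placed sample on each bound change are illustrative assumptions; a production version would likely redistribute more cheaply, and the sketch is single-threaded.

```python
class StreamingIntervals:
    """Hypothetical helper: k equal-width intervals over the similarity bounds seen so far."""

    def __init__(self, k):
        self.k = k
        self.lo = None
        self.hi = None
        self.buckets = [[] for _ in range(k)]  # index 0 = lowest similarity

    def _index(self, similarity):
        width = (self.hi - self.lo) / self.k
        if width == 0:
            return self.k - 1
        return min(int((similarity - self.lo) / width), self.k - 1)

    def add(self, sample_id, similarity):
        """Place one newly computed similarity, widening the bounds if needed."""
        if self.lo is None:
            self.lo = self.hi = similarity
        if similarity < self.lo or similarity > self.hi:
            # Bounds changed: widen them and redistribute everything already placed.
            self.lo, self.hi = min(self.lo, similarity), max(self.hi, similarity)
            placed = [item for bucket in self.buckets for item in bucket]
            self.buckets = [[] for _ in range(self.k)]
            for item in placed:
                self.buckets[self._index(item[1])].append(item)
        self.buckets[self._index(similarity)].append((sample_id, similarity))

    def high_to_low(self):
        """Sample ids per interval, highest-similarity interval first."""
        return [[sid for sid, _ in bucket] for bucket in reversed(self.buckets)]
```

Usage: call `add(sample_id, similarity)` in the preset order as each similarity is computed, then read `high_to_low()` once all samples have been scored.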

In one possible embodiment, after each sample data is allocated to the corresponding similarity interval according to its similarity to the target data, the method further includes: for any similarity interval, when the number of sample data in the interval exceeds a preset number threshold, re-dividing that interval into a plurality of similarity intervals and reassigning its sample data to the resulting intervals accordingly.

When a similarity interval contains more sample data than the preset number threshold, the interval has been divided too coarsely. Sorting all of its sample data would involve a lot of "useless work", so the interval is re-divided into several intervals to reduce the number of sample data in each, which reduces the useless work during sorting and improves search efficiency. The preset number threshold may be determined from the maximum number O of sample data that the data querying party can display on a single page; for example, it may be set to O, 0.8O, 0.7O, 0.6O, 0.5O, 0.4O, or 0.3O.
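The re-division of an oversized interval can be sketched as a recursive split, assuming each interval is stored as (low, high, items) with items being (sample_id, similarity) pairs and that an oversized interval is halved by width. The split factor `parts` and the `min_width` stop (which guards against many samples with identical similarity) are assumptions of this sketch; `threshold` would be chosen from O as described above.

```python
def split_oversized(intervals, threshold, parts=2, min_width=1e-9):
    """Re-divide any interval holding more than `threshold` samples.

    `intervals` is a list of (low, high, items) ordered from high to low similarity,
    where items is a list of (sample_id, similarity). Returns a new list in the same order.
    """
    result = []
    for low, high, items in intervals:
        # Keep the interval if it is small enough, or too narrow to split further
        # (e.g. many samples with identical similarity).
        if len(items) <= threshold or high - low <= min_width:
            result.append((low, high, items))
            continue
        width = (high - low) / parts
        subs = [(low + p * width, high if p == parts - 1 else low + (p + 1) * width, [])
                for p in range(parts)]
        for sample_id, similarity in items:
            index = min(int((similarity - low) / width), parts - 1)
            subs[index][2].append((sample_id, similarity))
        # Sub-intervals were built low-to-high; reverse to keep high-to-low order,
        # then re-check each one in case it is still over the threshold.
        result.extend(split_oversized(subs[::-1], threshold, parts, min_width))
    return result
```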

S103, selecting the sample data in designated similarity intervals according to a preset interval selection rule, and sorting the selected sample data by similarity to obtain a first sorting result.

The preset interval selection rule may be customized according to actual conditions; for example, by default a preset number of similarity intervals may be selected as the designated similarity intervals in order of similarity from high to low.

In an example, referring to FIG. 3, selecting the sample data in designated similarity intervals according to a preset interval selection rule and sorting the selected sample data by similarity to obtain a first sorting result includes:

S1031, obtaining the maximum number O (the first value) of sample data that the data querying party can display on a single page.

S1032, selecting, in order of similarity from high to low, the first N (second value) similarity intervals as the designated similarity intervals, where the total number of sample data in the first N intervals is not less than O, the total number of sample data in the first N-1 (third value) intervals is less than O, and N is a positive integer.

S1033, sorting the sample data in the designated similarity intervals from high to low similarity to obtain a first sequence, and taking the first O sample data of the first sequence as the first sorting result.

When the data querying party displays search results, they are generally paginated, with at most O sample data per page. Therefore, in order of similarity from high to low, the smallest N satisfying the following conditions can be selected:

$\sum_{i=1}^{N} n_i \ge O$ and $\sum_{i=1}^{N-1} n_i < O$,

where $n_i$ is the number of sample data in the i-th similarity interval (counted in order of similarity from high to low).
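Steps S1031 to S1033 can be sketched as below, assuming the intervals are given as lists of (sample_id, similarity) pairs ordered from the highest-similarity interval downward. The function name and return shape are hypothetical. The loop stops at the smallest N whose running total reaches O, matching the conditions above; if all intervals together hold fewer than O samples, every interval is used.

```python
def first_page(buckets, page_size):
    """Sort only the first N intervals needed to fill one page of `page_size` (O) results.

    buckets: lists of (sample_id, similarity), highest-similarity interval first.
    Returns (first_sorting_result, first_sequence, n).
    """
    total, n = 0, 0
    for bucket in buckets:
        if total >= page_size:  # the first n intervals already hold at least O samples
            break
        total += len(bucket)
        n += 1
    selected = [item for bucket in buckets[:n] for item in bucket]
    first_sequence = sorted(selected, key=lambda item: item[1], reverse=True)
    return first_sequence[:page_size], first_sequence, n
```

Intervals after the N-th are never touched here, which is where the saving over sorting all sample data comes from.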

S104, sending the first sorting result to the data querying party.

The first sorting result is sent to the data querying party so that the data querying party can display it.

In a possible implementation, referring to FIG. 4a, after step S104 the method further includes: S105, sending the sample data other than those in the first sorting result to the data querying party by means of delayed sorting.

With delayed sorting, the unsorted sample data are left unsorted for the time being and are sorted only when the data querying party asks to view more sample data. In most cases users do not view all results; they typically view only the top T sorted results, where T is less than the total number S of sample data. Samples that the user never views therefore never need to be sorted, which saves the sorting cost for those samples and can greatly reduce computing resource consumption.

In a possible implementation, referring to FIG. 4b, sending the sample data other than those in the first sorting result to the data querying party by means of delayed sorting includes:

S1051, upon receiving a query message in which the data querying party requests more query results, determining the number M (fourth value) of sample data in the designated similarity intervals other than those in the first sorting result.

S1052, calculating the number A (fifth value) of sample data to be selected according to M and O. In one example, A = O - M.

S1053, selecting, in order of similarity from high to low, the first B (sixth value) similarity intervals among the similarity intervals other than the designated ones as the currently designated similarity intervals, where the total number of sample data in the first B intervals is not less than A, the total number of sample data in the first B-1 (seventh value) intervals is less than A, and B is a positive integer.

S1054, sorting the sample data in the currently designated similarity intervals from high to low similarity to obtain a second sequence, and taking the last M sample data of the first sequence and the first A sample data of the second sequence as the second sorting result.

S1055, sending the second sorting result to the data querying party.

Each further time the data querying party requests more query results, another O sample data can be selected and sent in the same way; the selection process is similar to S1051-S1054 and is not repeated here.
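S1051 to S1055, repeated for each "more results" request, can be sketched as a small lazy ranker. It assumes the same layout as before (lists of (sample_id, similarity), highest interval first) and that the intervals cover disjoint similarity ranges; under that assumption, concatenating independently sorted groups of intervals gives the global order, so page 2 is exactly the last M items of the first sequence followed by the first A = O - M items of the second. The class and method names are hypothetical.

```python
class LazyRanking:
    """Sorts similarity intervals only when a requested page needs them (hypothetical helper)."""

    def __init__(self, buckets, page_size):
        self.buckets = buckets      # lists of (sample_id, similarity), highest interval first
        self.page_size = page_size  # O
        self.ranked = []            # concatenation of the sequences sorted so far
        self.next_bucket = 0        # index of the first interval not yet sorted

    def _sort_until(self, count):
        """Sort whole intervals, highest first, until at least `count` items are ranked."""
        batch = []
        while len(self.ranked) + len(batch) < count and self.next_bucket < len(self.buckets):
            batch.extend(self.buckets[self.next_bucket])
            self.next_bucket += 1
        batch.sort(key=lambda item: item[1], reverse=True)
        self.ranked.extend(batch)

    def page(self, number):
        """Return page `number` (1-based) for sequential "more results" requests."""
        end = number * self.page_size
        self._sort_until(end)
        return self.ranked[end - self.page_size:end]
```

This suits page-by-page browsing; jumping straight to a distant page would sort every interval before it, which the page-jump variant described later avoids.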

The remaining unsorted similarity intervals may be sorted all at once or in batches: for example, Q similarity intervals at a time in order of similarity from high to low, or, each time, the run of consecutive similarity intervals whose total number of sample data exceeds a preset sample value R. This lowers the peak resource demand of retrieval sorting: the unsorted samples are sorted while the user is viewing the results returned first, which spreads resource consumption over time. System response speed improves and usage cost falls, while the accuracy of the retrieval results is unaffected and the process is imperceptible to the user.

In a possible implementation, referring to fig. 5, sending the sample data other than the first sorting result to the data querying party by delayed sorting includes:

S105A, for the similarity intervals whose sample data have not yet been sorted, selecting, in descending order of similarity, the sample data in the first Q (eighth value) similarity intervals and sorting them to obtain a third sorting result, where either Q is a preset number of intervals, or Q is such that, among the unsorted similarity intervals, the total number of sample data in the first Q intervals is not less than R while the total number of sample data in the first Q-1 (ninth value) intervals is less than R, R being the preset sample value;

S105B, sending the third sorting result to the data querying party.
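A possible sketch of S105A, again assuming hypothetical `(similarity, sample_id)` pairs and intervals ordered from highest to lowest similarity. The generator `delayed_sort_batches` (a name invented here) yields one third sorting result per batch, where a batch is either a fixed number Q of intervals or the shortest run of consecutive intervals holding at least R samples. Consuming it lazily, for example while the user reads the first page, is one way to spread the sorting cost over time as described above.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical sample representation: (similarity to the target data, sample id).
Sample = Tuple[float, str]


def delayed_sort_batches(
    unsorted: List[List[Sample]],
    q: Optional[int] = None,
    r: Optional[int] = None,
) -> Iterator[List[Sample]]:
    """Yield one sorted batch (a third sorting result) at a time.

    unsorted: similarity intervals not yet sorted, highest similarity first.
    q: fixed number of intervals per batch, or
    r: minimum number of samples per batch (shortest prefix reaching r).
    Exactly one of q and r must be given.
    """
    if (q is None) == (r is None):
        raise ValueError("pass exactly one of q or r")
    limit = q if q is not None else r
    if limit <= 0:
        raise ValueError("q or r must be positive")

    i = 0
    while i < len(unsorted):
        if q is not None:
            j = min(i + q, len(unsorted))
        else:
            j, total = i, 0
            while j < len(unsorted) and total < r:
                total += len(unsorted[j])
                j += 1
        batch = [s for interval in unsorted[i:j] for s in interval]
        yield sorted(batch, key=lambda s: s[0], reverse=True)
        i = j
```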

When the first sorting result is returned, paging information for the entire retrieval result can also be sent to the data querying party, so that it can choose which subsequent page to view. Once the sample data for the page currently requested have been sorted and sent, the other sample data may be left unsorted for the time being, or the P pages following the current page may be sorted in advance. Because a data querying party usually views results page by page in order, sorting the next few pages while the user is looking at the current one means that, when the user turns pages sequentially, the already-sorted results can be displayed immediately. Alternatively, the other sample data may remain unsorted until the data querying party issues an instruction to view a particular page; only then are the sample data for that page sorted and returned for display. The similarity intervals that need sorting can be determined from the number of sample data in each similarity interval and the page to be viewed, and only the sample data in those intervals are sorted.

In a possible implementation, referring to fig. 6, sending the sample data other than the first sorting result to the data querying party by delayed sorting includes:

S105a, when a query message is received from the data querying party indicating that the query results of page C (tenth value) are to be displayed, selecting, in descending order of similarity, the D-th (eleventh value) through E-th (twelfth value) sample intervals as target sample intervals, based on the maximum number O of sample data that a single page of the data querying party can display and the number of sample data in each sample interval, where the total number of sample data in the first D-1 sample intervals is not more than (C-1)×O (thirteenth value), the total number of sample data in the first D sample intervals is more than (C-1)×O, the total number of sample data in the first E-1 sample intervals is less than C×O (fourteenth value), and the total number of sample data in the first E sample intervals is not less than C×O;

S105b, sorting the sample data in the target sample intervals in descending order of similarity to obtain a third sequence;

S105c, selecting the ((C-1)×O-F+1)-th (fifteenth value) through the (C×O-F)-th (sixteenth value) sample data in the third sequence as a fourth sorting result, where F (seventeenth value) is the total number of sample data in the first D-1 sample intervals;

S105d, sending the fourth sorting result to the data querying party.
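Steps S105a-S105d can be illustrated with a small Python class that sorts an interval only the first time a requested page touches it. This is a hedged sketch: the class `LazyPager`, its per-interval "already sorted" flags, and the `prefetch` helper (which pre-sorts the P pages after the current one, as in the paging discussion above) are assumptions for illustration. Indices D and E are 1-based to match the text, and the prefix sums stand in for "the total number of sample data in the first k intervals".

```python
from itertools import accumulate
from typing import List, Tuple

# Hypothetical sample representation: (similarity to the target data, sample id).
Sample = Tuple[float, str]


class LazyPager:
    """Serve page C by sorting only the intervals it overlaps (S105a-S105d)."""

    def __init__(self, intervals: List[List[Sample]], page_size: int) -> None:
        # intervals are ordered from highest to lowest similarity
        self.intervals = [list(iv) for iv in intervals]
        self.page_size = page_size                      # O
        # prefix[k] = total number of samples in the first k intervals
        self.prefix = [0] + list(accumulate(len(iv) for iv in self.intervals))
        self.is_sorted = [False] * len(self.intervals)
        self.sort_calls = 0                             # intervals sorted so far

    def _target_range(self, c: int) -> Tuple[int, int]:
        """Return the 1-based interval indices (D, E) for page c."""
        lo, hi = (c - 1) * self.page_size, c * self.page_size
        n = len(self.intervals)
        # D: smallest k with prefix[k] > (C-1)*O, hence prefix[k-1] <= (C-1)*O
        d = next(k for k in range(1, n + 1) if self.prefix[k] > lo)
        # E: smallest k >= D with prefix[k] >= C*O; the last interval for a short final page
        e = next((k for k in range(d, n + 1) if self.prefix[k] >= hi), n)
        return d, e

    def page(self, c: int) -> List[Sample]:
        """Return the fourth sorting result for page c (empty if out of range)."""
        if c < 1 or (c - 1) * self.page_size >= self.prefix[-1]:
            return []
        d, e = self._target_range(c)
        for k in range(d - 1, e):                       # S105b: sort target intervals once
            if not self.is_sorted[k]:
                self.intervals[k].sort(key=lambda s: s[0], reverse=True)
                self.is_sorted[k] = True
                self.sort_calls += 1
        third_seq = [s for iv in self.intervals[d - 1:e] for s in iv]
        f = self.prefix[d - 1]                          # F: samples before interval D
        # S105c: items (C-1)*O-F+1 .. C*O-F (1-based) of the third sequence
        return third_seq[(c - 1) * self.page_size - f : c * self.page_size - f]

    def prefetch(self, c: int, p: int) -> None:
        """Pre-sort the P pages after page c while the user views page c."""
        for nxt in range(c + 1, c + p + 1):
            self.page(nxt)
```

Because sorted intervals are cached, jumping straight to a distant page sorts only the one or two intervals it overlaps, and later requests for neighbouring pages reuse that work.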

For example, assume that the similarity range is divided into 7 similarity intervals, [0,0.2), [0.2,0.4), [0.4,0.6), [0.6,0.7), [0.7,0.8), [0.8,0.9) and [0.9,1], which contain 20, 30, 35, 56, 90, 90 and 60 sample data of the database respectively, and that a single page of the data querying party can display at most 20 sample data. The database thus contains 381 sample data in total, which occupy 20 pages, with only one sample on the last page. Suppose the data querying party views the first page and then chooses to view page 8. Page 8 corresponds to items 141-160 of the sorted result, which fall in the two intervals [0.7,0.8) and [0.8,0.9). If these two intervals have not yet been sorted, they are sorted now, and the sample data corresponding to page 8 are selected for display: items 81-90 of the interval [0.8,0.9) and items 1-10 of the interval [0.7,0.8).
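As a check, the conditions of S105a-S105d can be applied to this example with C = 8 and O = 20, taking the intervals from highest to lowest similarity ([0.9,1] holding 60 samples, [0.8,0.9) holding 90, and $n_{[0.7,0.8)}$ denoting the count of [0.7,0.8)):

$$(C-1)\times O = 7\times 20 = 140,\qquad C\times O = 8\times 20 = 160$$

$$60 \le 140 < 60 + 90 = 150 \;\Rightarrow\; D = 2,\quad F = 60$$

$$150 < 160 \le 150 + n_{[0.7,0.8)} \;\Rightarrow\; E = 3$$

$$(C-1)\times O - F + 1 = 81,\qquad C\times O - F = 100$$

So items 81-90 of the third sequence are the last ten samples of [0.8,0.9) and items 91-100 are the first ten samples of [0.7,0.8), which matches the description provided [0.7,0.8) holds at least 10 samples.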

With the method of the embodiments of the present application, only the results that actually need to be viewed are sorted, and results that are never viewed are not sorted, which saves resources. Because the similarity intervals covering a viewed page contain relatively few samples, the real-time performance of the search is not affected. In addition, whether a similarity interval needs to be divided more finely can be judged from the number of sample data in the similarity intervals corresponding to the page selected by the data querying party, which further improves search efficiency and reduces the consumption of computing resources.
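The finer re-division mentioned above (and in claim 10) might look like the following sketch. The threshold, the number of equal-width parts, and the `(lower, upper, samples)` tuple layout are assumptions, and the function `split_crowded` is hypothetical. Sub-intervals are emitted from high to low similarity so that the output can feed the same paging logic unchanged.

```python
from typing import List, Tuple

# Hypothetical layouts: a sample is (similarity, sample id); an interval is
# (lower bound, upper bound, samples). Intervals are ordered highest first.
Sample = Tuple[float, str]
Interval = Tuple[float, float, List[Sample]]


def split_crowded(intervals: List[Interval], threshold: int, parts: int = 2) -> List[Interval]:
    """Re-divide every interval holding more than `threshold` samples.

    A crowded interval is cut into `parts` equal-width sub-intervals and its
    samples are reassigned; sub-intervals are emitted highest first.
    """
    out: List[Interval] = []
    for lower, upper, samples in intervals:
        if len(samples) <= threshold or upper <= lower:
            out.append((lower, upper, samples))
            continue
        width = (upper - lower) / parts
        subs: List[Interval] = [
            (lower + i * width, lower + (i + 1) * width, []) for i in range(parts)
        ]
        for s in samples:
            # clamp so values on the bounds stay inside the interval
            idx = min(max(int((s[0] - lower) / width), 0), parts - 1)
            subs[idx][2].append(s)
        out.extend(reversed(subs))
    return out
```

Applying the function again to its own output refines any sub-interval that is still crowded.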

The delayed sorting in the embodiments of the present application can increase system concurrency on the same hardware resources. Assume that the database contains S samples; the time complexity of fully sorting them is $S\log S$. With the method of the embodiments, the sample data are distributed into M sub-intervals according to similarity. Assuming that the samples are uniformly distributed and the similarity intervals are divided at equal spacing, each sub-interval holds about $S/M$ samples; if N similarity intervals are then selected for priority display, the number of prioritized samples is

$$\frac{N}{M}S.$$

The time complexity when the result is first returned is therefore

$$\frac{NS}{M}\log\frac{NS}{M}.$$

Since M > N, $\frac{NS}{M} < S$, and therefore

$$\frac{NS}{M}\log\frac{NS}{M} < S\log S.$$

The resources consumed by the retrieval system are linearly related to the sorting time complexity. Therefore, on the same hardware, a retrieval system implemented with this scheme can serve more users simultaneously, that is, the concurrency the system supports rises markedly; alternatively, for the same concurrency, the hardware requirement and hence the operating cost of the system fall markedly. For unsorted samples.
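To give a sense of scale, take purely hypothetical values S = 10^6, M = 100 and N = 5 with base-2 logarithms; these numbers are illustrative only and do not come from the application:

$$S\log_2 S = 10^{6}\times\log_2 10^{6} \approx 10^{6}\times 19.93 \approx 1.99\times10^{7}$$

$$\frac{NS}{M}\log_2\frac{NS}{M} = 5\times10^{4}\times\log_2\!\left(5\times10^{4}\right) \approx 5\times10^{4}\times 15.61 \approx 7.8\times10^{5}$$

Under these assumptions the first response needs roughly 25 times less sorting work. The linear cost of computing the S similarities (needed by both approaches) and of assigning samples to intervals is not counted in this comparison.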

An embodiment of the present application further provides a data search apparatus. Referring to fig. 7, the apparatus includes:

the target data acquisition module 11, configured to acquire target data to be queried that is submitted by a data querying party;

the sample data distribution module 12, configured to calculate the similarity between the target data and each sample data, determine a plurality of similarity intervals, and distribute each sample data into the corresponding similarity interval according to its similarity to the target data;

the sample data sorting module 13, configured to select the sample data in a designated similarity interval according to a preset interval selection rule and sort the selected sample data by similarity to obtain a first sorting result; and

the sorting result sending module 14, configured to send the first sorting result to the data querying party.
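A compact sketch of how modules 11-14 could fit together is given below. Several details are assumptions rather than statements of the application: cosine similarity on feature vectors, equal-width intervals spanning the observed minimum and maximum similarity, and a preset interval selection rule that takes the highest-similarity intervals until they hold at least one page (O) of samples. Module 11 corresponds to the function arguments and module 14 to the returned page; all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical sample representation: (similarity to the target data, sample id).
Sample = Tuple[float, str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; an assumed choice of similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def distribute(target: Sequence[float], database: Dict[str, Sequence[float]],
               num_intervals: int) -> List[List[Sample]]:
    """Module 12: score each sample and place it in an equal-width interval."""
    scored = [(cosine(target, vec), sid) for sid, vec in database.items()]
    buckets: List[List[Sample]] = [[] for _ in range(num_intervals)]
    if not scored:
        return buckets
    lo = min(s for s, _ in scored)
    hi = max(s for s, _ in scored)
    width = (hi - lo) / num_intervals or 1.0      # all scores equal: one bucket
    for s, sid in scored:
        idx = min(int((s - lo) / width), num_intervals - 1)
        buckets[idx].append((s, sid))
    return buckets[::-1]                          # highest similarity first


def first_sorting_result(intervals: List[List[Sample]], page_size: int):
    """Module 13 under an assumed rule: take top intervals until they hold >= O samples."""
    b, total = 0, 0
    while b < len(intervals) and total < page_size:
        total += len(intervals[b])
        b += 1
    first_seq = sorted((s for iv in intervals[:b] for s in iv),
                       key=lambda s: s[0], reverse=True)
    # page for module 14, the full first sequence, and the still-unsorted intervals
    return first_seq[:page_size], first_seq, intervals[b:]
```

Only the samples in the selected top intervals pass through `sorted`; the remaining intervals are returned untouched for delayed sorting.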

In a possible embodiment, the apparatus further comprises:

a data delay sorting module, configured to send the sample data other than the first sorting result to the data querying party by delayed sorting.

In a possible implementation, the sample data sorting module includes:

In a possible implementation, the data delay sorting module is specifically configured to: upon receiving a query message from the data querying party indicating that more query results are requested, determine the number of sample data in the designated similarity interval other than the first sorting result, obtaining a fourth value; calculate, from the fourth value and the first value, the number of sample data that need to be selected, obtaining a fifth value; in descending order of similarity, select the first (sixth value) similarity intervals from the similarity intervals other than the designated similarity interval as the current designated similarity interval, where the total number of sample data in the first (sixth value) similarity intervals is not less than the fifth value, the total number of sample data in the first (seventh value) similarity intervals is less than the fifth value, and the seventh value equals the sixth value minus 1; sort the sample data in the current designated similarity interval in descending order of similarity to obtain a second sequence, and select, as a second sorting result, the last sample data of the first sequence, as many as the fourth value, together with the first sample data of the second sequence, as many as the fifth value; and send the second sorting result to the data querying party.

In a possible implementation, the data delay sorting module is specifically configured to: for the similarity intervals whose sample data have not been sorted, select, in descending order of similarity, the sample data in the first (eighth value) similarity intervals and sort them to obtain a third sorting result, where the eighth value is a preset number of intervals, or is such that, among the unsorted similarity intervals, the total number of sample data in the first (eighth value) intervals is not less than the preset sample value while the total number of sample data in the first (ninth value) intervals is less than the preset sample value, the ninth value being the eighth value minus 1; and send the third sorting result to the data querying party.

In a possible implementation, the data delay sorting module is specifically configured to: upon receiving a query message from the data querying party indicating that the query results of the page numbered by the tenth value are to be displayed, select, in descending order of similarity, the (eleventh value)-th through (twelfth value)-th sample intervals as target sample intervals, based on the first value, which is the maximum number of sample data a single page of the data querying party can display, and on the number of sample data in each sample interval, where: the total number of sample data in the first (eleventh value minus 1) sample intervals is not more than a thirteenth value, the thirteenth value being the product of the first value and (the tenth value minus 1); the total number of sample data in the first (eleventh value) sample intervals is more than the thirteenth value; the total number of sample data in the first (twelfth value minus 1) sample intervals is less than a fourteenth value, the fourteenth value being the product of the first value and the tenth value; and the total number of sample data in the first (twelfth value) sample intervals is not less than the fourteenth value; sort the sample data in the target sample intervals in descending order of similarity to obtain a third sequence; select the (fifteenth value)-th through (sixteenth value)-th sample data of the third sequence as a fourth sorting result, where the fifteenth value equals the thirteenth value minus a seventeenth value plus 1, the sixteenth value equals the fourteenth value minus the seventeenth value, and the seventeenth value is the total number of sample data in the first (eleventh value minus 1) sample intervals; and send the fourth sorting result to the data querying party.

An embodiment of the present application further provides an electronic device, including: a processor and a memory;

the memory is used for storing computer programs;

the processor is configured to implement any data search method of the present application when executing the computer program stored in the memory.

Optionally, referring to fig. 8, in addition to the processor 21 and the memory 23, the electronic device of the embodiment of the present application further includes a communication interface 22 and a communication bus 24, and the processor 21, the communication interface 22 and the memory 23 communicate with one another through the communication bus 24.

The communication bus of the electronic device may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The memory may include RAM (Random Access Memory) or NVM (Non-Volatile Memory), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.

The processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any data search method of the present application.

In yet another embodiment provided herein, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform any data search method of the present application.

The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

It should be noted that, in this document, technical features of the various alternatives may be combined into a scheme as long as they do not contradict each other, and such a scheme falls within the scope of the disclosure of the present application. Relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

All the embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, electronic device, computer program product and storage medium embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for relevant points, refer to the corresponding parts of the method embodiments.

The above description is only for the preferred embodiment of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims

1. A method of searching data, the method comprising:

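acquiring target data to be queried that is submitted by a data querying party;

calculating the similarity between the target data and each sample data, determining a plurality of similarity intervals, and allocating each sample data to a corresponding similarity interval according to the similarity between the sample data and the target data;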

selecting sample data in a designated similarity interval according to a preset interval selection rule, and sorting the selected sample data according to the similarity to obtain a first sorting result; and

sending the first sorting result to the data querying party.

2. The method of claim 1, wherein the calculating the similarity between the target data and each sample data, determining a plurality of similarity intervals, and allocating each sample data to a corresponding similarity interval according to the similarity between the target data and each sample data comprises:

3. The method according to claim 2, wherein after allocating each sample data into a corresponding similarity interval according to the similarity between the sample data and the target data, the method further comprises:

4. The method according to claim 1, wherein the selecting sample data in a designated similarity interval according to a preset interval selection rule, and sorting the selected sample data according to the similarity to obtain a first sorting result comprises:

5. The method of claim 4, wherein after sending the first sorting result to the data querying party, the method further comprises:

and sending the second sorting result to the data querying party.

6. The method of claim 1, wherein after sending the first sorting result to the data querying party, the method further comprises:

and sending the third sorting result to the data querying party.

7. The method of claim 1, wherein after sending the first sorting result to the data querying party, the method further comprises:

and sending the fourth sorting result to the data querying party.

8. A data search apparatus, characterized in that the apparatus comprises:

the sample data distribution module is used for calculating the similarity between the target data and each sample data, determining a plurality of similarity intervals, and distributing each sample data to the corresponding similarity interval according to the similarity between each sample data and the target data;

9. The apparatus according to claim 8, wherein the sample data distribution module is specifically configured to: calculate the similarity between the target data and each sample data in a preset order, and adjust the similarity range corresponding to each similarity interval according to the upper and lower bounds of the similarities obtained so far; and allocate each sample data to the corresponding similarity interval according to its similarity to the target data.

10. The apparatus of claim 9, wherein the sample data distribution module is further configured to: for any similarity interval, when the number of sample data in that similarity interval exceeds a preset number threshold, re-divide the similarity interval into a plurality of similarity intervals and reassign its sample data to the resulting similarity intervals accordingly.

11. An electronic device comprising a processor and a memory;

the memory is used for storing a computer program;

the processor is configured to implement the data search method according to any one of claims 1 to 7 when executing the program stored in the memory.

12. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, implements the data search method according to any one of claims 1 to 7.