CN114840721B - Data searching method and device and electronic equipment - Google Patents

Data searching method and device and electronic equipment

Info

Publication number
CN114840721B
Authority
CN
China
Prior art keywords
point
points
column
data
recording
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210763651.0A
Other languages
Chinese (zh)
Other versions
CN114840721A (en)
Inventor
何文松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wenjingsong Technology Co ltd
Original Assignee
Beijing Wenjingsong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wenjingsong Technology Co ltd
Priority to CN202210763651.0A
Publication of CN114840721A
Application granted
Publication of CN114840721B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data searching method, a data searching device and electronic equipment. The data searching method comprises the following steps: randomly selecting a plurality of column points corresponding to a target data set from all data points corresponding to the target data set; generating recording points corresponding to each column point according to the data points existing in a preset range around each column point; and constructing an index map corresponding to the target data set according to each column point and the recording points corresponding to each column point, so as to search data according to the index map. The technical scheme of the embodiment of the invention can reduce the construction time of the index map and improve the construction efficiency of the index map.

Description

Data searching method and device and electronic equipment
Technical Field
The embodiment of the invention relates to the technical field of data processing, in particular to a data searching method and device and electronic equipment.
Background
Existing data searching methods mainly fall into two types: the Inverted File (IVF) algorithm and search algorithms based on an index map.
The IVF algorithm usually adopts k-means clustering: cluster center points are found through multiple iterations over the data, and all data points are finally assigned to their corresponding cluster centers. Search algorithms based on an index map include the KGraph algorithm and the graph-based high-dimensional vector retrieval algorithm NSG (Navigating Spreading-out Graph).
However, in the IVF algorithm the search result is determined by the clustering result, so its accuracy is poor; and the KGraph algorithm updates the neighbor points of each data point mainly through multiple iterations over the data, so the index map takes a long time to construct.
Disclosure of Invention
The embodiment of the invention provides a data searching method, a data searching device and electronic equipment, which can improve the construction speed of an index map.
In a first aspect, an embodiment of the present invention provides a data search method, where the method includes:
randomly selecting a plurality of column points corresponding to the target data set from all data points corresponding to the target data set;
generating a recording point corresponding to each column point according to data points existing in a preset range around each column point;
and constructing an index map corresponding to the target data set according to the column points and the record points corresponding to the column points respectively, so as to search data according to the index map.
In a second aspect, an embodiment of the present invention further provides a data search apparatus, where the apparatus includes:
the column point selection module is used for randomly selecting a plurality of column points corresponding to the target data set from all data points corresponding to the target data set;
the recording point generating module is used for generating recording points corresponding to each column point according to data points existing in a preset range around each column point;
and the index map building module is used for building an index map corresponding to the target data set according to the column points and the record points corresponding to the column points respectively so as to search data according to the index map.
In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the data search method provided by any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the data search method provided in any embodiment of the present invention.
According to the technical scheme of the embodiment of the invention, a plurality of column points corresponding to the target data set are randomly selected from all data points corresponding to the target data set, recording points corresponding to each column point are generated according to the data points existing in a preset range around each column point, an index map corresponding to the target data set is constructed according to each column point and the recording points corresponding to each column point, and data searching is carried out according to the index map, so that the construction time of the index map can be reduced and the construction efficiency of the index map can be improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1a is a flowchart of a data searching method according to a first embodiment of the present invention;
Fig. 1b is a flowchart of a method for obtaining column points from all data points according to the first embodiment of the present invention;
Fig. 1c is a flowchart of a method for generating recording points according to the first embodiment of the present invention;
Fig. 2 is a flowchart of a data searching method according to a second embodiment of the present invention;
Fig. 3 is a flowchart of a data searching method according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a data search apparatus according to a fourth embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device implementing the data search method according to the embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1a is a flowchart of a data searching method according to the first embodiment of the present invention. This embodiment is applicable to searching for data in a data set, and the method may be executed by a data searching apparatus, which may be implemented in software and/or hardware and is generally integrated in an electronic device with a data processing function. The method specifically includes the following steps:
Step 110, randomly selecting a plurality of column points corresponding to the target data set from all data points corresponding to the target data set.
In this embodiment, a data point may be understood as a floating point number or an integer included in the target data set. After all data points corresponding to the target data set are acquired, a plurality of data points can be randomly selected from them to serve as column points. A column point can be understood as a stable point that plays a supporting role in the target data set.
In a specific embodiment, if all data points corresponding to the target data set are stored in random order, a run of consecutive data points can be taken from all the data points as the column points.
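Step 110 can be illustrated with a minimal Python sketch. It assumes the data points are held in an in-memory list and that column points are identified by their list index; the function name `select_column_points` and the `already_shuffled` flag are hypothetical and do not come from the patent.

```python
import random


def select_column_points(data_points, num_columns, seed=None, already_shuffled=False):
    """Return the indices of the data points chosen as column points (hypothetical helper)."""
    if num_columns > len(data_points):
        raise ValueError("num_columns cannot exceed the number of data points")
    if already_shuffled:
        # Data stored in random order: a consecutive run is already a random sample.
        return list(range(num_columns))
    rng = random.Random(seed)
    # Draw distinct indices uniformly at random, without replacement.
    return rng.sample(range(len(data_points)), num_columns)


# Illustrative data only: 100 two-dimensional points.
points = [(float(i), float(i % 7)) for i in range(100)]
columns = select_column_points(points, 10, seed=0)
first_three = select_column_points(points, 3, already_shuffled=True)
```

Returning indices rather than copies of the points keeps the later steps (recording, indexing) cheap, which is a design choice of this sketch rather than a requirement stated in the text.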
Step 120, generating the recording points corresponding to each column point according to the data points existing in a preset range around each column point.
In this step, the data points existing within a preset range around each column point may be recorded, and the recorded data points are used as recording points, thereby generating the recording points corresponding to each column point.
In a specific embodiment, after the recording points corresponding to each column point are generated, some data points may be selected again as column points according to the number of times each data point has been recorded. Specifically, data points that have not been recorded, or have been recorded only a few times, may be used as column points in this secondary selection.
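The recording and secondary selection described above might look like the following sketch. It assumes that the "preset range" is a Euclidean radius and uses a brute-force scan; `record_within_range`, `reselect_columns` and `min_count` are illustrative names, not terms from the patent.

```python
import math
from collections import Counter


def record_within_range(data_points, column_ids, radius):
    """Map each column point index to the indices of data points within `radius` of it."""
    recorded = {}
    for c in column_ids:
        recorded[c] = [
            i for i, p in enumerate(data_points)
            if i != c and math.dist(p, data_points[c]) <= radius
        ]
    return recorded


def reselect_columns(data_points, recorded, min_count=1):
    """Pick data points recorded fewer than `min_count` times as additional column points."""
    counts = Counter(i for ids in recorded.values() for i in ids)
    return [
        i for i in range(len(data_points))
        if i not in recorded and counts[i] < min_count
    ]


# Illustrative data: one tight group near the origin and two far-away points.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
recorded = record_within_range(points, [0], radius=2.0)
extra = reselect_columns(points, recorded)
```

Here the two far-away points are never recorded by column point 0, so they are the ones offered for the secondary selection.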
In an optional implementation of this embodiment, when determining the column points corresponding to the target data set, in addition to randomly selecting them from all the data points of the data set as described above, the column points may also be determined from points that do not belong to the target data set. Specifically, assuming that the data points corresponding to the target data set are dispersed within a preset range inside and outside a certain spherical surface, points outside the target data set may be selected inside and outside the spherical surface as column points. For example, when the data points corresponding to the target data set are dispersed within a preset range inside and outside the spherical surface corresponding to a regular octahedron, the vertices of the regular octahedron can be taken as column points.
The advantage of this arrangement is that, on the one hand, the selected column points are guaranteed to be evenly distributed and, on the other hand, the column points do not need to be screened, which improves the efficiency of determining the column points.
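As an illustration of the octahedron example, the sketch below assumes three-dimensional data and an axis-aligned regular octahedron whose six vertices lie on a sphere of the given radius; the helper names are hypothetical.

```python
import math


def octahedron_vertices(radius=1.0, center=(0.0, 0.0, 0.0)):
    """Six vertices of an axis-aligned regular octahedron, all at distance `radius` from `center`."""
    cx, cy, cz = center
    return [
        (cx + radius, cy, cz), (cx - radius, cy, cz),
        (cx, cy + radius, cz), (cx, cy - radius, cz),
        (cx, cy, cz + radius), (cx, cy, cz - radius),
    ]


def nearest_vertex(point, vertices):
    """Index of the vertex (used here as a column point) closest to `point`."""
    return min(range(len(vertices)), key=lambda i: math.dist(point, vertices[i]))


verts = octahedron_vertices(radius=1.0)
owner = nearest_vertex((0.9, 0.1, 0.0), verts)
```

Because the vertices are fixed by geometry, no sampling or screening pass over the data is needed to obtain them.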
Step 130, constructing an index map corresponding to the target data set according to each column point and the record points corresponding to each column point, so as to perform data search according to the index map.
In this step, optionally, the index map corresponding to the target data set may be constructed according to the distances between the column points and the distances between the recording points under each column point.
After constructing the index map corresponding to the target data set, if a data search request input by a user is received, a query point in the data search request may be obtained, and a search result matched with the query point may be obtained through the index map. Wherein the query point can be understood as specific query data.
In this embodiment, if the number of column points corresponding to the target data set is M, the K data points existing in the preset range around each column point can be recorded in batch, where M × K is much larger than the total number N of data points corresponding to the target data set, and K may be set to a fixed value or a floating value, which is not limited in this embodiment. This has the advantage of ensuring that each data point is recorded multiple times, thereby reducing the column point selection [...] Compared with the IVF-based algorithm in the prior art, the construction time of the index map can be shortened and the construction efficiency of the index map is improved; in addition, compared with the search result of the IVF algorithm, the data search result obtained through the index map of this embodiment has higher precision.
According to the technical scheme of the embodiment of the invention, a plurality of column points corresponding to the target data set are randomly selected from all data points corresponding to the target data set, recording points corresponding to each column point are generated according to the data points existing in the preset range around each column point, and an index map corresponding to the target data set is constructed according to each column point and the recording points corresponding to each column point, so that the time for constructing the index map can be shortened and the construction efficiency of the index map can be improved.
On the basis of the above embodiment, fig. 1b is a flowchart of a method for obtaining column points from all data points, which includes the following steps:
Step 111, randomly selecting a plurality of data points as original column points from all the data points corresponding to the target data set.
Step 112, if the number of the original column points is larger than a preset threshold value, screening the final column points corresponding to the target data set from the plurality of original column points according to the data characteristics of each original column point.
In this embodiment, in order to make the distribution of the column points more uniform and closer to the overall distribution of the target data set, a plurality of data points may be selected as original column points, and if the number of original column points is greater than the preset threshold, the selected original column points may be pruned. Optionally, according to the data characteristics (e.g., data stability) of each original column point, the original column points that do not meet the preset requirements are removed, and the remaining original column points are retained as the final column points corresponding to the target data set.
In a specific embodiment, the distance between each original column point and a data point in a surrounding preset range may be calculated, the degree of penetration corresponding to each original column point is determined according to the distance calculation result, and finally, the original column point with the smaller degree of penetration is reserved as the final column point.
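The screening in steps 111 and 112 can be sketched as follows. The text does not define how the degree of penetration is computed, so this sketch substitutes a stand-in score, the mean distance from an original column point to the data points inside its preset range, purely as an assumption; `mean_local_distance` and `screen_columns` are hypothetical names.

```python
import math


def mean_local_distance(data_points, col, radius):
    """Stand-in score (an assumption): mean distance to the data points within `radius`."""
    near = [
        d for i, p in enumerate(data_points)
        if i != col and (d := math.dist(p, data_points[col])) <= radius
    ]
    # A column point with no data points in range gets the worst possible score.
    return sum(near) / len(near) if near else math.inf


def screen_columns(data_points, original_columns, radius, threshold):
    """Keep at most `threshold` original column points, preferring the smallest score."""
    if len(original_columns) <= threshold:
        return list(original_columns)
    ranked = sorted(
        original_columns,
        key=lambda c: mean_local_distance(data_points, c, radius),
    )
    return ranked[:threshold]


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (9.0, 9.0)]
kept = screen_columns(points, [0, 3, 4], radius=1.0, threshold=1)
unchanged = screen_columns(points, [0, 3, 4], radius=1.0, threshold=5)
```

Any other per-point statistic could be swapped in for the score without changing the overall shape of the screening step.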
In an implementation of this embodiment, randomly selecting a plurality of data points as the original column points from all the data points corresponding to the target data set includes: if the number of all the data points in the target data set is greater than a preset number, randomly selecting, according to a plurality of preset column point grades, a plurality of data points corresponding to each column point grade to serve as the original column points corresponding to that grade.
In this embodiment, if the scale of the target data set is large, the original column points corresponding to each column point grade may be acquired in batches. Specifically, assuming that there are three grades of column points, a plurality of first-grade original column points may be obtained according to a first number, a plurality of second-grade original column points according to a second number, and a plurality of third-grade original column points according to a third number, where the first number is less than the second number and the second number is less than the third number.
The advantage of this arrangement is that it is convenient to construct the index map corresponding to the large-scale data set, and the construction efficiency of the index map is improved.
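A minimal sketch of graded selection, assuming each grade is sampled independently from the whole data set (the text does not say whether grades may share points); `select_columns_by_level` and the example sizes are illustrative only.

```python
import random


def select_columns_by_level(num_points, level_sizes, seed=None):
    """Return one list of randomly chosen point indices per column point grade."""
    # The text requires first number < second number < third number.
    if any(a >= b for a, b in zip(level_sizes, level_sizes[1:])):
        raise ValueError("level sizes must be strictly increasing")
    rng = random.Random(seed)
    return [rng.sample(range(num_points), size) for size in level_sizes]


# Three grades with 10, 100 and 1000 original column points respectively.
levels = select_columns_by_level(10_000, [10, 100, 1000], seed=0)
```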
In this embodiment, the method for screening the final column points from the plurality of original column points may be applied to the determination process of the cluster in the k-means algorithm, thereby implementing optimization of the k-means algorithm.
In a specific embodiment, fig. 1c is a flowchart of a method for generating a recording point, which includes the following steps:
Step 121, recording the data points existing in a preset range around each column point.
Step 122, generating the recording points corresponding to each column point according to the number of times each data point is recorded and the distance between each data point and the column points.
In this embodiment, the data points existing in a preset range around each column point may be recorded in batch, the number of times each data point is recorded is counted, and a data point that is not recorded, or whose recording count is less than a preset count, is taken as a recording point of the column point closest to it. Optionally, data points whose recording count exceeds an upper limit may be eliminated.
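Steps 121 and 122 can be sketched as below, assuming a Euclidean radius as the preset range and a brute-force scan; `min_count` and `max_count` stand for the preset count and the optional upper limit, and all names are hypothetical.

```python
import math
from collections import Counter


def generate_recording_points(data_points, column_ids, radius, min_count=1, max_count=None):
    """Return {column index: [recording point indices]} following steps 121 and 122."""
    # Step 121: record the data points lying within `radius` of each column point.
    recorded = {
        c: [i for i, p in enumerate(data_points)
            if i != c and math.dist(p, data_points[c]) <= radius]
        for c in column_ids
    }
    counts = Counter(i for ids in recorded.values() for i in ids)
    columns = set(column_ids)
    # Step 122: attach under-recorded data points to their nearest column point.
    for i, p in enumerate(data_points):
        if i in columns or counts[i] >= min_count:
            continue
        nearest = min(column_ids, key=lambda c: math.dist(p, data_points[c]))
        if i not in recorded[nearest]:
            recorded[nearest].append(i)
    # Optional: drop data points that were recorded more often than the upper limit.
    if max_count is not None:
        for c in column_ids:
            recorded[c] = [i for i in recorded[c] if counts[i] <= max_count]
    return recorded


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (4.0, 0.0)]
result = generate_recording_points(points, [0, 2], radius=1.5)
capped = generate_recording_points(points, [0, 2], radius=1.5, max_count=0)
```

In the example, the point at (4, 0) lies outside both ranges, so it is attached to the nearer column point 0.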
In an implementation of this embodiment, generating the recording points corresponding to each column point according to the number of times each data point is recorded and the distance between each data point and the column points includes: judging whether the current column point grade is the first grade; if not, acquiring, according to the plurality of preset column point grades, the previous grade corresponding to the current column point grade and a target column point corresponding to the previous grade; and acquiring each target recording point corresponding to the target column point, and generating the recording points corresponding to the current column point according to the distance between the current column point and each target recording point.
In a specific embodiment, when generating the recording points corresponding to each column point, if the current column point belongs to the first grade, its recording points may be generated by the method of steps 121 to 122. If the current column point does not belong to the first grade (say it belongs to the second grade), each first-grade column point can be obtained, and a first-grade column point that is close to the current column point is taken as the target column point; then each recording point of the target column point (namely, each target recording point) is obtained, the distance between the current column point and each target recording point is calculated, and the target recording points that are close to the current column point are taken as the recording points corresponding to the current column point.
The advantage of this arrangement is that if the current column point level is not the first level, the recording point corresponding to the current column point can be quickly determined by directly acquiring the recording point corresponding to the column point in the previous level, thereby improving the construction efficiency of the index map.
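For a column point that is not of the first grade, the reuse of the previous grade's recording points might be sketched like this. How many target column points and recording points count as "close" is not specified in the text, so `num_targets` and `num_records` are assumed parameters, and the function name is hypothetical.

```python
import math


def recording_points_from_previous_level(data_points, current, prev_columns, prev_recorded,
                                         num_targets=1, num_records=2):
    """Derive recording points for `current` from the nearest previous-grade column points."""
    here = data_points[current]
    # Target column points: the previous-grade column points closest to the current one.
    targets = sorted(prev_columns, key=lambda c: math.dist(here, data_points[c]))[:num_targets]
    # Target recording points: everything already recorded under those targets.
    candidates = {i for c in targets for i in prev_recorded[c] if i != current}
    # Keep the closest candidates; the index breaks distance ties deterministically.
    ranked = sorted(candidates, key=lambda i: (math.dist(here, data_points[i]), i))
    return ranked[:num_records]


points = [(0.0, 0.0), (10.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (9.0, 0.0), (1.5, 0.0)]
prev_recorded = {0: [2, 3, 4, 6], 1: [5]}
inherited = recording_points_from_previous_level(points, 6, [0, 1], prev_recorded)
```

Only the recording points of the chosen target column points are scanned, which is where the time saving over a full pass across the data set comes from.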
Example two
This embodiment is a further refinement of the above embodiment; terms that are the same as or correspond to those in the above embodiment are not explained again here. Fig. 2 is a flowchart of a data search method provided in the second embodiment. The technical solution of the second embodiment may be combined with one or more of the methods in the solutions of the foregoing embodiments. As shown in fig. 2, the method provided in the second embodiment may further include:
Step 210, randomly selecting a plurality of column points corresponding to the target data set from all data points corresponding to the target data set.
Step 220, generating the recording points corresponding to each column point according to the data points existing in the preset range around each column point.
Step 230, taking each column point as an index, and constructing a first index map corresponding to the target data set so as to search data according to the first index map.
In this embodiment, after a plurality of column points corresponding to the target data set are selected, each column point may be used as an index according to the distances between the column points, and a simple index map (i.e., the first index map) corresponding to the target data set is constructed.
In this embodiment, after the first index map is constructed, if a data search request is received, query points included in the data search request may be acquired, and distances between the query points and each column point are calculated through the first index map; determining an alternative column point corresponding to the query point according to the distance calculation result; and acquiring alternative recording points corresponding to the alternative column points, calculating the distance between the query point and each alternative recording point, and determining a data search result matched with the query point according to the distance calculation result.
In a specific embodiment, a plurality of column points corresponding to the target data set may be quickly obtained through the first index map, the distance between the query point and each column point is calculated, after the distance between the query point and each column point is calculated, a column point closer to the query point may be used as an alternative column point, then a record point (i.e., an alternative record point) corresponding to the alternative column point is quickly obtained through the first index map, the distance between the query point and each alternative record point is calculated, and finally, the alternative record point closer to the query point is used as a data search result.
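The two-stage lookup through the first index map can be sketched as follows. The sketch assumes the first index map is simply a dictionary from column point index to its recording point indices, and that "closer" means the `num_columns` nearest column points and the `top_k` nearest recording points; these names and parameters are hypothetical.

```python
import math


def search_first_index(query, data_points, recorded, num_columns=2, top_k=3):
    """Find the recording points nearest to `query` via the nearest column points."""
    # Stage 1: rank the column points by distance to the query and keep the closest ones.
    alt_columns = sorted(recorded, key=lambda c: math.dist(query, data_points[c]))[:num_columns]
    # Stage 2: gather the alternative recording points under those column points.
    candidates = {i for c in alt_columns for i in recorded[c]}
    # Rank candidates by distance to the query; the index breaks ties deterministically.
    ranked = sorted(candidates, key=lambda i: (math.dist(query, data_points[i]), i))
    return ranked[:top_k]


points = [(0.0, 0.0), (10.0, 10.0), (1.0, 0.0), (0.0, 1.0), (9.0, 10.0), (10.0, 9.0)]
first_index = {0: [2, 3], 1: [4, 5]}
hits = search_first_index((0.9, 0.1), points, first_index, num_columns=1, top_k=1)
```

Only the distances to the column points and to the recording points under the selected column points are computed, so the query never scans the full data set.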
In a specific embodiment, if the number of column points corresponding to the target data set is large, a column point index map covering the plurality of column points may be generated in advance. After a data search request is received, the column points to be evaluated may be screened in the column point index map, and the distance between the query point and each screened column point is then calculated in the manner described above.
According to the technical scheme of the embodiment of the invention, a plurality of column points corresponding to the target data set are randomly selected from all data points corresponding to the target data set, recording points corresponding to each column point are generated according to the data points existing in the preset range around each column point, a first index map corresponding to the target data set is constructed by taking each column point as an index, and data searching is carried out according to the first index map, so that the construction time of the index map can be reduced and the construction efficiency of the index map is improved.
Example three
This embodiment is a further refinement of the above embodiment; terms that are the same as or correspond to those in the above embodiment are not explained again here. Fig. 3 is a flowchart of a data search method provided in the third embodiment. The technical solution of the third embodiment may be combined with one or more of the methods in the solutions of the foregoing embodiments. As shown in fig. 3, the method provided in the third embodiment may further include:
Step 310, randomly selecting a plurality of column points corresponding to the target data set from all data points corresponding to the target data set.
Step 320, generating the recording points corresponding to each column point according to the data points existing in the preset range around each column point.
Step 330, taking the recording points corresponding to the column points as indexes, and constructing a second index map corresponding to the target data set so as to search data according to the second index map.
In this embodiment, optionally, each recording point may be used as an index according to a distance between the recording points, and a complex index map (i.e., a second index map) corresponding to the target data set may be constructed.
Note that the recording points can be used as an index because the process of generating the recording points depends on their distance to the column points. During a data search, if a certain recording point is selected, the column points corresponding to that recording point are in effect selected as well, since the recording point was recorded by those column points.
In an implementation manner of this embodiment, constructing a second index map corresponding to the target data set by using the record points corresponding to the respective pillar points as indexes includes: determining neighbor points corresponding to the recording points according to the distance between the recording points under each column point; and constructing a second index map corresponding to the target data set according to the neighbor points corresponding to the record points.
In a specific embodiment, the distance between the recording points under each column point can be calculated, the recording point closer to the current recording point is used as a neighbor point, the neighbor point is added to the index queue matched with the current recording point, finally, the neighbor point corresponding to each recording point is obtained according to the index queue matched with each recording point, and the second index map is constructed according to the neighbor point corresponding to each recording point.
In this embodiment, after the second index map is constructed, if a data search request is received, query points included in the data search request may be obtained, and distances between the query points and neighboring points corresponding to the record points are calculated through the second index map; and determining a data search result matched with the query point according to the distance calculation result.
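The second index map can be sketched as a neighbor list per recording point, built only from distances between recording points under the same column point. The text does not specify how the map is traversed at query time, so the greedy walk below is an assumption; `num_neighbors`, `build_second_index` and `greedy_search` are hypothetical names.

```python
import math


def build_second_index(data_points, recorded, num_neighbors=2):
    """Build {recording point: [neighbor points]} from the recording points under each column point."""
    graph = {}
    for members in recorded.values():
        for i in members:
            ranked = sorted(
                (j for j in members if j != i),
                key=lambda j: (math.dist(data_points[i], data_points[j]), j),
            )
            queue = graph.setdefault(i, [])  # index queue matched with recording point i
            for j in ranked[:num_neighbors]:
                if j not in queue:
                    queue.append(j)
    return graph


def greedy_search(query, data_points, graph, start):
    """Assumed traversal: move to the closest neighbor until no neighbor is closer to the query."""
    current = start
    while True:
        best = min(graph[current], key=lambda j: math.dist(query, data_points[j]), default=current)
        if math.dist(query, data_points[best]) >= math.dist(query, data_points[current]):
            return current
        current = best


points = [(float(i), 0.0) for i in range(6)]
second_index = build_second_index(points, {0: [1, 2, 3, 4]}, num_neighbors=2)
found = greedy_search((4.2, 0.0), points, second_index, start=1)
```

Because neighbors are computed only among recording points that share a column point, the construction avoids the repeated whole-data-set iterations attributed to KGraph in the background section.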
According to the technical scheme of the embodiment of the invention, a plurality of column points corresponding to the target data set are randomly selected from all data points corresponding to the target data set, the recording points corresponding to the column points are generated according to the data points existing in the preset range around each column point, the recording points corresponding to the column points are used as indexes, the second index map corresponding to the target data set is constructed, and the data search is carried out according to the index map, so that the construction time of the index map can be reduced, and the construction efficiency of the index map is improved.
Example four
Fig. 4 is a schematic structural diagram of a data search apparatus according to a fourth embodiment of the present invention, as shown in fig. 4, the apparatus includes: a column point selecting module 410, a record point generating module 420 and an index map constructing module 430.
The column point selecting module 410 is configured to randomly select a plurality of column points corresponding to the target data set from all data points corresponding to the target data set;
a recording point generating module 420, configured to generate a recording point corresponding to each column point according to data points existing in a preset range around each column point;
an index map building module 430, configured to build an index map corresponding to the target data set according to each column point and the recording points corresponding to each column point, so as to perform data search according to the index map.
According to the technical scheme provided by the embodiment of the invention, a plurality of column points corresponding to the target data set are randomly selected from all data points corresponding to the target data set, recording points corresponding to each column point are generated according to the data points existing in the preset range around each column point, an index map corresponding to the target data set is constructed according to each column point and the recording points corresponding to each column point, and data searching is carried out according to the index map, so that the construction time of the index map can be reduced and the construction efficiency of the index map is improved.
On the basis of the above embodiment, the column point selecting module 410 includes:
the original column point acquisition unit is used for randomly selecting a plurality of data points from all data points corresponding to the target data set as original column points;
the column point screening unit is used for screening a final column point corresponding to the target data set from a plurality of original column points according to the data characteristics of each original column point if the number of the original column points is larger than a preset threshold value;
and the data point acquisition unit is used for randomly selecting a plurality of data points corresponding to each column point grade according to a plurality of preset column point grades to serve as original column points corresponding to each column point grade if the number of all data points in the target data set is greater than the preset number.
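The units of the column point selecting module can be illustrated with the following hedged Python sketch. The embodiment does not state in this section which data characteristics drive the screening, so a minimum-spacing rule is used here purely as a stand-in; the names `max_columns`, `min_gap` and `grade_sizes`, and the choice to make the grades disjoint, are likewise assumptions.

```python
import math
import random


def select_column_points(points, num_original, max_columns, min_gap, seed=0):
    """Randomly pick original column points, then screen them if there are too many."""
    rng = random.Random(seed)
    original = rng.sample(range(len(points)), num_original)
    if len(original) <= max_columns:
        return original  # at or below the preset threshold: no screening needed
    # Assumed screening rule: keep a candidate only if it lies at least
    # min_gap away from every column point kept so far.
    kept = []
    for c in original:
        if all(math.dist(points[c], points[k]) >= min_gap for k in kept):
            kept.append(c)
        if len(kept) == max_columns:
            break
    return kept


def select_by_grade(points, grade_sizes, seed=0):
    """Pick random original column points for each preset column point grade."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(points)), sum(grade_sizes))
    grades, start = [], 0
    for size in grade_sizes:
        grades.append(chosen[start:start + size])  # disjoint grades (assumption)
        start += size
    return grades


points = [(float(i), 0.0) for i in range(20)]
cols = select_column_points(points, num_original=10, max_columns=4, min_gap=2.0)
grades = select_by_grade(points, grade_sizes=[2, 4, 8])
```

In this sketch `select_by_grade` would be called only when the data set exceeds the preset number of data points; that check is left to the caller.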
The recording point generating module 420 includes:
the data point recording unit is used for recording data points existing in a preset range around each column point;
the recording point determining unit is used for generating recording points corresponding to the column points according to the recording times corresponding to the data points and the distance between each data point and the column point;
the grade judging unit is used for judging whether the grade of the current column point is a first grade or not;
the target column point acquisition unit is used for acquiring a previous grade corresponding to the current column point grade and target column points corresponding to the previous grade according to a plurality of preset column point grades when the current column point grade is not the first grade;
and the target recording point acquisition unit is used for acquiring each target recording point corresponding to the target column point and generating a recording point corresponding to the current column point according to the distance between the current column point and each target recording point.
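A minimal Python sketch of the recording point generating module follows. How the recording times and the distance are combined is not detailed in this section, so the ranking below (less frequently recorded points first, then nearer points) is an assumption, as is the rule that the target column point is the previous-grade column point nearest to the current one. All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def record_neighbors(points, column_ids, radius):
    """For each column point, record the data points inside the preset range."""
    return {
        c: [i for i, p in enumerate(points)
            if i != c and math.dist(points[c], p) <= radius]
        for c in column_ids
    }


def first_grade_recording_points(points, column_ids, radius, m):
    recorded = record_neighbors(points, column_ids, radius)
    # Recording times: how many column points recorded each data point.
    times = Counter(i for ids in recorded.values() for i in ids)
    result = {}
    for c, ids in recorded.items():
        # Assumed ranking: fewer recording times first, then smaller distance.
        ranked = sorted(ids, key=lambda i: (times[i], math.dist(points[c], points[i])))
        result[c] = ranked[:m]
    return result


def later_grade_recording_points(points, current, prev_recording_points, m):
    # Assumed target column point: the nearest column point of the previous grade.
    target = min(prev_recording_points,
                 key=lambda c: math.dist(points[current], points[c]))
    candidates = [i for i in prev_recording_points[target] if i != current]
    # Keep the target recording points closest to the current column point.
    candidates.sort(key=lambda i: math.dist(points[current], points[i]))
    return candidates[:m]


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
first = first_grade_recording_points(points, column_ids=[0, 3], radius=2.5, m=2)
later = later_grade_recording_points(points, current=4, prev_recording_points=first, m=1)
# first == {0: [1, 2], 3: [2, 1]} and later == [2]
```

Reusing the recording points of the previous grade means that a column point of a later grade only measures distances to a short candidate list instead of scanning the whole data set.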
The index map building module 430 includes:
a first index map construction unit, configured to construct a first index map corresponding to the target data set by using each of the column points as an index;
a second index map construction unit, configured to construct a second index map corresponding to the target data set by using the record points corresponding to the respective column points as indexes;
a neighbor point determining unit, configured to determine a neighbor point corresponding to each record point according to a distance between each record point under each column point, and construct a second index map corresponding to the target data set according to the neighbor point corresponding to each record point;
a query point obtaining unit, configured to obtain a query point included in the data search request if the index map is a first index map, and calculate a distance between the query point and each column point through the first index map;
the alternative column point determining unit is used for determining alternative column points corresponding to the query points according to the distance calculation result;
and the search result determining unit is used for acquiring the alternative recording points corresponding to the alternative column points, calculating the distance between the query point and each alternative recording point, and determining the data search result matched with the query point according to the distance calculation result.
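The neighbor point determining unit and the two-stage search over the first index map can be sketched in Python as below. This is an illustration under stated assumptions: the names `build_neighbor_graph`, `search`, `num_alternatives` and `top_k` are hypothetical, Euclidean distance is assumed, and "alternative column points" is interpreted as the column points nearest to the query point.

```python
import math


def build_neighbor_graph(points, index_map, k):
    """Second index map: link each record point to its k nearest record points
    under the same column point. A record point shared by several column points
    keeps the neighbors from the last column point visited (simplification)."""
    graph = {}
    for record_ids in index_map.values():
        for r in record_ids:
            others = [o for o in record_ids if o != r]
            others.sort(key=lambda o: (math.dist(points[r], points[o]), o))
            graph[r] = others[:k]
    return graph


def search(points, index_map, query, num_alternatives=2, top_k=1):
    """Two-stage search over the first index map (keys are column points)."""
    # Stage 1: distance between the query point and each column point.
    by_distance = sorted(index_map, key=lambda c: math.dist(query, points[c]))
    alternatives = by_distance[:num_alternatives]
    # Stage 2: distance between the query point and each alternative record point.
    candidates = {i for c in alternatives for i in index_map[c]}
    ranked = sorted(candidates, key=lambda i: (math.dist(query, points[i]), i))
    return ranked[:top_k]


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (8.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
index_map = {1: [0, 2], 4: [3, 5]}
graph = build_neighbor_graph(points, index_map, k=1)
result = search(points, index_map, query=(8.2, 0.0), num_alternatives=1, top_k=2)
# graph == {0: [2], 2: [0], 3: [5], 5: [3]} and result == [3, 5]
```

In the usage example only the record points under column point 4 are compared with the query point, which shows how the first stage narrows the candidate set before exact distances are computed.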
The device can execute the methods provided by all the embodiments of the invention, and has corresponding functional modules and beneficial effects for executing the methods. For technical details which are not described in detail in the embodiments of the present invention, reference may be made to the methods provided in all the aforementioned embodiments of the present invention.
EXAMPLE five
FIG. 5 illustrates a schematic diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 5, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The processor 11 performs the various methods and processes described above, such as a data search method.
In some embodiments, the data search method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the data search method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the data search method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Computer programs for implementing the methods of the present invention can be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, as a stand-alone software package, partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of difficult management and weak service scalability found in traditional physical hosts and VPS services.
It should be understood that the various forms of flow shown above may be used with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A method of searching data, the method comprising:
randomly selecting a plurality of column points corresponding to the target data set from all data points corresponding to the target data set; the column points are stable points used for data support in the target data set;
recording data points existing in a preset range around each column point, and generating recording points corresponding to each column point according to the recording times corresponding to each data point and the distance between each data point and the column point;
constructing an index map corresponding to the target data set according to the column points and the record points corresponding to the column points respectively, and performing data search according to the index map;
the product of the number of the column points and the number of the corresponding recording points is larger than the total number of the data points corresponding to the target data set;
wherein generating a recording point corresponding to each column point according to the recording times corresponding to each data point and the distance between each data point and the column point comprises: judging whether the current column point grade is a first grade or not; if not, acquiring a previous grade corresponding to the current column point grade and a target column point corresponding to the previous grade according to a plurality of preset column point grades; and acquiring each target recording point corresponding to the target column point, and generating a recording point corresponding to the current column point according to the distance between the current column point and each target recording point.
2. The method of claim 1, wherein randomly selecting a plurality of column points corresponding to the target data set from all of the data points corresponding to the target data set comprises:
randomly selecting a plurality of data points as original column points from all data points corresponding to the target data set;
and if the number of the original column points is larger than a preset threshold value, screening a final column point corresponding to the target data set from the plurality of original column points according to the data characteristics of each original column point.
3. The method of claim 2, wherein randomly selecting a plurality of data points as the original column points from all the data points corresponding to the target data set comprises:
and if the number of all the data points in the target data set is greater than the preset number, randomly selecting a plurality of data points corresponding to each column point grade according to a plurality of preset column point grades to serve as original column points corresponding to each column point grade.
4. The method of claim 1, wherein constructing an index map corresponding to the target data set according to each of the column points and the record points corresponding to each of the column points respectively comprises:
taking each column point as an index, and constructing a first index map corresponding to the target data set; or,
and taking the record points corresponding to the column points as indexes to construct a second index map corresponding to the target data set.
5. The method according to claim 4, wherein the constructing a second index map corresponding to the target data set by using the record points corresponding to the column points as indexes comprises:
determining neighbor points corresponding to the record points according to the distance between the record points under each column point;
and constructing a second index map corresponding to the target data set according to the neighbor points corresponding to the record points.
6. The method of claim 4, wherein performing a data search according to the index map comprises:
if the index map is a first index map, acquiring query points included in the data search request, and calculating the distance between the query points and each column point through the first index map;
determining alternative column points corresponding to the query points according to the distance calculation result;
and acquiring alternative recording points corresponding to the alternative column points, calculating the distance between the query point and each alternative recording point, and determining a data search result matched with the query point according to the distance calculation result.
7. An apparatus for searching data, the apparatus comprising:
the column point selection module is used for randomly selecting a plurality of column points corresponding to the target data set from all data points corresponding to the target data set; the column points are stable points used for data support in the target data set;
the recording point generating module is used for recording data points existing in a preset range around each column point and generating recording points corresponding to each column point according to the recording times corresponding to each data point and the distance between each data point and the column point;
the index map building module is used for building an index map corresponding to the target data set according to each column point and the record points corresponding to the column points respectively so as to search data according to the index map;
the product of the number of the column points and the number of the corresponding recording points is larger than the total number of the data points corresponding to the target data set;
the record point generating module is also used for judging whether the current column point grade is a first grade; if not, acquiring a previous grade corresponding to the current column point grade and a target column point corresponding to the previous grade according to a plurality of preset column point grades; and acquiring each target recording point corresponding to the target column point, and generating a recording point corresponding to the current column point according to the distance between the current column point and each target recording point.
8. An electronic device, the electronic device comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs when executed by the one or more processors cause the one or more processors to perform the data search method of any of claims 1-6.
CN202210763651.0A 2022-07-01 2022-07-01 Data searching method and device and electronic equipment Active CN114840721B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210763651.0A CN114840721B (en) 2022-07-01 2022-07-01 Data searching method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210763651.0A CN114840721B (en) 2022-07-01 2022-07-01 Data searching method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN114840721A CN114840721A (en) 2022-08-02
CN114840721B CN114840721B (en) 2022-10-11

Family

ID=82574673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210763651.0A Active CN114840721B (en) 2022-07-01 2022-07-01 Data searching method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114840721B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280175A (en) * 2018-01-22 2018-07-13 大连大学 Inverted spatial index method based on medical service region division
WO2020263424A1 (en) * 2019-06-28 2020-12-30 Microsoft Technology Licensing, Llc Building a graph index and searching a corresponding dataset
CN113590645A (en) * 2021-06-30 2021-11-02 北京百度网讯科技有限公司 Searching method, searching device, electronic equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6134541A (en) * 1997-10-31 2000-10-17 International Business Machines Corporation Searching multidimensional indexes using associated clustering and dimension reduction information
CN109284352B (en) * 2018-09-30 2022-02-08 哈尔滨工业大学 Query method for evaluating indefinite-length words and sentences of class documents based on inverted index
CN110851563B (en) * 2019-10-08 2021-11-09 杭州电子科技大学 Neighbor document searching method based on coding navigable stretch chart
CN112287185A (en) * 2020-11-05 2021-01-29 杭州电子科技大学 Approximate nearest neighbor searching method combining VP tree and guiding nearest neighbor graph
CN112507149A (en) * 2020-11-13 2021-03-16 厦门大学 Construction method of dynamic k neighbor graph and rapid image retrieval method based on dynamic k neighbor graph

Also Published As

Publication number Publication date
CN114840721A (en) 2022-08-02

Similar Documents

Publication Publication Date Title
JP7403605B2 (en) Multi-target image text matching model training method, image text search method and device
JP2022137281A (en) Data query method, device, electronic device, storage medium, and program
CN114840721B (en) Data searching method and device and electronic equipment
US20220398244A1 (en) Query method and device and storage medium
EP4116889A2 (en) Method and apparatus of processing event data, electronic device, and medium
CN116881219A (en) Database optimization processing method and device, electronic equipment and storage medium
CN113868555A (en) Track retrieval method, device, equipment and storage medium
CN113779370B (en) Address retrieval method and device
CN114896418A (en) Knowledge graph construction method and device, electronic equipment and storage medium
CN114596196A (en) Method and device for filtering point cloud data, equipment and storage medium
CN114037060A (en) Pre-training model generation method and device, electronic equipment and storage medium
CN114138925A (en) Location point belonging area retrieval method, device, electronic equipment, medium and product
CN115168727B (en) User habit mining method and device and electronic equipment
US20220237474A1 (en) Method and apparatus for semanticization, electronic device and readable storage medium
US20230145408A1 (en) Method of processing feature information, electronic device, and storage medium
CN114037057B (en) Pre-training model generation method and device, electronic equipment and storage medium
CN115100162A (en) Representative working condition point determining method, device, equipment and medium
CN114579573B (en) Information retrieval method, information retrieval device, electronic equipment and storage medium
US20230145853A1 (en) Method of generating pre-training model, electronic device, and storage medium
US20230206075A1 (en) Method and apparatus for distributing network layers in neural network model
CN115510140A (en) Data extraction method, device, equipment and storage medium
US20230095947A1 (en) Method and apparatus for pushing resource, and storage medium
CN116702304A (en) Method and device for grouping foundation pit design schemes based on unsupervised learning
CN114897617A (en) Model evaluation method, device, equipment and storage medium for financial wind control scene
CN118467575A (en) Target query statement determination method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant