CN113535710B - Searching method, searching device, terminal equipment and storage medium - Google Patents


Info

Publication number
CN113535710B
Authority
CN
China
Prior art keywords
search
information
sub
searched
mapping table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010322885.2A
Other languages
Chinese (zh)
Other versions
CN113535710A (en)
Inventor
陈浩宇
农革
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN202010322885.2A
Publication of CN113535710A
Application granted
Publication of CN113535710B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results

Abstract

The application is applicable to the technical field of information and provides a searching method, a searching device, terminal equipment and a storage medium, wherein the searching method comprises the following steps: when receiving information to be searched, identifying the search type of the information to be searched; if the search type is binary search, generating a plurality of sub-search information according to the information to be searched; searching by adopting the plurality of sub-search information respectively to obtain a search result matched with each sub-search information; and outputting the search result. By the method, efficient and accurate binary search can be realized.

Description

Searching method, searching device, terminal equipment and storage medium
Technical Field
The present application belongs to the field of information technology, and in particular, relates to a search method, a search device, a terminal device, and a storage medium.
Background
Modern computer systems typically store data as binary digits, and the data may be represented in a variety of forms, such as decimal, hexadecimal, single-byte, and multi-byte. In general, data is input and output in units of bytes or characters and is not processed directly at the level of individual binary bits. The same holds in the field of data searching: the search patterns of mainstream search engines are typically expressed in bytes or characters, and bit-level (binary) search is not supported. However, in some scenarios binary search remains an urgent need. Although some special-purpose systems implement binary search, they usually obtain results by traversing the data, so search efficiency is poor.
Data indexing is an effective means of improving data search efficiency. In the field of data indexing, a suffix index does not require word segmentation of the data, can be created uniformly for any type of data, and achieves a 100% recall ratio. However, a suffix index is usually built in units of bytes and generally supports only search patterns expressed in bytes, so it cannot directly perform binary search. Therefore, how to implement efficient and accurate binary search in combination with data indexing technology is a problem to be solved at present.
Disclosure of Invention
In view of the above, the embodiments of the present application provide a searching method, apparatus, terminal device, and storage medium, which can implement efficient and accurate binary search.
A first aspect of an embodiment of the present application provides a search method, including:
when receiving information to be searched, identifying the search type of the information to be searched;
if the search type is binary search, generating a plurality of sub-search information according to the information to be searched;
searching by adopting the plurality of sub-search information respectively to obtain a search result matched with each sub-search information;
and outputting the search result.
A second aspect of an embodiment of the present application provides a search apparatus, including:
the search type identification module is used for identifying the search type of the information to be searched when the information to be searched is received;
the sub-search information generation module is used for generating a plurality of sub-search information according to the information to be searched if the search type is binary search;
the sub-search information searching module is used for searching by adopting the plurality of sub-search information respectively to obtain a search result matched with each sub-search information;
and the search result output module is used for outputting the search result.
A third aspect of an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the search method according to the first aspect when executing the computer program.
A fourth aspect of embodiments of the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements a search method as described in the first aspect above.
A fifth aspect of an embodiment of the present application provides a computer program product for, when run on a terminal device, causing the terminal device to perform the search method of the first aspect.
Compared with the prior art, the embodiment of the application has the following advantages:
According to the embodiments of the present application, when the information to be searched is received, its search type is identified, so that the information can be searched or processed according to its search type. Specifically, if the search type is binary search, the information to be searched is expanded into byte patterns to generate a plurality of sub-search information, and each piece of sub-search information in byte mode is then searched to obtain a search result matched with it; if the search type is byte search, the information to be searched can be searched directly in the byte search mode. By combining pattern expansion with suffix indexing, the embodiments can solve the binary search problem for any type of data and effectively improve the efficiency and performance of binary search.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the following will briefly introduce the drawings that are required to be used in the embodiments or the description of the prior art. It is evident that the drawings in the following description are only some embodiments of the present application and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a schematic diagram of a suffix index creation process according to one embodiment of the application;
FIG. 2 is a flow chart illustrating steps of a search method according to an embodiment of the present application;
FIG. 3 is a flow chart illustrating steps of another search method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an expansion of information to be searched into complete bytes according to one embodiment of the application;
FIG. 5 is a schematic diagram of a system architecture to which a search method according to an embodiment of the present application is applied;
FIG. 6 is a diagram of a search process of a search method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a search apparatus according to one embodiment of the present application;
FIG. 8 is a schematic diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
The technical scheme of the application is described below through specific examples.
The searching method disclosed by the embodiment of the application solves the binary searching problem of any type of data by adopting the mode expansion and suffix indexing technology. When implementing the search method of the embodiment of the present application, it is necessary to create an index for original data in units of bytes. Thus, the process of creating a suffix index for original data will be described first.
Referring to FIG. 1, a diagram of a suffix index creation process is shown, according to one embodiment of the application. According to the creation process shown in fig. 1, for the original data to be added, an index library may be first designated for it. If the designated index library exists, the original data can be added, and the added original data is stored as a local file; if the designated index library does not exist, the original data is stored as a local file after the corresponding index library is created. In the storage process, if the storage fails, failure information can be prompted to a user, otherwise metadata information corresponding to the stored original data can be recorded, and the metadata information is used as a description of the original data and can contain information such as the data size, the section ID, the section offset and the like of the original data. On the other hand, for the original data that has been stored, a suffix index of the original data may be created. The original data, the suffix index of the original data and the metadata together form complete suffix index information of the original data.
In this embodiment, the original data may be stored as a local file, the original data and its suffix index information may be recorded by an index library, and the index information may be stored in units of bytes.
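The byte-level suffix index described above can be illustrated with a minimal sketch. It assumes the index is a plain suffix array (the start offsets of all suffixes, sorted lexicographically by byte value); the function name `build_suffix_index` and the naive sorting-based construction are illustrative assumptions, not the construction used by the application.

```python
def build_suffix_index(data: bytes) -> list[int]:
    """Return the start offsets of all suffixes of `data`,
    sorted lexicographically by byte value (a suffix array).

    Hypothetical helper: a naive construction that compares whole
    suffixes, suitable only for small illustrative inputs.
    """
    return sorted(range(len(data)), key=lambda i: data[i:])


suffix_index = build_suffix_index(b"banana")
print(suffix_index)  # [5, 3, 1, 0, 4, 2]
```

For "banana" the sorted suffixes are "a", "ana", "anana", "banana", "na", "nana", which gives the offsets printed above. A production index would use a linear-time or external-memory construction and would be stored together with the metadata described earlier.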
Referring to FIG. 2, a flowchart of the steps of a search method according to an embodiment of the present application is shown. The method may specifically include the following steps:
S201, when information to be searched is received, identifying the search type of the information to be searched;
In this embodiment, the information to be searched may refer to the search content input by a user in one search task. The search content may have a corresponding search pattern, such as a byte pattern or a binary pattern.
S202, if the search type is binary search, generating a plurality of sub-search information according to the information to be searched;
In this embodiment, when the information to be searched is in binary mode, the search type is binary search; when the information to be searched is in byte mode, the search type is byte search.
Information to be searched in byte mode can be searched exactly in the byte search mode directly. For information to be searched in binary mode, a plurality of pieces of sub-search information in byte mode can be generated from the original information to be searched, and each is then searched exactly in the byte search mode.
S203, searching by adopting the plurality of sub-search information respectively to obtain a search result matched with each sub-search information;
In this embodiment, each piece of sub-search information is byte-mode search information generated from the information to be searched by pattern expansion. Therefore, each piece of sub-search information can be searched in the byte search mode to obtain the corresponding search results.
S204, outputting the search result.
After all the sub-search information has been searched in the byte search mode, the results obtained can be output to the user.
It should be noted that, because the sub-search information is generated by pattern expansion, the corresponding search results may contain duplicates. Therefore, when the search results are output, all the search results may first be de-duplicated, and the de-duplicated search results are then output to the user.
In the embodiment of the present application, when the information to be searched is received, its search type is identified, so that the information can be searched or processed according to its search type. Specifically, if the search type is binary search, the information to be searched is expanded into byte patterns to generate a plurality of sub-search information, and each piece of sub-search information in byte mode is then searched to obtain a search result matched with it; if the search type is byte search, the information to be searched can be searched directly in the byte search mode. By combining pattern expansion with suffix indexing, this embodiment can solve the binary search problem for any type of data and effectively improve the efficiency and performance of binary search.
Referring to FIG. 3, a flowchart of the steps of another search method according to an embodiment of the present application is shown. The method may specifically include the following steps:
S301, when information to be searched is received, identifying the search type of the information to be searched;
In this embodiment, the information to be searched may refer to the search content input by a user in one search task. The search content may have a corresponding search pattern, such as a byte pattern or a binary pattern.
S302, if the search type is binary search, expanding the information to be searched into complete bytes to obtain a plurality of sub-search information;
In this embodiment, when the information to be searched is in binary mode, the search type is binary search; when the information to be searched is in byte mode, the search type is byte search.
Information to be searched in binary mode, which contains at least one bit, can be expanded into complete bytes by pattern expansion to obtain a plurality of sub-search information. Each piece of sub-search information may be one complete byte.
In a specific implementation, the binary symbols "0" and "1" can be filled before and/or after the bits contained in the information to be searched, so that the information to be searched is expanded into complete bytes and a plurality of sub-search information is obtained, wherein a complete byte contains a preset number of bits. That is, one piece of sub-search information contains 8 bits.
FIG. 4 is a schematic diagram of the expansion of information to be searched into complete bytes according to one embodiment of the application. Taking the information to be searched of the current search task as "01" as an example, the search type of the search task is binary search, which indicates that all matching results with binary bits of "01" in the original data need to be searched.
According to the expansion method shown in FIG. 4, the binary symbols "0" and "1" can be filled before and after "01" to expand it into complete bytes, such as "01000000" and "01000001".
The number of bits of each piece of sub-search information obtained by expansion is necessarily an integer multiple of 8, corresponding to one or more bytes.
In this embodiment, the number of sub-search information may be determined according to the number of search bits of the binary-mode information to be searched, that is, the number of occurrences of the symbols "0" and "1" in it. As shown in FIG. 4, the number of search bits of the information to be searched "01" is n = 2, and the number of sub-search information obtained by expansion is (8 - n + 1) × 2^(8 - n) = (8 - 2 + 1) × 2^(8 - 2) = 7 × 64 = 448, where the symbol "^" denotes exponentiation.
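The expansion and the count of 448 can be checked with a short sketch. It assumes a pattern of at most 8 bits expanded into a single byte; the function name `expand_to_bytes` and the string representation of bits are illustrative assumptions. For each of the 8 - n + 1 alignments of the pattern inside the byte, the remaining 8 - n bits are filled with every combination of "0" and "1".

```python
from itertools import product

BYTE_BITS = 8


def expand_to_bytes(bits: str) -> list[str]:
    """Expand a bit pattern of length n (1 <= n <= 8) into every
    full-byte pattern that contains it at some alignment.

    Hypothetical helper: returns 8-character strings of '0'/'1'.
    """
    n = len(bits)
    if not 1 <= n <= BYTE_BITS or set(bits) - {"0", "1"}:
        raise ValueError("expected 1 to 8 characters drawn from '0' and '1'")
    free = BYTE_BITS - n
    expanded = []
    for shift in range(free + 1):                    # 8 - n + 1 alignments
        for filler in product("01", repeat=free):    # 2^(8 - n) fillings
            padding = "".join(filler)
            expanded.append(padding[:shift] + bits + padding[shift:])
    return expanded


subs = expand_to_bytes("01")
print(len(subs))       # 448
print(len(set(subs)))  # 247
```

The list has 7 × 64 = 448 entries, matching the formula, but only 247 of them are distinct byte values, because the same byte can contain "01" at more than one alignment. This is one reason the later steps cache results per byte pattern and de-duplicate the final output.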
S303, judging whether key value pairs corresponding to the sub-search information exist in a mapping table one by one according to each sub-search information;
In this embodiment, to find all the positions of the information to be searched "01" in the index library, every byte pattern obtained by expanding "01" must be considered, since each expanded byte pattern may yield results that meet the search condition. Therefore, the 448 pieces of sub-search information obtained by expansion need to be processed one by one.
In a specific implementation, for each piece of sub-search information, whether a key value pair corresponding to the sub-search information exists in the mapping table can be judged one by one, so as to obtain the search result corresponding to the sub-search information. The mapping table may be a key-value store, implemented for example with a Map container in a language such as Java or C++, in which the key is the search information, the value is the search result corresponding to the search information, and a search result is a coordinate of the search information in the original data.
If there is a key value pair corresponding to the sub-search information in the mapping table, S304 may be performed; if there is no key value pair corresponding to the sub-search information in the mapping table, S305 may be performed.
It should be noted that if there is no mapping table, a new mapping table is created, where the content of the new mapping table is initially empty, and a plurality of key value pair information may be recorded, where a key value pair may be used to record sub-search information and a search result matched with the sub-search information, where a key is the sub-search information and a value is a search result matched with the sub-search information.
S304, returning the key value pair, wherein the key value pair comprises sub-search information and a search result matched with the sub-search information;
In this embodiment, the mapping table is responsible for temporarily storing all search results of a search task. Before each piece of sub-search information is searched, the mapping table can be used to determine whether the current sub-search information already has a corresponding result; if so, the result can be returned directly, which saves the time cost of repeatedly searching the same sub-search information.
S305, searching an offset position corresponding to the sub-search information in a preset suffix index, and acquiring a search result matched with the sub-search information according to the offset position;
If the mapping table does not contain a key value pair corresponding to the current sub-search information, that is, no key of the mapping table equals the byte pattern of the current sub-search information, an exact search can be performed for the sub-search information in byte mode.
When the sub-search information in byte mode is searched exactly, the offset position corresponding to the sub-search information can be looked up in a preset suffix index, and the search result matched with the sub-search information is obtained according to the offset position.
In a specific implementation, since the suffix index of the original data is sorted in lexicographic order, the offset position of a given byte pattern in the original data can be quickly located in the suffix index by means of a binary search (bisection) algorithm. If no offset position exists, the byte pattern does not occur in the original data; if an offset position exists, the matching results can be collected according to the offset position. There may be multiple matching results, indicating that the byte pattern occurs multiple times in the original data. If the byte pattern hits in the mapping table, it has already been searched and need not be searched again.
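The lookup in a lexicographically sorted suffix index can be sketched as two bisections that locate the contiguous block of suffixes starting with the byte pattern. This is a minimal illustration under the assumption that the suffix index is a suffix array; the names `build_suffix_index` and `find_offsets` are hypothetical and the construction is the naive one.

```python
def build_suffix_index(data: bytes) -> list[int]:
    """Naive suffix array: offsets of all suffixes in sorted order."""
    return sorted(range(len(data)), key=lambda i: data[i:])


def find_offsets(data: bytes, suffix_index: list[int], pattern: bytes) -> list[int]:
    """Return every offset in `data` at which `pattern` occurs,
    using bisection over the sorted suffix index."""
    m = len(pattern)

    def boundary(strict: bool) -> int:
        # Bisect for the first suffix whose m-byte prefix is
        # >= pattern (strict=False) or > pattern (strict=True).
        lo, hi = 0, len(suffix_index)
        while lo < hi:
            mid = (lo + hi) // 2
            start = suffix_index[mid]
            prefix = data[start:start + m]
            if prefix < pattern or (strict and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    first = boundary(strict=False)
    last = boundary(strict=True)
    # An empty slice means the pattern does not occur in the data.
    return sorted(suffix_index[first:last])


data = b"banana"
index = build_suffix_index(data)
print(find_offsets(data, index, b"ana"))  # [1, 3]
print(find_offsets(data, index, b"xyz"))  # []
```

Each lookup costs O(m log N) byte comparisons for a pattern of m bytes over N suffixes, which is what allows the method to avoid traversing the original data.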
In this embodiment, after the search result matched with the sub-search information is obtained according to the offset position, the sub-search information may be used as a key and the matched search result as a value to form a new key value pair. The new key value pair may then be added to the mapping table to complete the update of the mapping table and reduce the time overhead of subsequent searches.
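The check-then-update behaviour of the mapping table in S303 to S305 can be sketched with an ordinary dictionary. In this illustration `naive_search` is only a stand-in for the suffix index lookup, and both function names are hypothetical.

```python
def naive_search(data: bytes, pattern: bytes) -> list[int]:
    """Stand-in for the suffix index lookup: every offset of `pattern`."""
    offsets = []
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(pattern, pos + 1)
    return offsets


def cached_search(data: bytes, pattern: bytes,
                  mapping_table: dict[bytes, list[int]]) -> list[int]:
    """Return the cached result for `pattern` if the mapping table has
    one; otherwise search, store the new key value pair, and return it."""
    if pattern in mapping_table:           # hit: reuse the earlier result
        return mapping_table[pattern]
    result = naive_search(data, pattern)   # miss: perform the exact search
    mapping_table[pattern] = result        # key = pattern, value = offsets
    return result


table: dict[bytes, list[int]] = {}
first = cached_search(b"abcabc", b"bc", table)
print(first)   # [1, 4]
print(table)   # {b'bc': [1, 4]}
```

A second call with the same pattern returns the stored list without searching again, which is the saving the mapping table is meant to provide within one search task.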
S306, de-duplicating the search results matched with the sub-search information to obtain target search results, and outputting the target search results.
In this embodiment, after searching for all sub-search information is completed, the obtained search result may include duplicate contents. Therefore, the search results corresponding to all the sub-search information can be de-duplicated, the target search results after de-duplication are obtained, and then the target search results are output to the user.
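De-duplication in S306 can be sketched as merging the per-sub-search result lists into one sorted list of unique positions. The representation of a search result as an integer byte offset and the function name `deduplicate` are assumptions made for illustration.

```python
from itertools import chain


def deduplicate(results_per_sub_search: list[list[int]]) -> list[int]:
    """Merge the offset lists returned for each sub-search into one
    sorted list without repeated offsets."""
    return sorted(set(chain.from_iterable(results_per_sub_search)))


# The same byte pattern can be produced by several alignments of the
# bit pattern, so its offsets may be reported more than once.
merged = deduplicate([[0, 3], [3, 5], [0]])
print(merged)  # [0, 3, 5]
```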
It should be noted that, if the search type of the information to be searched is byte search, the search may be directly performed in the manner of S303-S305. Firstly judging whether a key value pair corresponding to the information to be searched in the byte mode exists in the mapping table, and if the key value pair corresponding to the information to be searched exists in the mapping table, directly returning the key value pair; if the mapping table does not have the key value pair corresponding to the information to be searched, searching an offset position corresponding to the information to be searched in a preset suffix index, and acquiring a search result matched with the information to be searched according to the offset position.
In the embodiment of the present application, a suffix index is created for the original data in units of bytes, and on top of this byte-based index the original bit-level search pattern is expanded into a plurality of byte patterns by pattern expansion. This realizes binary search and improves the efficiency and performance of data search. Because the data search of this embodiment relies on the suffix index, traversal of the data is avoided and a 100% recall ratio is achieved for any type of data.
For ease of understanding, the search method according to the embodiments of the present application will be described below with reference to a specific example.
FIG. 5 is a schematic diagram of a system architecture to which the search method according to an embodiment of the present application is applied. The system architecture shown in FIG. 5 includes a data indexing module, a data searching module and a pattern expansion module. The data indexing module is responsible for creating a suffix index for the original data and maintaining the original data and its suffix index information, with the index information stored in units of bytes. The data searching module receives and parses data search requests and executes each search task according to its search mode; the search mode is either byte mode or non-byte mode, and the non-byte mode realizes binary search through pattern expansion. The pattern expansion module is responsible for expanding a binary search pattern into a plurality of byte patterns, and the data searching module then searches the expanded byte-mode search information one by one.
Referring to FIG. 6, a schematic diagram of the search process of a search method according to an embodiment of the present application, based on the architecture shown in FIG. 5, is shown. The process may include the following steps:
S601, a user inputs the search information "01" and designates the search type as binary search;
S602, judging whether the search type is binary search. If the search type is binary search, go to S603; if the search type is byte search, go to S605;
S603, performing pattern expansion on the search information. For example, 0 and 1 are filled before and after the search information to expand it into a plurality of sub-search information in units of bytes; the search information "01" can be expanded into sub-search information such as "01000000" and "01000001";
S604, from the search information "01", the number of search bits is n = 2 (the number of search bits equals the number of occurrences of 0 and 1), so the number of sub-search information obtained by expansion is (8 - n + 1) × 2^(8 - n) = (8 - 2 + 1) × 2^(8 - 2) = 448;
S605, processing the 448 pieces of sub-search information one by one. To find all the positions of the search information "01" in the index library, every byte pattern obtained by expanding "01" must be considered, since the sub-search information of each expanded byte pattern may yield results that meet the search condition;
S606, before searching the sub-search information (byte mode), judging whether a mapping table exists. If so, go to S607; otherwise, go to S608;
S607, judging whether the current sub-search information hits in the mapping table. If so, go to S611; otherwise, go to S609;
S608, creating a mapping table. The mapping table is a key-value store, implemented for example with a Map container in Java or C++, in which the keys are search information and the values are the search results corresponding to the search information. The mapping table is responsible for temporarily storing all search results of a search task. Before each piece of sub-search information is searched, the mapping table can be used to judge whether the current sub-search information already has a corresponding result; if so, the result can be returned directly, which saves the time cost of repeatedly searching the same sub-search information;
S609, if the sub-search information (byte mode) does not hit in the mapping table (that is, no key of the mapping table equals the sub-search information), performing an exact search for the sub-search information in byte mode. Because the suffix index of the original data is sorted in lexicographic order, the offset position of a given byte pattern in the original data can be quickly located in the suffix index by means of a binary search (bisection) algorithm. If no offset position exists, the byte pattern does not occur in the original data; if an offset position exists, the matching results are collected according to the offset position. There may be multiple matching results, indicating that the sub-search information appears multiple times in the original data. If the sub-search information hits in the mapping table, it has already been searched and need not be searched again, and the process goes to S611;
S610, updating the mapping table by adding the sub-search information of the current search and its one or more matching results, wherein the key is the sub-search information and the value is the set of all matching results corresponding to the sub-search information, stored in a List container;
S611, obtaining all matching results of the sub-search information in the original data, namely all positions of the sub-search information in the original data;
S612, judging whether all sub-search information has been searched. If so, go to S613; otherwise, go to S605 until all sub-search information has been searched;
S613, if all the sub-search information has been searched, de-duplicating all the search results and adding them to the search result set to obtain the binary search result of the original search information.
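The flow S601 to S613 can be sketched end to end for a bit pattern of at most 8 bits. This is a simplified illustration under stated assumptions: the suffix index is a naively built suffix array, the mapping table is a dictionary keyed by one-byte patterns, a search result is a byte offset, and all function names are hypothetical.

```python
from itertools import product


def expand_to_bytes(bits: str) -> list[bytes]:
    """S603/S604: expand an n-bit pattern into (8-n+1) * 2^(8-n) one-byte patterns."""
    free = 8 - len(bits)
    expanded = []
    for shift in range(free + 1):
        for filler in product("01", repeat=free):
            padding = "".join(filler)
            byte_bits = padding[:shift] + bits + padding[shift:]
            expanded.append(bytes([int(byte_bits, 2)]))
    return expanded


def find_offsets(data: bytes, suffix_index: list[int], pattern: bytes) -> list[int]:
    """S609: bisection over the sorted suffix index for one byte pattern."""
    m = len(pattern)

    def boundary(strict: bool) -> int:
        lo, hi = 0, len(suffix_index)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = data[suffix_index[mid]:suffix_index[mid] + m]
            if prefix < pattern or (strict and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    return suffix_index[boundary(False):boundary(True)]


def binary_search_bits(data: bytes, bits: str) -> list[int]:
    """Return the offsets of all bytes of `data` that contain `bits`."""
    suffix_index = sorted(range(len(data)), key=lambda i: data[i:])
    mapping_table: dict[bytes, list[int]] = {}    # S608: initially empty
    results: set[int] = set()
    for sub in expand_to_bytes(bits):             # S605: one by one
        if sub not in mapping_table:              # S607: miss, so search (S609)
            mapping_table[sub] = find_offsets(data, suffix_index, sub)  # S610
        results.update(mapping_table[sub])        # S611, with de-duplication
    return sorted(results)                        # S613


data = bytes([0b11110000, 0b01000000, 0b11111111, 0b00000001])
print(binary_search_bits(data, "01"))  # [1, 3]
```

Only the bytes at offsets 1 and 3 contain "01" within the byte. Note that this single-byte sketch does not report an occurrence that straddles a byte boundary (for example the last bit of offset 1 followed by the first bit of offset 2); covering such cases would require expanding the pattern across more than one byte.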
It should be noted that, the sequence number of each step in the above embodiment does not mean the sequence of execution sequence, and the execution sequence of each process should be determined by its function and internal logic, and should not limit the implementation process of the embodiment of the present application in any way.
Referring to FIG. 7, a schematic diagram of a search apparatus according to an embodiment of the present application is shown. The apparatus may specifically include the following modules:
a search type identifying module 701, configured to identify a search type of information to be searched when the information to be searched is received;
a sub-search information generating module 702, configured to generate a plurality of sub-search information according to the information to be searched if the search type is binary search;
a sub-search information search module 703, configured to search by using the plurality of sub-search information, respectively, to obtain a search result matched with each sub-search information;
and the search result output module 704 is used for outputting the search result.
In the embodiment of the present application, the sub-search information generation module 702 may specifically include the following sub-modules:
an information-to-be-searched expansion sub-module, configured to expand the information to be searched into complete bytes to obtain a plurality of sub-search information in units of bytes.
In the embodiment of the present application, the information to be searched includes at least one bit, and the information-to-be-searched expansion sub-module may specifically include the following unit:
an information-to-be-searched expanding unit, configured to fill binary notation symbols before and/or after the bits contained in the information to be searched, thereby expanding the information to be searched into complete bytes to obtain a plurality of sub-search information, wherein a complete byte contains a preset number of bits.
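The expansion performed by this unit can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes that a complete byte has 8 bits, that the information to be searched is a string of at most 8 binary digits, and that every possible combination of fill bits is enumerated. The function name `expand_to_bytes` is hypothetical.

```python
from itertools import product

BYTE_BITS = 8  # assumed value of the "preset number of bits" in a complete byte


def expand_to_bytes(bits: str) -> list[int]:
    """Return every byte value that contains `bits` at some bit offset.

    Illustrative only: handles patterns of 1 to 8 bits that fit in one byte.
    """
    if not bits or len(bits) > BYTE_BITS or set(bits) - {"0", "1"}:
        raise ValueError("expected 1 to 8 binary digits")
    pad = BYTE_BITS - len(bits)  # number of fill bits needed
    candidates = set()
    for before in range(pad + 1):  # how many fill bits go in front
        for fill in product("01", repeat=pad):  # every value of the fill bits
            prefix = "".join(fill[:before])
            suffix = "".join(fill[before:])
            candidates.add(int(prefix + bits + suffix, 2))
    return sorted(candidates)
```

Under these assumptions, a 7-bit query of all ones expands to three sub-searches (the byte values 127, 254 and 255), while a full 8-bit query expands only to itself.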
In the embodiment of the present application, the sub-search information search module 703 may specifically include the following sub-modules:
the mapping table judging submodule is used for judging whether key value pairs corresponding to the sub-search information exist in the mapping table one by one according to each sub-search information;
the mapping table returning sub-module is used for returning a key value pair corresponding to the sub-search information if the key value pair exists in the mapping table, wherein the key value pair comprises the sub-search information and a search result matched with the sub-search information;
a sub-search information searching sub-module, configured to search an offset position corresponding to the sub-search information in a preset suffix index if a key value pair corresponding to the sub-search information does not exist in the mapping table, and acquire a search result matched with the sub-search information according to the offset position;
a new mapping table creation sub-module, configured to create a new mapping table capable of storing a plurality of key-value pairs if the mapping table does not exist, wherein the new mapping table is initially empty, and the key-value pairs in the new mapping table are used for recording the sub-search information and the search results matched with the sub-search information, the keys being the sub-search information and the values being the search results matched with the sub-search information.
In the embodiment of the present application, the sub-search information search module 703 may further include the following sub-modules:
a mapping table updating sub-module, configured to form a new key-value pair with the sub-search information as the key and the search result matched with the sub-search information as the value, and to add the new key-value pair to the mapping table.
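The interplay of the mapping table and the suffix index described by these sub-modules can be sketched as follows. This is a simplified illustration under several assumptions: the suffix index is a plain suffix array built naively, the search result is taken to be the list of matching offsets, and the names `build_suffix_index`, `find_offsets` and `cached_search` are hypothetical.

```python
def build_suffix_index(data: bytes) -> list[int]:
    """Naive suffix array: start offsets sorted by the suffix they begin."""
    return sorted(range(len(data)), key=lambda i: data[i:])


def find_offsets(data: bytes, suffix_index: list[int], pattern: bytes) -> list[int]:
    """Binary-search the suffix array for all suffixes that start with `pattern`."""
    m = len(pattern)
    lo, hi = 0, len(suffix_index)
    while lo < hi:  # first suffix whose m-byte prefix is >= pattern
        mid = (lo + hi) // 2
        if data[suffix_index[mid]:suffix_index[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(suffix_index)
    while lo < hi:  # first suffix whose m-byte prefix is > pattern
        mid = (lo + hi) // 2
        if data[suffix_index[mid]:suffix_index[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_index[start:lo])


def cached_search(pattern, data, suffix_index, mapping_table=None):
    """Return (mapping_table, result), consulting the mapping table first."""
    if mapping_table is None:  # no mapping table yet: create an empty one
        mapping_table = {}
    if pattern in mapping_table:  # key-value pair exists: return it directly
        return mapping_table, mapping_table[pattern]
    result = find_offsets(data, suffix_index, pattern)  # fall back to the index
    mapping_table[pattern] = result  # record the new key-value pair
    return mapping_table, result
```

A repeated query is then answered from the mapping table without touching the suffix index. A production system would build the index with a linear-time or external-memory algorithm rather than the naive sort used here.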
In the embodiment of the present application, the search result output module 704 may specifically include the following sub-modules:
the search result de-duplication sub-module is used for de-duplicating the search results matched with the sub-search information to obtain target search results;
and the target search result output sub-module is used for outputting the target search result.
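The de-duplication step can be sketched as below, assuming that each sub-search yields a list of hashable results (for example, offsets) and that first-seen order should be preserved. The name `merge_results` is hypothetical.

```python
def merge_results(per_sub_search: dict[bytes, list[int]]) -> list[int]:
    """Merge the results of all sub-searches, keeping each result once."""
    seen = set()
    merged = []
    for results in per_sub_search.values():
        for item in results:
            if item not in seen:  # skip results already reported
                seen.add(item)
                merged.append(item)
    return merged
```

De-duplication is needed because different sub-search information derived from the same query can match the same location in the original data.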
In an embodiment of the present application, the apparatus may further include the following modules:
the byte search mapping table judging module is used for judging whether a key value pair corresponding to the information to be searched exists in the mapping table if the search type is byte search;
the byte searching module is used for returning the key value pair if the key value pair corresponding to the information to be searched exists in the mapping table, wherein the key value pair comprises the information to be searched and a search result matched with the information to be searched; if the mapping table does not have the key value pair corresponding to the information to be searched, searching an offset position corresponding to the information to be searched in a preset suffix index, and acquiring a search result matched with the information to be searched according to the offset position.
Because the apparatus embodiments are substantially similar to the method embodiments, they are described only briefly; for the relevant details, refer to the description of the method embodiments.
Referring to fig. 8, a schematic diagram of a terminal device according to an embodiment of the present application is shown. As shown in fig. 8, the terminal device 800 of the present embodiment includes: a processor 810, a memory 820 and a computer program 821 stored in said memory 820 and executable on said processor 810. The processor 810, when executing the computer program 821, implements the steps of the various embodiments of the search method described above, such as steps S201 through S204 shown in fig. 2. Alternatively, the processor 810 may perform the functions of the modules/units of the apparatus embodiments described above, such as the functions of the modules 701 to 704 shown in fig. 7, when executing the computer program 821.
Illustratively, the computer program 821 may be partitioned into one or more modules/units that are stored in the memory 820 and executed by the processor 810 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which may be used to describe the execution of the computer program 821 in the terminal device 800. For example, the computer program 821 may be divided into a search type identification module, a sub-search information generation module, a sub-search information search module, and a search result output module, each of which specifically functions as follows:
the search type identification module is used for identifying the search type of the information to be searched when the information to be searched is received;
the sub-search information generation module is used for generating a plurality of sub-search information according to the information to be searched if the search type is binary search;
the sub-search information searching module is used for searching by adopting the plurality of sub-search information respectively to obtain a search result matched with each sub-search information;
and the search result output module is used for outputting the search result.
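How the four modules might cooperate can be sketched end to end as follows. The sketch makes assumptions that are not stated in the original: a query given as `bytes` is treated as a byte search, a query given as a string of up to 8 binary digits is treated as a binary search, a plain scan stands in for the suffix index, and only byte-aligned matches of a single expanded byte are found. All function names are hypothetical.

```python
from itertools import product


def identify_search_type(query) -> str:
    """Search type identification (hypothetical convention, see lead-in)."""
    if isinstance(query, bytes):
        return "byte"
    if isinstance(query, str) and 0 < len(query) <= 8 and set(query) <= {"0", "1"}:
        return "binary"
    raise ValueError("unsupported query")


def generate_sub_searches(bits: str) -> list[bytes]:
    """Sub-search generation: pad the bits to one full byte in every way."""
    pad = 8 - len(bits)
    values = {
        int("".join(fill[:k]) + bits + "".join(fill[k:]), 2)
        for k in range(pad + 1)
        for fill in product("01", repeat=pad)
    }
    return [bytes([v]) for v in sorted(values)]


def byte_search(data: bytes, pattern: bytes) -> list[int]:
    """Byte search: a linear scan used here in place of a suffix index."""
    hits, start = [], data.find(pattern)
    while start != -1:
        hits.append(start)
        start = data.find(pattern, start + 1)
    return hits


def search(data: bytes, query) -> list[int]:
    """Dispatch on the search type and output de-duplicated offsets."""
    if identify_search_type(query) == "byte":
        return byte_search(data, query)
    merged = set()
    for sub in generate_sub_searches(query):
        merged.update(byte_search(data, sub))
    return sorted(merged)
```

For example, with `data = bytes([0x00, 0xFF, 0x7F, 0x10])`, the binary query `"1111111"` matches offsets 1 and 2, and the byte query `b"\x7f\x10"` matches offset 2.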
The terminal device 800 may be a computing device such as a desktop computer, a notebook computer, a palm computer, or a cloud server. The terminal device 800 may include, but is not limited to, a processor 810 and a memory 820. It will be appreciated by those skilled in the art that fig. 8 is merely an example of the terminal device 800 and does not limit it; the terminal device 800 may include more or fewer components than shown, may combine certain components, or may use different components. For example, the terminal device 800 may also include input and output devices, network access devices, buses, etc.
The processor 810 may be a central processing unit (Central Processing Unit, CPU), or may be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 820 may be an internal storage unit of the terminal device 800, for example, a hard disk or a memory of the terminal device 800. The memory 820 may also be an external storage device of the terminal device 800, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash memory Card (Flash Card) provided on the terminal device 800. Further, the memory 820 may include both internal storage units and external storage devices of the terminal device 800. The memory 820 is used to store the computer program 821 and other programs and data required by the terminal device 800. The memory 820 may also be used to temporarily store data that has been output or is to be output.
The above embodiments are only intended to illustrate the technical solution of the present application, not to limit it. Although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (8)

1. A search method, comprising:
when receiving information to be searched, identifying the search type of the information to be searched;
if the search type is binary search, expanding the information to be searched into complete bytes to obtain a plurality of sub-search information;
searching the plurality of sub-search information by adopting a byte searching mode respectively to obtain a search result matched with each sub-search information;
outputting the search result; the search result is original data corresponding to the information to be searched in an index library, the original data has complete suffix index information, the suffix index information comprises the original data, suffix indexes of the original data and metadata of the original data, the metadata is used for describing the original data, and the suffix index information is stored in a byte unit;
the searching the plurality of sub-search information by respectively adopting a byte search mode to obtain a search result matched with each sub-search information comprises the following steps:
judging whether key value pairs corresponding to the sub-search information exist in a mapping table one by one according to each sub-search information;
returning the key value pair if the key value pair corresponding to the sub-search information exists in the mapping table, wherein the key value pair comprises the sub-search information and a search result matched with the sub-search information;
if the mapping table does not have the key value pair corresponding to the sub-search information, searching an offset position corresponding to the sub-search information in a preset suffix index, and acquiring a search result matched with the sub-search information according to the offset position;
if the mapping table does not exist, a new mapping table is created, the content of the new mapping table is empty initially, a plurality of key value pairs can be stored, the key value pairs are used for recording the sub-search information and the search results matched with the sub-search information, wherein keys are the sub-search information, and the values are the search results matched with the sub-search information.
2. The method of claim 1, wherein the information to be searched comprises at least one bit, wherein the expanding the information to be searched into a complete byte, obtaining a plurality of sub-search information, comprises:
and filling binary notation symbols before bits contained in the information to be searched and/or after the bits contained in the information to be searched, expanding the information to be searched into complete bytes to obtain a plurality of sub-search information, wherein the complete bytes contain a preset number of bits.
3. The method of claim 1, further comprising, after obtaining search results matching the sub-search information based on the offset location:
and adding the new key value pair to the mapping table by taking the sub-search information as a key and taking a search result matched with the sub-search information as a value as a new key value pair.
4. The method of claim 1, wherein the outputting the search results comprises:
de-duplicating the search results matched with the sub-search information to obtain target search results;
and outputting the target search result.
5. The method of any one of claims 1-4, further comprising:
if the search type is byte search, judging whether a key value pair corresponding to the information to be searched exists in a mapping table or not;
if the mapping table has the key value pair corresponding to the information to be searched, returning the key value pair, wherein the key value pair comprises the information to be searched and a search result matched with the information to be searched;
if the mapping table does not have the key value pair corresponding to the information to be searched, searching an offset position corresponding to the information to be searched in a preset suffix index, and acquiring a search result matched with the information to be searched according to the offset position.
6. A search apparatus, comprising:
the search type identification module is used for identifying the search type of the information to be searched when the information to be searched is received;
the sub-search information generation module is used for expanding the information to be searched into complete bytes if the search type is binary search, so as to obtain a plurality of sub-search information;
the sub-search information searching module is used for searching the plurality of sub-search information by adopting a byte searching mode respectively to obtain a search result matched with each sub-search information;
the search result output module is used for outputting the search result; the search result is original data corresponding to the information to be searched in an index library, the original data has complete suffix index information, the suffix index information comprises the original data, suffix indexes of the original data and metadata of the original data, the metadata is used for describing the original data, and the suffix index information is stored in a byte unit;
the sub-search information search module comprises:
the mapping table judging submodule is used for judging whether key value pairs corresponding to the sub-search information exist in the mapping table one by one according to each sub-search information;
the mapping table returning sub-module is used for returning a key value pair corresponding to the sub-search information if the key value pair exists in the mapping table, wherein the key value pair comprises the sub-search information and a search result matched with the sub-search information;
a sub-search information searching sub-module, configured to search an offset position corresponding to the sub-search information in a preset suffix index if a key value pair corresponding to the sub-search information does not exist in the mapping table, and acquire a search result matched with the sub-search information according to the offset position;
and the new mapping table creation sub-module is used for creating a new mapping table comprising a plurality of key value pairs if the mapping table does not exist, wherein the content of the mapping table is empty at the beginning, and the key value pairs in the new mapping table are used for recording the sub-search information and the search results matched with the sub-search information, wherein the keys are the sub-search information and the values are the search results matched with the sub-search information.
7. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the search method according to any of claims 1 to 5 when executing the computer program.
8. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the search method according to any one of claims 1 to 5.
CN202010322885.2A 2020-04-22 2020-04-22 Searching method, searching device, terminal equipment and storage medium Active CN113535710B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010322885.2A CN113535710B (en) 2020-04-22 2020-04-22 Searching method, searching device, terminal equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113535710A CN113535710A (en) 2021-10-22
CN113535710B true CN113535710B (en) 2023-12-15

Family

ID=78124056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010322885.2A Active CN113535710B (en) 2020-04-22 2020-04-22 Searching method, searching device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113535710B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1890662A (en) * 2003-09-29 2007-01-03 千兆科技(深圳)有限公司 Content oriented index and search method and system
CN102945242A (en) * 2006-11-01 2013-02-27 起元技术有限责任公司 Managing storage method, system, and computer system
CN106354746A (en) * 2015-07-13 2017-01-25 富士通株式会社 Searching method, and searching device
CN107567621A (en) * 2015-05-06 2018-01-09 厄尔扬·韦斯特哥特科技公司 For performing the method, system and computer program product of numeric search
CN107977458A (en) * 2017-12-19 2018-05-01 深圳马可孛罗科技有限公司 A kind of Airport information filter method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9715525B2 (en) * 2013-06-28 2017-07-25 Khalifa University Of Science, Technology And Research Method and system for searching and storing data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Improved binary search anti-collision algorithm; Jia Hao; Microcomputer & Its Applications (《微型机与应用》); Vol. 36 (No. 16); 23-29 *

Also Published As

Publication number Publication date
CN113535710A (en) 2021-10-22

Similar Documents

Publication Publication Date Title
US11249999B2 (en) Memory efficient searching
CN109558525B (en) Test data set generation method, device, equipment and storage medium
CN107704202B (en) Method and device for quickly reading and writing data
CN110532347B (en) Log data processing method, device, equipment and storage medium
US10776427B2 (en) Efficient conditional state mapping in a pattern matching automaton
JP7052145B2 (en) Token matching in a large document corpus
CN111522574B (en) Differential packet generation method and related equipment
US10339096B2 (en) Efficient pattern matching
CN113535710B (en) Searching method, searching device, terminal equipment and storage medium
US10846598B2 (en) Pattern matching
CN110196952B (en) Program code search processing method, device, equipment and storage medium
CN114547086B (en) Data processing method, device, equipment and computer readable storage medium
CN112241336A (en) Method, apparatus and computer program product for backing up data
KR101828466B1 (en) Method and apparatus for providing an object-based storage interface on the storage device based on file system
CN112506651B (en) Method and equipment for data operation in large-data-volume environment
CN113609128A (en) Method and device for generating database entity class, terminal equipment and storage medium
CN116992883B (en) Entity alignment processing method and device
CN110674084A (en) Method, apparatus, and computer-readable storage medium for data protection
CN110471901B (en) Data importing method and terminal equipment
CN113642331B (en) Financial named entity identification method and system, storage medium and terminal
US11822803B2 (en) Method, electronic device and computer program product for managing data blocks
CN112035890B (en) Data integrity verification method and device
US10565197B2 (en) Search performance using smart bitmap operations
CN108228648B (en) Method and device for creating index
CN114880523A (en) Character string processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant