CN117453986A - Searching method, background server and searching system


Info

Publication number
CN117453986A
Authority
CN
China
Prior art keywords
search
character
value
information
characteristic information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311744805.2A
Other languages
Chinese (zh)
Inventor
章浩波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd
Priority to CN202311744805.2A
Publication of CN117453986A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/541 Client-server
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/542 Intercept
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/549 Remote execution

Abstract

The present disclosure relates to the field of data processing, and in particular, to a search method, a background server, and a search system. The method is applied to a background server, first search is carried out on search information through a filter in the background server, and if the content corresponding to the search information exists, second search is carried out by the search server; and if the content corresponding to the search information does not exist, returning preset information for indicating that the content does not exist to the client. In the method, most invalid requests can be intercepted through the filter in the background server, so that the pressure of the search server is effectively reduced, and more efficient search service is provided for users.

Description

Searching method, background server and searching system
Technical Field
The present disclosure relates to the field of data processing, and in particular, to a search method, a background server, and a search system.
Background
Application searching refers to a technique of searching for an application program (or application software, app, application) according to demand. Application search services typically require processing of large amounts of application data, which is typically stored in a search server. Over time, the volume of application data grows and more and more data is stored in the search server; the performance pressure on the search server rises, searches become slower and slower, and the user experience suffers.
Disclosure of Invention
The application provides a searching method, a background server and a searching system, which solve the problem of high searching pressure of a searching server in the prior art.
In order to achieve the above purpose, the present application adopts the following technical scheme:
In a first aspect, a search method applied to a background server is provided, the method including:
after receiving search information sent by a client, carrying out first search according to the search information to obtain a first result;
if the first result indicates that the content corresponding to the search information does not exist, preset information is returned to the client;
if the first result indicates that the content corresponding to the search information exists, the search information is sent to a search server, so that the search server performs second search according to the search information and returns a second result obtained by the second search;
and after receiving the second result returned by the search server, sending the second result to the client.
In the embodiment of the application, the filter works on the principle that content it reports as existing may or may not actually exist, whereas content it reports as not existing definitely does not exist. Once the background server determines that the content corresponding to the search information does not exist, the search server does not need to search; instead, the filter in the background server sends the preset information to the client. In this way, most invalid requests can be intercepted by the background server, which effectively reduces the pressure on the search server and provides a more efficient search service for users.
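The two-stage flow of the first aspect can be sketched as follows. This is a minimal illustration, not the patented implementation: `handle_search`, `NOT_FOUND`, and the toy filter and search server are hypothetical names introduced here.

```python
from typing import Callable, List, Union

NOT_FOUND = "content does not exist"  # hypothetical preset information


def handle_search(
    query: str,
    filter_may_contain: Callable[[str], bool],
    search_server: Callable[[str], List[str]],
) -> Union[str, List[str]]:
    """Run the first search in the filter; run the second search only on a possible hit."""
    if not filter_may_contain(query):
        # Definite miss: answer the client without touching the search server.
        return NOT_FOUND
    # Possible hit: forward to the search server and relay its result.
    return search_server(query)


# Toy stand-ins for the filter and the search server (assumptions for the demo).
known = {"application one", "application two"}
calls: List[str] = []


def toy_filter(q: str) -> bool:
    return q in known


def toy_search_server(q: str) -> List[str]:
    calls.append(q)  # record which queries actually reach the search server
    return [q]


hit = handle_search("application one", toy_filter, toy_search_server)
miss = handle_search("no such app", toy_filter, toy_search_server)
```

In the demo only the first query reaches `toy_search_server`; the second is answered with the preset information directly.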
In an implementation manner of the first aspect, the performing a first search according to the search information, to obtain a first result, includes:
extracting first characteristic information in the search information;
acquiring a first storage position of the first characteristic information in a storage space;
if the first characteristic information is stored in the first storage position, the first result indicates that the content corresponding to the search information exists;
and if the first characteristic information is not stored in the first storage position, the first result indicates that the content corresponding to the search information does not exist.
In this embodiment of the present application, the data size of the first feature information is generally smaller than that of the search information, so a filter searching by the first feature information is more efficient than the search server searching by the search information. Using the filter to perform the first search intercepts most invalid requests, which improves search efficiency, reduces the search pressure on the search server, and provides a more efficient search service for users.
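A minimal sketch of this lookup, assuming a flat table with one slot per storage position. `feature_of` and `position_of` are placeholders chosen for illustration (a 16-bit slice of a SHA-256 digest and a plain modulo); the text describes its own extraction and positioning steps separately.

```python
import hashlib
from typing import List, Optional


def feature_of(text: str) -> int:
    # Placeholder feature: the first 16 bits of a SHA-256 digest (an assumption).
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:2], "big")


def position_of(feature: int, capacity: int) -> int:
    # Placeholder position: feature modulo table size (an assumption).
    return feature % capacity


def first_search(text: str, table: List[Optional[int]]) -> bool:
    """Return True if the content may exist, False if it does not exist."""
    feature = feature_of(text)
    return table[position_of(feature, len(table))] == feature


# Register one application name, then look it up.
table: List[Optional[int]] = [None] * 64
f = feature_of("application one")
table[position_of(f, len(table))] = f
found = first_search("application one", table)
```

The table holds only the small integer feature, never the search string itself, which is what makes the first search cheap.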
In an implementation manner of the first aspect, the extracting first feature information in the search information includes:
carrying out hash processing according to the character position of each character in the search information in the character string corresponding to the search information to obtain a first character hash value;
and extracting the first characteristic information according to the first character hash value.
Since the arrangement sequence of the characters in the sentence is closely related to the semantics of the sentence, the hash processing in the embodiment of the application is equivalent to extracting the characteristic information of the search information through the character positions of the characters in the search information, and the semantic characteristics of the search information can be reflected to a certain extent, so that the search precision of the subsequent information search is improved.
In an implementation manner of the first aspect, the performing hash processing according to a character position of each character in the search information in a character string corresponding to the search information to obtain a first character hash value includes:
for the 1st character in the search information, calculating a hash value of the character position of the 1st character in the character string corresponding to the search information to obtain a first character hash value of the 1st character; updating an initial memory value according to the first character hash value of the 1st character; and updating an initial shift bit number according to a first preset value to obtain the shift bit number corresponding to the 1st character;
for the ith character in the search information, calculating a hash value of the character position of the ith character in the character string corresponding to the search information to obtain a first character hash value of the ith character;
performing shift processing on the first character hash value of the ith character according to the shift bit number corresponding to the (i-1)th character to obtain a shifted first character hash value of the ith character, wherein i is an integer greater than 1;
updating a first memory value according to the shifted first character hash value of the ith character to obtain a second memory value, wherein the first memory value represents the shifted first character hash values of the first i-1 characters, and the second memory value represents the shifted first character hash values of the first i characters;
and updating the shift bit number corresponding to the (i-1)th character according to the first preset value to obtain the shift bit number corresponding to the ith character, wherein the first preset value is determined according to the data interval to which the first character hash value of the ith character before shifting belongs.
In this embodiment of the present application, as the positions of the characters increase, the first preset value also increases, so that the conflict between the hash values of the characters in the memory value can be effectively reduced.
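The per-character loop above can be sketched as follows. The text does not specify the hash function, the data intervals, or how the memory value is updated, so `char_hash`, `shift_step`, the 8/16-bit steps, and the bitwise OR are all assumptions made for illustration.

```python
from typing import Tuple


def char_hash(ch: str, pos: int) -> int:
    # Hypothetical per-character hash mixing the code point with its position.
    return (ord(ch) * 31 + pos) & 0xFFFF


def shift_step(h: int) -> int:
    # Hypothetical "first preset value", chosen by the interval h falls in.
    return 8 if h < 0x100 else 16


def accumulate(text: str) -> Tuple[int, int]:
    """Fold every character hash into one memory value; return (memory, shift)."""
    memory, shift = 0, 0  # initial memory value and initial shift bit number
    for pos, ch in enumerate(text):
        h = char_hash(ch, pos)
        memory |= h << shift    # shift by the amount reached after the previous character
        shift += shift_step(h)  # then advance the shift for the next character
    return memory, shift


ab = accumulate("ab")
ba = accumulate("ba")
```

Because each step is at least as wide as the hash it follows, the shifted hashes occupy disjoint bit ranges in this sketch, and `"ab"` and `"ba"` yield different memory values.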
In one implementation manner of the first aspect, the method further includes:
after the shift bit number corresponding to the ith character is obtained, if the shift bit number corresponding to the ith character is larger than a shift threshold value, acquiring data from the second memory value according to the shift threshold value to obtain a third memory value;
and reducing the shift bit number corresponding to the ith character according to the shift threshold value to obtain the updated shift bit number corresponding to the ith character.
In the embodiment of the application, reducing the shift bit number when it exceeds the shift threshold avoids shifted values with an excessive number of data bits. In addition, taking only part of the data from the current memory value keeps the length of the memory value under control, so that the memory value does not exceed the memory space.
In an implementation manner of the first aspect, the obtaining, according to the shift threshold, data from the second memory value, to obtain a third memory value includes:
and obtaining the numerical value of M digits in the second memory value according to the order of the digits from high to low to obtain the third memory value, wherein M is determined according to the shift threshold.
In the embodiment of the present application, the high-order digits of the memory value correspond to the hash values of the character positions of characters later in the character string, and the low-order digits correspond to those of characters earlier in the character string. Keeping the high-order digits and discarding the low-order digits therefore preserves, as far as possible, the hash values of the later characters. In some application scenarios, the characters toward the end of the character string are mostly the search keywords, so a memory value obtained in this way retains more of the characteristic information of the keywords in the character string.
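A minimal sketch of the threshold step, assuming the digits are binary, that M equals the shift threshold, and that the shift bit number is reset to the threshold. None of these choices is fixed by the text, and `clamp` is a hypothetical name.

```python
from typing import Tuple


def clamp(memory: int, shift: int, threshold: int = 64) -> Tuple[int, int]:
    """If shift exceeds the threshold, keep the high-order bits and reduce the shift."""
    if shift > threshold:
        excess = shift - threshold
        memory >>= excess  # drops the low-order bits, keeps the high-order ones
        shift = threshold  # reduced shift bit number (assumed: reset to the threshold)
    return memory, shift


# A 16-bit memory value with an 8-bit threshold keeps only its upper byte.
trimmed = clamp(0xABCD, 16, threshold=8)
```

Here `0xABCD` becomes `0xAB`: the low byte, which came from the earlier characters, is the part discarded.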
In an implementation manner of the first aspect, the extracting the first feature information according to the first character hash value includes:
performing amplification processing on a fourth memory value to obtain the processed fourth memory value, wherein the fourth memory value is used for representing a first character hash value of N shifted characters, and N is the total number of the characters in the search information;
and calculating the first characteristic information according to the fourth memory value.
In the embodiment of the application, the number of bits of the processed memory value is increased through the amplification processing of the memory value, so that the data conflict in the subsequent calculation process can be reduced.
In an implementation manner of the first aspect, the calculating the first feature information according to the fourth memory value includes:
performing exclusive-or operation according to the second preset value and the fourth memory value to obtain an exclusive-or value;
and calculating the first characteristic information according to the exclusive OR value.
The exclusive-or operation compares two values bit by bit: where the corresponding bits are the same, the result bit is 0; where they differ, the result bit is 1. The exclusive-or operation thus detects whether the fourth memory value is the same as the second preset value; if it is, the exclusive-or value is 0. When two groups of application data are the same and their corresponding fourth memory values are the same as the second preset value, the exclusive-or value shows that the two groups of application data are the same; accordingly, their hash values are the same and their characteristic information is the same. When the exclusive-or value is 0, the filter does not process the search information corresponding to the current first characteristic information, that is, it does not filter, and the search information is passed to the search server for processing. In this way, the probability of missed detection and false detection can be reduced, and the search precision is improved.
Since the first character hash value of each character in the search information can represent the characteristic information of the character position of the character in the character string, and the fourth memory value is equivalent to the characteristic information of the character position of all the characters in the search information, in the embodiment of the application, the first characteristic information is calculated according to the exclusive or value, which is equivalent to the characteristic information extracted according to the character positions of all the characters in the search information, and thus the extracted characteristic information can represent the position relation among the characters in the search information, and thus the semantics corresponding to the search information are represented.
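The amplification and exclusive-or steps can be illustrated with a short sketch. Treating amplification as a left shift, and the particular values of `PRESET` and `AMPLIFY_BITS`, are assumptions made here; the text does not define them.

```python
from typing import Optional

PRESET = 0x5BD1E990  # hypothetical "second preset value"
AMPLIFY_BITS = 4     # hypothetical amplification: widen the value by 4 bits


def feature_from_memory(memory: int) -> Optional[int]:
    """Amplify the memory value, XOR it with the preset, and return the result."""
    amplified = memory << AMPLIFY_BITS  # amplification increases the number of bits
    xored = amplified ^ PRESET          # bitwise exclusive-or with the preset value
    if xored == 0:
        # Per the text: an XOR value of 0 means the filter does not filter,
        # and the query is passed on to the search server.
        return None
    return xored


same = feature_from_memory(0x5BD1E99)  # amplified value equals PRESET
other = feature_from_memory(1)
```

Returning `None` for a zero exclusive-or value is one way to signal the pass-through case; a real implementation might use a separate flag instead.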
In one implementation manner of the first aspect, the method further includes:
after the shift bit number corresponding to the ith character is obtained, if the shift bit number corresponding to the ith character is larger than a shift threshold value, performing amplification processing on the second preset numerical value to obtain the processed second preset numerical value;
performing an exclusive-or operation according to the second preset value and the fourth memory value to obtain an exclusive-or value, including:
and if the shift bit number corresponding to the ith character is larger than a shift threshold value, performing exclusive-or operation according to the processed second preset numerical value and the fourth memory value to obtain an exclusive-or value.
In this embodiment of the present application, when the number of shift bits is greater than the shift threshold, it indicates that the number of numerical bits in the current memory value is greater, and in this case, the second preset numerical value is subjected to amplification processing, which is equivalent to increasing the number of numerical bits of the second preset numerical value, and increasing the complexity of the second preset numerical value. In this way, collisions of data in subsequent computation can be reduced.
In an implementation manner of the first aspect, the acquiring a first storage location of the first feature information in a storage space includes:
calculating a first hash value of the first characteristic information;
and calculating the first storage position according to the first hash value and the storage amount of the storage space.
The hash algorithm is a secure hash algorithm, and in the embodiment of the application, the storage position is determined through the hash algorithm, so that the collision probability of the characteristic information can be effectively reduced, and the accuracy of subsequent information searching is improved.
In an implementation manner of the first aspect, the calculating the first storage location according to the first hash value and the storage amount of the storage space includes:
and performing remainder operation according to the first hash value and the storage amount of the storage space to obtain the first storage position.
The method of the remainder operation can quickly and simply determine the storage position, and can ensure that the calculated storage position falls into the range of the storage space.
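The position calculation can be sketched as a hash followed by a remainder. SHA-256 is used here as one example of a secure hash algorithm; the text does not name a specific one, and `storage_position` is a hypothetical helper.

```python
import hashlib


def storage_position(feature: int, capacity: int) -> int:
    """Hash the feature with SHA-256 and reduce it modulo the table capacity."""
    # Encode the non-negative integer feature as big-endian bytes (at least one byte).
    raw = feature.to_bytes((feature.bit_length() + 7) // 8 or 1, "big")
    first_hash = int.from_bytes(hashlib.sha256(raw).digest(), "big")
    # The remainder always falls inside the range 0 .. capacity - 1.
    return first_hash % capacity


pos = storage_position(12345, capacity=1000)
```

Whatever the hash value is, the remainder operation keeps the position within the storage space, which is the property the text relies on.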
In one implementation manner of the first aspect, the method further includes:
receiving application data uploaded by a development terminal;
extracting second characteristic information in the application data;
calculating a second storage position of the second characteristic information in the storage space;
and storing the second characteristic information to the second storage position.
Because the second characteristic information can reflect the characteristics of the application data and the data volume is smaller, compared with storing the application data, storing the second characteristic information is beneficial to saving storage space.
In an implementation manner of the first aspect, the storing the second feature information in the second storage location includes:
and if the second storage position is not empty, replacing third characteristic information with the second characteristic information, wherein the third characteristic information is the data stored in the second storage position.
By the method, the second characteristic information of the latest application data can be ensured to be stored.
In one implementation of the first aspect, the second storage location includes a main location and a plurality of sub-locations; the method further comprises the steps of:
if the main position in the second storage position is not empty, judging whether the sub position in the second storage position is empty or not;
if each sub-position in the second storage position is not empty, judging that the second storage position is not empty;
and if any one of the sub-positions in the second storage position is empty, judging that the second storage position is empty.
With this storage scheme, each storage position in the storage space is extended with a plurality of sub-positions, so that more characteristic information can be accommodated; when different application data have the same characteristic information, the characteristic information of each can still be stored. Accordingly, in a subsequent search, as long as the characteristic information corresponding to the search information exists in the storage space, the content corresponding to the search information is indicated to exist, which ensures the filtering precision of the filter. In addition, because the characteristic information of several different application data is stored at the same storage position, when one piece of application data is later deleted, only its group of characteristic information at that storage position needs to be deleted; the other characteristic information stored there can still serve queries for other search information. This preserves query precision while allowing data in the storage space to be deleted flexibly, which facilitates subsequent maintenance of the storage space.
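A storage position with a main position and several sub-positions can be sketched as a small bucket. `FeatureTable`, the bucket width of four, and the choice to overwrite the main position when the bucket is full are assumptions for illustration only.

```python
from typing import List, Optional

SLOTS_PER_POSITION = 4  # hypothetical: 1 main position + 3 sub-positions


class FeatureTable:
    def __init__(self, capacity: int) -> None:
        self.slots: List[List[Optional[int]]] = [
            [None] * SLOTS_PER_POSITION for _ in range(capacity)
        ]
        self.replacements = 0  # number of replacements performed so far

    def is_full(self, pos: int) -> bool:
        # "Not empty" in the text's sense: main and every sub-position occupied.
        return all(s is not None for s in self.slots[pos])

    def store(self, pos: int, feature: int) -> None:
        bucket = self.slots[pos]
        if self.is_full(pos):
            bucket[0] = feature  # replace a stored feature (which one is an assumption)
            self.replacements += 1
        else:
            bucket[bucket.index(None)] = feature  # first free main/sub-position

    def contains(self, pos: int, feature: int) -> bool:
        return feature in self.slots[pos]

    def delete(self, pos: int, feature: int) -> None:
        bucket = self.slots[pos]
        if feature in bucket:
            bucket[bucket.index(feature)] = None  # other entries stay in place


t = FeatureTable(capacity=2)
for value in (1, 2, 3, 4, 5):  # the fifth store finds the bucket full
    t.store(0, value)
t.delete(0, 3)
```

After the loop the bucket has replaced feature `1` with `5`, and deleting `3` leaves `2`, `4`, and `5` available for later lookups.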
In one implementation manner of the first aspect, the method further includes:
adding one to the number of times of replacement after replacing the third feature information with the second feature information;
if the current replacing times reach the preset times, expanding the storage space to obtain the expanded storage space;
and calculating a third storage position of the feature information stored in the storage space before capacity expansion in the storage space after capacity expansion.
In the embodiment of the application, more characteristic information can be stored by performing capacity expansion processing on the storage space, and the same characteristic information stored before can be reserved as far as possible, so that the subsequent searching precision is improved, and the subsequent maintenance of the storage space is facilitated.
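Capacity expansion can be sketched as building a larger table and recomputing the position of every stored feature. The doubling factor, `REPLACEMENT_LIMIT`, and the modulo `position_of` used in the demo are assumptions, not values from the text.

```python
from typing import Callable, List, Optional, Tuple

REPLACEMENT_LIMIT = 3  # hypothetical preset number of replacements


def expand(
    old: List[Optional[int]],
    position_of: Callable[[int, int], int],
    factor: int = 2,
) -> List[Optional[int]]:
    """Build a larger table and re-place every stored feature in it."""
    new: List[Optional[int]] = [None] * (len(old) * factor)
    for feature in old:
        if feature is not None:
            new[position_of(feature, len(new))] = feature  # third storage position
    return new


def maybe_expand(
    table: List[Optional[int]],
    replacements: int,
    position_of: Callable[[int, int], int],
) -> Tuple[List[Optional[int]], int]:
    """Expand once the replacement count reaches the limit, then reset the count."""
    if replacements >= REPLACEMENT_LIMIT:
        return expand(table, position_of), 0
    return table, replacements


def modulo(feature: int, capacity: int) -> int:
    return feature % capacity


bigger, count = maybe_expand([8, 5, None, 3], 3, modulo)
```

In the demo the features 8, 5, and 3 move to positions 0, 5, and 3 of the eight-slot table, so entries that were adjacent before expansion spread out afterwards.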
In a second aspect, a background server is provided, including:
a filter, configured to perform a first search according to search information after receiving the search information sent by a client, to obtain a first result, and, if the first result indicates that the content corresponding to the search information does not exist, to return preset information to the client, wherein the preset information indicates that the content corresponding to the search information does not exist;
a service module, configured to send the search information to a search server when the first result indicates that the content corresponding to the search information exists, so that the search server performs a second search according to the search information and returns a second result obtained by the second search, and to send the second result to the client after receiving the second result returned by the search server.
In a third aspect, a search system is provided, comprising:
a search server and a background server as described in the second aspect.
In a fourth aspect, there is provided a computer readable storage medium storing a computer program which, when executed by one or more processors, implements the method of any of the first aspects.
In a fifth aspect, a computer program product is provided which, when run on a terminal device, enables the terminal device to implement the method according to any of the first aspects.
Drawings
FIG. 1 is a schematic view of an application search scenario provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of an ES server infrastructure provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of data interaction of an ES-based search architecture provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a search system provided by an embodiment of the present application;
FIG. 5 is an interactive flow diagram of a search method according to an embodiment of the present application;
FIG. 6 is a flowchart of a method for storing data by a filter according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a storage space provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of a storage space before and after capacity expansion according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a process of a first search provided by an embodiment of the present application;
FIG. 10 is an interactive flow diagram of a search process provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that in embodiments of the present application, "one or more" means one, two, or more than two; "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, and B alone, wherein A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it.
In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," "fourth," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Application searching refers to a technique of searching for an application program (or application software, app, application) according to demand. In some application scenarios, the application search is mobile terminal oriented. Taking a mobile phone as an example, an application with an application search function is installed in the mobile phone, and a user searches and downloads required application software through the application with the application search function.
Referring to FIG. 1, a schematic view of an application search scenario provided in an embodiment of the present application is shown. (a) in FIG. 1 shows an interface 10 of an "application center" installed on a mobile phone (for convenience of description, an application having an application search function is referred to as an "application center" in the embodiment of the present application). The interface 10 includes a search box 101. When the user has not input search information, some recommended applications may be displayed in the interface of the "application center"; as shown in the interface 10, the applications recommended first are "application one" and "application two", and the applications recommended next are "application three", "application four", and "application five".
When a user inputs the name or related information of the application software to be searched for in the "application center", the application center may search according to the search information input by the user and display the search result to the user. For example, when the user inputs "application one" in the search box 101 of the interface 10 shown in (a) of FIG. 1, the mobile phone displays the interface 11 shown in (b) of FIG. 1. The interface 11 includes an icon of "application one" and an "install" button. The application center may also search for applications that are related or similar to the search information and display them to the user; "application six", "application seven", and "application eight" shown in the interface 11 are applications related to "application one".
It should be noted that, the searching method provided in the embodiment of the present application may also be applied to other searching scenarios, such as searching information in a search website. The application scenario of the search method is not particularly limited in the embodiment of the application.
Application search services typically require processing of large amounts of application data, which is typically stored in a search server. For example, the search server may employ an elastic search (ES for short). The ES provides a distributed, highly extended, high real-time search and data analysis engine. An ES server may be considered a cluster, consisting of a plurality of nodes, each of which may be a server in the cluster. Each node stores cluster state information, such as all node information, all index information, and fragmented routing information.
Application data is typically stored in indexes (indices) on the ES server, each of which can be viewed as a collection of documents with similar characteristics. An index may store a volume of data that exceeds the hardware limits of a single node, so ES can divide a complete index into multiple shards. In this way, a large index is split into multiple shards distributed over different nodes, which enables distributed search. Shards are divided into primary shards and replica shards, and one primary shard may correspond to multiple replica shards. The primary shard is responsible for processing write requests and storing data, and the replica shard is responsible for storing data. To ensure high availability, in some application scenarios a primary shard and its corresponding replica shard may be distributed on different nodes.
By way of example, refer to FIG. 2, which is a schematic diagram of the basic architecture of an ES server provided in an embodiment of the present application. By way of example and not limitation, as shown in FIG. 2, an ES cluster includes 3 nodes, and an index includes 3 primary shards and 3 replica shards. Primary shard 1 and replica shard 1 are located on node 1, primary shard 2 and replica shard 2 are located on node 2, and primary shard 3 and replica shard 3 are located on node 3, forming a distributed architecture.
A primary shard may synchronize data with a replica shard on a different node, and each primary shard corresponds to 1 replica shard. As shown in FIG. 2, primary shard 1 corresponds to replica shard 2 and synchronizes data with it; primary shard 2 corresponds to replica shard 3 and synchronizes data with it; primary shard 3 corresponds to replica shard 1 and synchronizes data with it. In this way, the replica shards improve the fault tolerance of the system: when a shard on a node is damaged or lost, it can be recovered from the corresponding replica shard.
It should be noted that FIG. 2 only shows the case where the index includes 3 primary shards and each primary shard corresponds to one replica shard. In other examples, the index may include more or fewer primary shards, each of which may correspond to multiple replica shards. In the embodiment of the present application, the number of shards of the index is not specifically limited.
It should be noted that FIG. 2 only shows one example of data synchronization between primary shards and replica shards. In other examples, the primary shard on each node may also synchronize data with each replica shard on a different node. This is not particularly limited in the embodiments of the present application.
As an example of how ES stores data, node 1 in the ES cluster receives a write request from the client, finds the corresponding primary shard 2 according to a routing algorithm, and forwards the write request to node 2, where that primary shard is located, so that node 2 writes the application data into primary shard 2. After the write to primary shard 2 is completed, node 2 sends the write request in parallel to node 3, where replica shard 3 corresponding to primary shard 2 is located, so that node 3 writes the application data into replica shard 3. After the write to replica shard 3 is completed, node 3 feeds the write result back to node 2, node 2 feeds it back to node 1, and node 1 feeds it back to the client, which completes the write process. In the search process, a node receives a query request from the client and sends the query request to the primary shard and the replica shard corresponding to the query request.
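The write path described above can be sketched as follows. This is a minimal illustration only: the shard layout is taken from the FIG. 2 description, while the routing function (a SHA-256 based hash taken modulo the number of shards) and the names `route_to_shard` and `write_path` are assumptions made for this example, not the routing algorithm actually used by the ES server.

```python
import hashlib

NUM_PRIMARY_SHARDS = 3
# Layout from the FIG. 2 description: primary shard i is on node i, and its
# replica shard is on the next node (1 -> 2, 2 -> 3, 3 -> 1).
PRIMARY_NODE = {1: 1, 2: 2, 3: 3}
REPLICA_NODE = {1: 2, 2: 3, 3: 1}


def route_to_shard(doc_id: str) -> int:
    """Pick a primary shard (1-based) from a stable hash of the document id.

    Assumption: a stand-in for the routing algorithm, which the text
    does not specify.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PRIMARY_SHARDS + 1


def write_path(doc_id: str, receiving_node: int = 1) -> list:
    """Return the ordered message hops for one write request."""
    shard = route_to_shard(doc_id)
    primary = PRIMARY_NODE[shard]
    replica = REPLICA_NODE[shard]
    return [
        f"client -> node {receiving_node}: write request",
        f"node {receiving_node} -> node {primary}: forward to primary shard {shard}",
        f"node {primary} -> node {replica}: forward to replica shard",
        f"node {replica} -> node {primary}: write result",
        f"node {primary} -> node {receiving_node}: write result",
        f"node {receiving_node} -> client: write result",
    ]


if __name__ == "__main__":
    for hop in write_path("application-AA"):
        print(hop)
```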
In the ES basic architecture, the nodes can be divided into hot nodes and cold nodes, and the hardware configuration corresponding to the hot nodes is higher than that corresponding to the cold nodes. The hot node is used for storing application data with higher searching frequency, and the cold node is used for storing application data with lower searching frequency.
Referring to fig. 3, a data interaction schematic diagram of an ES-based search architecture according to an embodiment of the present application is provided. By way of example and not limitation, as shown in FIG. 3, an ES server 31 is included in the search architecture, and a hot node 311 and a cold node 312 are included in the ES server 31.
The developer uploads application data to the ES server 31 through the development terminal, or removes application data from the ES server 31. After receiving the application data uploaded by the development terminal, the ES server 31 preferentially stores the application data in the hot node 311; if the application data has not been queried for more than a certain period of time, the hot node 311 migrates the application data to the cold node 312.
During the search, the user inputs search information at the client, which transmits the search information to the ES server 31. The ES server 31 searches the hot node 311 for application data related to the search information and, if related application data is found, returns the search result to the client. If no related application data is found, the ES server 31 searches the cold node 312 for application data related to the search information and, if related application data is found, returns the search result to the client.
Over time, the number of applications keeps increasing, more and more data is stored in the hot and cold nodes of the ES server, the performance pressure on the ES server grows, and searches become slower and slower, which affects the user experience.
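The hot/cold behavior described above (store in the hot node first, migrate idle data to the cold node, search hot before cold) can be sketched as follows. The class name `HotColdStore`, its methods, and the use of plain dictionaries and explicit timestamps are assumptions made for illustration; they do not describe the actual ES server implementation.

```python
from __future__ import annotations


class HotColdStore:
    """Toy model of one hot node and one cold node."""

    def __init__(self, max_idle_seconds: float) -> None:
        self.max_idle_seconds = max_idle_seconds
        self.hot: dict[str, tuple[str, float]] = {}  # name -> (data, last query time)
        self.cold: dict[str, str] = {}               # name -> data

    def upload(self, name: str, data: str, now: float) -> None:
        # Newly uploaded application data goes to the hot node first.
        self.hot[name] = (data, now)

    def migrate_idle(self, now: float) -> None:
        # Move data that has not been queried for too long to the cold node.
        idle = [n for n, (_, last) in self.hot.items()
                if now - last > self.max_idle_seconds]
        for name in idle:
            data, _ = self.hot.pop(name)
            self.cold[name] = data

    def search(self, name: str, now: float) -> str | None:
        # Search the hot node first, then fall back to the cold node.
        if name in self.hot:
            data, _ = self.hot[name]
            self.hot[name] = (data, now)  # refresh the last query time
            return data
        return self.cold.get(name)
```

In this sketch a search that misses the hot node always pays for a second lookup in the cold node, which is the cost that grows as both nodes fill up.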
Based on this, the embodiment of the application provides a search architecture. Through the search architecture provided by the embodiment of the application, the search information of the user is filtered by the filter preferentially, and most invalid requests can be intercepted by the filter, so that the pressure of the ES server is effectively reduced, and more efficient search service is provided for the user.
In some examples, referring to fig. 4, a schematic diagram of a search system provided in an embodiment of the present application is shown. As shown in fig. 4, the search system may include a background server 41 and an ES server 42 (also referred to as a search server in the embodiment of the present application). For convenience of description, in the embodiment of the present application, a server corresponding to application software (search application)/website (search website) having a search function is referred to as a background server. For example, in the embodiment of fig. 1, the "application center" on the mobile phone is search software, and its corresponding server may be referred to as a background server. For another example, in some application scenarios, a user performs a content search using a search website, and accordingly, a server corresponding to the search website may be referred to as a background server.
As shown in fig. 4, the background server 41 may include a filter 411 and an ES service module 412 (also referred to as a service module in the embodiment of the present application). Wherein the filter 411 is in communication with the client for receiving search information sent by the client or sending search results to the client. The ES service module 412 communicates with the ES server 42 for invoking search services of the ES server 42. Note that, the ES service module 412 communicates with the ES server 42, which may mean that the ES service module 412 communicates with nodes in the ES server cluster.
In the searching process, a user inputs search information in a search application of a client, and the client sends the search information to the filter 411 in the background server of the search application. The filter 411 performs a first search according to the search information. If the first result of the first search indicates that the search information does not exist, the filter 411 feeds back the first result of the first search to the client. If the first result of the first search indicates that the search information exists, the search information is transmitted to the ES server 42 through the ES service module 412; the ES server 42 performs a second search according to the search information and feeds back a second result of the second search (the application data found) to the ES service module 412, and the ES service module 412 feeds the second result back to the client.
It should be noted that, in other examples, the background server 41 may not include a separate ES service module 412, and the filter 411 may interact with the ES server 42. In other examples, the ES server may also be deployed into a backend server.
For maintenance of application data, the filter remains synchronized with the application data in the ES server. In other words, as shown in FIG. 4, when the developer uploads application data to the filter through the development terminal, the developer also uploads the application data to the ES server through the development terminal; when the developer deletes application data from the filter through the development terminal, the developer also deletes the application data from the ES server through the development terminal.
Compared with the search architecture shown in FIG. 3, the search system shown in FIG. 4 adds a filter in front of the ES server. With such a search system, the user's search information is first filtered by the filter. Of the application data that users search for, 80% does not exist and only 20% actually exists. Therefore, based on the principle that content the filter judges to exist may or may not actually exist, whereas content the filter judges not to exist definitely does not exist, most invalid requests can be intercepted by the filter, so that the pressure on the ES server is effectively relieved and a more efficient search service is provided for users.
Based on the search system shown in fig. 4, the embodiment of the application provides a search method. Referring to fig. 5, an interactive flow diagram of a search method provided in an embodiment of the present application is shown. By way of example and not limitation, as shown in fig. 5, the search method may include the steps of:
S501, the client transmits search information to a background server.
Correspondingly, a filter in the background server receives the search information sent by the client.
In the context of application searching as shown in FIG. 1, the search information may be the name, type, and/or other descriptive information of the application software. For example, the user wants to search for an application AA, which is named AA, belongs to the game class of application, and is a 3D game. The user may enter "AA", or "game", or "3D game", etc. within the search box of the search application.
In the context of content searching, a user may search for content of interest on a search website. The search information may be information such as keywords of the content. For example, a user who wants to find the latest news stories may enter "today's news" or the like in the search box of a search website.
S502, the background server performs a first search according to the search information to obtain a first result.
In the embodiment of the present application, S502 is executed by a filter in the background server. The specific method of the filter to perform the first search according to the search information can be seen from the following description of the embodiment of fig. 9.
S503, if the first result indicates that the content corresponding to the search information exists, the background server sends the search information to a search server so that the search server searches according to the search information.
If the first result indicates that the content corresponding to the search information does not exist, the background server sends preset information to the client.
The preset information is used for indicating that the content corresponding to the search information does not exist. In some examples, the preset information may be text information, a picture, a logo, or the like.
The search server may be an ES server as shown in fig. 4.
As shown in fig. 4, when the background server includes an ES service module, the implementation manner of S503 may be that the filter in the background server sends the search information to the ES service module, and then the ES service module sends the search information to the ES server. When the background server does not include the ES service module, the implementation manner of S503 may be that the filter in the background server sends the search information to the ES server.
S504, after receiving the search information sent by the background server, the search server performs a second search according to the search information to obtain a second result, and returns the second result to the background server.
As shown in fig. 4, when the ES service module is included in the background server, the search server may transmit the second result to the ES service module in the background server. When the ES service module is not included in the background server, the search server may send the second result to a filter in the background server.
It should be noted that, the method for the search server to perform the second search according to the search information may be a search method of the search server in the related art, and the search method of the search server in the embodiment of the present application is not specifically limited.
In this embodiment of the present application, the second result is a search result obtained by searching by the search server according to the search information. If the search server does not search the content corresponding to the search information, the second result may include preset information, where the preset information indicates that the content corresponding to the search information does not exist, and the preset information may be text information, a picture or a mark; if the search server retrieves the content corresponding to the search information, the second result may include the content corresponding to the search information.
Continuing with the example in S501: in the context of application searching, the second result may include all applications related to the search information and related information for each application (e.g., application size, number of downloads, details, etc.). In the context of content searching, the second result may include all content related to the search information; for example, after the user enters "today's news" in the search box of the search website, the second result may include all news content updated today (e.g., news summaries, news details, etc.).
S505, after receiving the second result returned by the search server, the background server sends the second result to the client.
As shown in FIG. 4, when an ES service module is included in the background server, S505 may be performed by the ES service module in the background server; alternatively, the ES service module forwards the second result to the filter in the background server, and S505 is performed by the filter. When the ES service module is not included in the background server, S505 may be performed by a filter in the background server.
Of course, the background server may also include a communication module for data interaction with the client. Specifically, the client sends the search information to the communication module, and the communication module forwards the search information to the filter; when the filter detects that the search information does not exist, the filter sends a first result to the communication module, and the communication module forwards the first result to the client; when the filter detects that the search information exists, the search information is sent to the search server, the search server sends a second result to the communication module, and the second result is forwarded to the client by the communication module.
In the embodiment of the present application, based on the principle that content the filter judges to exist may exist, whereas content the filter judges not to exist definitely does not exist, the ES server does not need to perform a search when the filter in the background server determines that the content corresponding to the search information does not exist. In that case, the filter in the background server sends the preset information to the client. In this way, most invalid requests can be intercepted by the background server, so that the pressure on the search server is effectively reduced and a more efficient search service is provided for users.
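The two-stage flow of S501 to S505 can be sketched as follows. The function name `handle_search`, the `PRESET_INFO` text, and the two callables standing in for the filter and the search server are all hypothetical names introduced for this illustration; the filter's actual first search is described with the embodiment of FIG. 9.

```python
from typing import Callable, List

# Hypothetical preset information returned when the filter rules a query out.
PRESET_INFO = "no matching content"


def handle_search(
    search_info: str,
    filter_may_contain: Callable[[str], bool],
    search_server: Callable[[str], List[str]],
) -> dict:
    """Background-server handling of one search request.

    filter_may_contain: first search (S502); False means "definitely absent".
    search_server: second search (S504), only called when the filter passes.
    """
    if not filter_may_contain(search_info):
        # Intercepted by the filter: the search server is never contacted.
        return {"source": "filter", "result": PRESET_INFO}
    # The content may exist, so the request is forwarded (S503) and the
    # second result is relayed to the client (S505).
    return {"source": "search_server", "result": search_server(search_info)}
```

Because the second callable is only invoked on a positive first result, every request the filter rejects costs the search server nothing.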
For ease of illustration, the process of storing data by the filter in the background server will be described.
Referring to fig. 6, a flowchart of a method for storing data by using a filter according to an embodiment of the present application is shown. By way of example and not limitation, as shown in FIG. 6, a method of storing data by a filter may include the steps of:
S601, the filter receives application data uploaded by the development terminal.
As shown in fig. 4, the developer uploads the application data to the filter in the background server through the development terminal, and accordingly, the filter receives the application data uploaded by the development terminal, and then stores the application data according to steps S602 to S604.
S602, the filter extracts second feature information in the application data.
Wherein the data amount of the second characteristic information is smaller than the data amount of the application data.
The method of extracting the second characteristic information may be found in the following description of the embodiments of steps I-II.
S603, the filter calculates a second storage location of the second feature information in the storage space.
In some storage modes, the second feature information may be stored at random locations in the storage space, or stored in the storage space in sequential order. However, with these approaches it is difficult to form a mapping relationship between the second feature information and its storage location, which hinders subsequent information searching.
To solve the above problem, in some embodiments, the manner of calculating the second storage location in step S603 may include: calculating a second hash value of the second characteristic information; and calculating a second storage position according to the second hash value and the storage amount of the storage space.
In some implementations, the second hash value may be computed using a Secure Hash Algorithm (SHA). The SHA family includes algorithms such as SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512. SHA-256 is a secure and reliable hash algorithm that provides high collision resistance, that is, a low probability that different input data produce the same hash value. In the embodiment of the application, SHA-256 may be used to calculate the second hash value of the second characteristic information, which effectively reduces the collision probability of the characteristic information and improves the accuracy of subsequent information searching.
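As a minimal sketch of this step, the hash value of a piece of feature information can be computed with Python's standard `hashlib` module. The function name `feature_hash` and the choice to interpret the 32-byte digest as a big-endian integer are assumptions made for illustration.

```python
import hashlib


def feature_hash(feature_info: bytes) -> int:
    """Return the SHA-256 digest of the feature information as an integer."""
    digest = hashlib.sha256(feature_info).digest()  # 32 bytes
    return int.from_bytes(digest, "big")


if __name__ == "__main__":
    print(hex(feature_hash(b"AA")))
```

The same input always yields the same value, which is what allows a later search to recompute the storage location from the search information alone.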
In this embodiment of the present application, the storage manner may be: and performing remainder operation according to the second hash value and the storage amount of the storage space to obtain a second storage position.
In the embodiment of the present application, the storage amount of the storage space may refer to the number of storage positions in the storage space. Of course, the amount of memory in the memory space is also understood to be the amount of data in the memory space that can store data. Under this understanding, the number of storage locations in the storage space can be calculated from the unit data amount of storable data per storage location and the storage amount of the storage space.
The remainder operation, also called the modulo operation, computes the remainder after dividing one number by another. Here, the remainder operation divides the hash value by the storage amount of the storage space, and the resulting remainder is used as the storage location. For example, assuming that the second hash value is 1025 and the number of storage locations in the storage space is 1024, the remainder of dividing the second hash value by the storage amount is 1, so the 1st storage location in the storage space is used to store the second characteristic information corresponding to the second hash value 1025.
The method of the remainder operation can quickly and simply determine the storage position, and can ensure that the calculated storage position falls into the range of the storage space.
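The remainder calculation can be written in a few lines. The function name `storage_location` is an assumption for this sketch; the numbers come from the example in the text.

```python
def storage_location(hash_value: int, num_locations: int) -> int:
    """Map a hash value to a storage location by taking the remainder."""
    if num_locations <= 0:
        raise ValueError("num_locations must be positive")
    return hash_value % num_locations


if __name__ == "__main__":
    # Example from the text: hash value 1025 and 1024 storage locations.
    print(storage_location(1025, 1024))
```

For non-negative hash values the result is always in the range 0 to num_locations - 1, so it never falls outside the storage space.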
In some implementations, the storage amount of the storage space may be set to a power of 2, that is, 2^Q. In this case, performing the remainder operation on the second hash value and the storage amount of the storage space is equivalent to taking the last Q bits of the second hash value. This calculation is simpler and helps improve processing efficiency.
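When the number of storage locations is a power of two, the remainder can be obtained with a bit mask instead of a division. The sketch below assumes the storage amount is 2 raised to an exponent `q`; the function name `location_by_mask` is illustrative.

```python
def location_by_mask(hash_value: int, q: int) -> int:
    """Keep the lowest q bits of the hash value.

    For non-negative hash values this equals hash_value % (2 ** q).
    """
    return hash_value & ((1 << q) - 1)


if __name__ == "__main__":
    q = 10  # 2 ** 10 == 1024 storage locations
    print(location_by_mask(1025, q), 1025 % (1 << q))
```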
In the embodiment of the application, the storage position of the second characteristic information is determined according to the second hash value of the second characteristic information and the storage amount of the storage space, so that a clear mapping relation can be formed between the characteristic information of the application data and the storage position of the storage space, and subsequent information searching is facilitated.
S604, the filter stores the second characteristic information in a second storage location.
Because the second characteristic information can reflect the characteristics of the application data and the data volume is smaller, compared with storing the application data, storing the second characteristic information is beneficial to saving storage space.
In some application scenarios, there may be situations where the characteristic information of different application data is the same. In this case, the storage locations corresponding to the feature information of the different application data are the same. For example, the second characteristic information of the application data XX extracted according to step S602 is the same as the second characteristic information of the extracted application data YY, and the second hash value calculated from the same second characteristic information is also the same, and therefore, the storage location determined from the same second hash value is also the same.
One approach is to skip storing the current second characteristic information if data is already stored in the storage location calculated from it. With this approach, the feature information of the latest application data is discarded, which tends to cause the following problems. During a subsequent information search, it cannot be determined whether the characteristic information in the storage location is the characteristic information corresponding to the current search information, which reduces the filtering precision of the filter. In addition, since the feature information in a given storage location may correspond to multiple sets of application data, it is inconvenient to delete the feature information in that storage location later, which hinders maintenance of the storage space.
To solve the above problem, in some embodiments, step S604 may include:
if the second storage position is empty, storing the second characteristic information into the second storage position;
and if the second storage position is not empty, replacing third characteristic information with the second characteristic information, wherein the third characteristic information is the data stored in the second storage position.
By the method, the second characteristic information of the latest application data can be ensured to be stored.
In order to store as many sets of identical feature information as possible, embodiments of the present application provide a structure for the storage space. In an embodiment of the present application, the storage space may include a plurality of main locations and at least one sub-location under each main location. Each main location and the sub-locations under it correspond to the same storage location. This structure is equivalent to expanding each storage location in the storage space so that it can store multiple sets of identical feature information.
Referring to fig. 7, a schematic diagram of a storage space provided in an embodiment of the present application is shown. As shown in fig. 7, the storage space includes a plurality of main locations 71 and 4 sub-locations 72 under each main location. For ease of illustration, only 4 sub-positions under one main position are shown in fig. 7.
In some implementations, the data in the storage space may be stored in the form of a matrix (two-dimensional array). For example, each main location may correspond to a row of the matrix, and the sub-locations may correspond to columns of the matrix.
In the embodiment of the present application, the number of main positions and the number of sub-positions in each main position are not particularly limited. It will be appreciated that the greater the number of sub-locations per main location, the more identical feature information that can be carried, but the greater the capacity requirements for storage space. Therefore, the number of sub-positions per main position can be set according to actual requirements.
Based on the structure of the storage space shown in fig. 7, one implementation of step S604 may include:
if the main position in the second storage position is empty, storing the second characteristic information into the main position of the second storage position;
if the main position in the second storage position is not empty, judging whether the sub position in the second storage position is empty or not;
if any one of the sub-positions in the second storage position is empty, storing the second characteristic information into the idle sub-position in the second storage position;
if each sub-location in the second storage location is not empty, then the second storage location is determined to be not empty.
Whether the sub-positions are empty may be determined sequentially according to the arrangement order of the sub-positions, or may be determined in parallel regardless of the arrangement order.
Taking the storage space shown in FIG. 7 as an example, assume that the calculated second storage position is P. It is first determined whether the main position P0 of the second storage position P is empty; if P0 is empty, the second characteristic information is stored in P0. If P0 is not empty, it is determined whether the first sub-position P1 of the second storage position P is empty; if P1 is empty, the second characteristic information is stored in P1. If P1 is not empty, it is determined whether the second sub-position P2 is empty; if P2 is empty, the second characteristic information is stored in P2. If P2 is not empty, it is determined whether the third sub-position P3 is empty; if P3 is empty, the second characteristic information is stored in P3. If P3 is not empty, it is determined whether the fourth sub-position P4 is empty; if P4 is empty, the second characteristic information is stored in P4. If P4 is not empty, it is determined that the second storage position P is not empty.
In this embodiment, if it is determined that the second storage location is not empty, the third feature information is replaced with the second feature information as described in the above step. In some implementations, the second characteristic information may randomly replace any one set of third characteristic information in the second storage location, such as the third characteristic information at the main position or at any one sub-position. In other implementations, the main position and the sub-positions may be replaced sequentially in order.
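The storage procedure for one storage location with a main position and sub-positions can be sketched as follows. The representation of the storage space as a list of rows, with `None` marking an empty position, and the function name `store_feature` are assumptions for this illustration; the sketch uses the random-replacement variant mentioned above.

```python
import random

SLOTS_PER_LOCATION = 5  # 1 main position + 4 sub-positions, as in FIG. 7


def make_space(num_locations: int) -> list:
    """Create an empty storage space: one row per storage location."""
    return [[None] * SLOTS_PER_LOCATION for _ in range(num_locations)]


def store_feature(space: list, location: int, feature, rng=random):
    """Store a feature at a location; return the displaced feature, if any.

    Index 0 of a row is the main position, indices 1..4 are sub-positions.
    """
    row = space[location]
    for i, value in enumerate(row):
        if value is None:  # first empty position, checked in order
            row[i] = feature
            return None
    # Every position is occupied: replace a randomly chosen one.
    victim = rng.randrange(len(row))
    displaced, row[victim] = row[victim], feature
    return displaced
```

A caller that receives a displaced feature can decide what to do with it, for example relocating it as described in the replacement-count steps that follow.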
Through the storage mode, each storage position in the storage space expands a plurality of sub-positions, so that more characteristic information can be accommodated; when the characteristic information of different application data is the same, the characteristic information corresponding to each of the different application data can be stored. Accordingly, in the subsequent searching process, as long as the characteristic information corresponding to the searching information exists in the storage space, the existence of the searching information is indicated, and the filtering precision of the filter can be ensured. In addition, as the characteristic information corresponding to each of a plurality of different application data is stored in the same storage position, when certain application data is deleted later, only one group of characteristic information in the storage position can be deleted correspondingly, other characteristic information stored in the storage position can be used for inquiring other search information, the inquiry precision is ensured, meanwhile, the flexible deletion of the data in the storage space can be realized, and the maintenance of the subsequent storage space is facilitated.
To ensure that previously stored feature information (e.g., third feature information replaced by second feature information) is not deleted, in some embodiments the method may further comprise:
after replacing the third feature information with the second feature information, adding one to the number of times of replacement;
if the current replacement times do not reach the preset times, replacing the fourth characteristic information with the third characteristic information, and adding one to the replacement times, wherein the fourth characteristic information is data except the third characteristic information in the data stored in the second storage position;
if the current replacement times reach the preset times, expanding the storage space to obtain an expanded storage space; and calculating a third storage position of the feature information stored in the storage space before capacity expansion in the storage space after capacity expansion.
Illustratively, continuing with the example of FIG. 7 above, assume that the preset number of times is 5. When the second storage position P is not empty, the second characteristic information replaces the characteristic information (third characteristic information) at the main position P0 of the second storage position P; the replacement count is now 1 and does not reach the preset number. The characteristic information displaced from the main position P0 (third characteristic information) replaces the characteristic information (fourth characteristic information) at the sub-position P1 of the second storage position P; the replacement count is now 2 and does not reach the preset number. The characteristic information displaced from the sub-position P1 (fourth characteristic information) replaces the characteristic information at the sub-position P2; the replacement count is now 3 and does not reach the preset number. The characteristic information displaced from the sub-position P2 replaces the characteristic information at the sub-position P3; the replacement count is now 4 and does not reach the preset number. The characteristic information displaced from the sub-position P3 replaces the characteristic information at the sub-position P4; the replacement count is now 5 and reaches the preset number. The storage space is then expanded to obtain an expanded storage space, a third storage position in the expanded storage space is calculated for the characteristic information originally at the sub-position P4 of the second storage position P, and that characteristic information is stored in the new third storage position.
The preset number of times can be set in advance according to actual requirements. If the preset number is small, the probability of successfully storing the characteristic information may be reduced; if the preset number is large, that probability increases, but so does the processing time of the algorithm.
In this embodiment of the present application, the preset number of times may be set according to the number of main positions and sub-positions corresponding to each storage location in the storage space. Taking FIG. 7 as an example, each storage location includes 1 main position and 4 sub-positions, so the preset number of times may be set to 5 (the sum of the numbers of main positions and sub-positions). That is, 5 replacements indicate that the current storage location is full: both the main position and the sub-positions have stored data.
In the embodiment of step S603, the storage position corresponding to the feature information is obtained by performing a remainder operation on the hash value of the feature information and the storage amount of the storage space. Because expansion changes the storage amount of the storage space, the storage position corresponding to the feature information in the expanded storage space changes accordingly. Therefore, for all feature information already stored in the storage space, the corresponding storage positions in the expanded storage space need to be recalculated.
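The sequential replacement with a counter can be sketched as below. This is one reading of the example, under the assumption that each replacement writes the carried feature into the next position and carries the previous occupant forward; the function name `replace_chain` is hypothetical.

```python
def replace_chain(row: list, new_feature, preset_times: int):
    """Shift features through a full storage location, counting replacements.

    Returns (carried, count): the feature left without a position once the
    count reaches preset_times (or the row ends), and the number of
    replacements performed.
    """
    carried = new_feature
    count = 0
    for i in range(len(row)):
        if count >= preset_times:
            break
        # The carried feature takes position i; the old occupant is carried on.
        carried, row[i] = row[i], carried
        count += 1
    return carried, count


if __name__ == "__main__":
    p = ["t", "u", "v", "w", "x"]  # P0..P4, all occupied
    print(replace_chain(p, "new", preset_times=5), p)
```

With 1 main position, 4 sub-positions, and a preset number of 5, the count reaches the preset number exactly when the feature originally in P4 has been carried out, which is the point at which the text expands the storage space.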
In some examples, referring to fig. 8, a schematic diagram of a storage space before and after capacity expansion is provided in an embodiment of the present application. As shown in fig. 8 (a), the storage space before expansion includes 1024 storage locations. As shown in fig. 8 (b), the number of storage locations included in the expanded storage space is 2048. Assuming that the hash value of the feature information ZZ is 1025, according to the calculation method of the storage position in S603, the corresponding storage position in the storage space before expansion is 1, and the corresponding storage position in the storage space after expansion is 1025. As shown in fig. 8, the storage location of the feature information ZZ in the storage space before expansion is different from the storage location in the storage space after expansion.
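The arithmetic of this example can be checked with a minimal sketch. The class and method names are hypothetical, and the sketch assumes that the storage position is simply the remainder of the hash value divided by the number of storage locations, as described for S603.

```java
final class StoragePositionSketch {
    // Hypothetical helper: remainder of the hash value with respect to the storage amount.
    static int storagePosition(long hashValue, int storageAmount) {
        return (int) Math.floorMod(hashValue, (long) storageAmount);
    }

    public static void main(String[] args) {
        long hashOfZZ = 1025L; // hash value of feature information ZZ in the example
        System.out.println(storagePosition(hashOfZZ, 1024)); // before expansion
        System.out.println(storagePosition(hashOfZZ, 2048)); // after expansion
    }
}
```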
In the embodiment of the application, the capacity expansion of the storage space refers to increasing the number of storage positions in the storage space. In some implementations, the number of main positions may be increased while the number of sub-positions under each main position is unchanged. In other implementations, the number of sub-positions under each main position may be increased while the number of main positions is unchanged. In still other implementations, the number of main positions and the number of sub-positions under each main position may be increased simultaneously. The capacity of the storage space can be expanded according to actual requirements.
In the embodiment of the application, more characteristic information can be stored by performing capacity expansion processing on the storage space, and the characteristic information stored before can be retained as far as possible, so that the subsequent searching precision is improved, and the subsequent maintenance of the storage space is facilitated.
The method of extracting the second characteristic information in step S602 is described below.
In some embodiments, the manner in which the second feature information is extracted in step S602 may include:
I. Carrying out hash processing according to the character position of each character in the application data in the character string corresponding to the application data to obtain a second character hash value.
II. Extracting second characteristic information according to the second character hash value.
Since the arrangement sequence of the characters in a sentence is closely related to the semantics of the sentence, the hash processing in the embodiment of the present application is equivalent to extracting the characteristic information of the application data through the character positions of the characters in the application data, so that the semantic characteristics of the application data can be reflected to a certain extent, and the search precision of the subsequent information search is improved.
In some embodiments, the shifting process in step I may include:
for the 1st character in the application data, calculating a hash value of the character position of the 1st character in the character string corresponding to the application data to obtain a second character hash value of the 1st character; updating the initial memory value according to the second character hash value of the 1st character; and updating the initial shift bit number according to the first preset numerical value to obtain the shift bit number corresponding to the 1st character.
for the j-th character in the application data, calculating a hash value of the character position of the j-th character in the character string corresponding to the application data to obtain a second character hash value of the j-th character;
performing shift processing on the second character hash value of the j-th character according to the shift bit number corresponding to the j-1-th character to obtain a shifted second character hash value of the j-th character, wherein j is an integer greater than 1;
updating a fifth memory value according to the second character hash value of the j-th character after the shift to obtain a sixth memory value, wherein the fifth memory value is used for representing the second character hash value of the first j-1 characters after the shift, and the sixth memory value is used for representing the second character hash value of the first j characters after the shift;
and updating the shift bit number corresponding to the (j-1)-th character according to the first preset numerical value to obtain the shift bit number corresponding to the j-th character.
In some examples, the initial shift bit number may be set to 0. The initial memory value may be 0000.
In this embodiment of the present application, the first preset value is determined according to the data interval to which the second character hash value of the j-th character before shifting belongs. For example, three data intervals are set, wherein interval 1 represents a second character hash value of less than 128, interval 2 represents a second character hash value of greater than or equal to 128 and less than 2048, and interval 3 represents a second character hash value of greater than or equal to 2048. Each data interval corresponds to a first preset value; for example, the first preset value corresponding to interval 1 is 8, the first preset value corresponding to interval 2 is 16, and the first preset value corresponding to interval 3 is 24. As the hash value of the character position increases, the first preset value increases, so that conflicts between the hash values of the characters in the memory value can be effectively reduced.
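The interval lookup in this example may be sketched as follows. The class and method names are hypothetical, and the interval boundaries and return values are only those of the example above, not fixed parameters of the method.

```java
final class FirstPresetValueSketch {
    // Hypothetical helper: maps a second character hash value (before shifting)
    // to the first preset value of the data interval it belongs to.
    static int firstPresetValue(int charHash) {
        if (charHash < 128) {
            return 8;   // interval 1
        } else if (charHash < 2048) {
            return 16;  // interval 2
        }
        return 24;      // interval 3
    }

    public static void main(String[] args) {
        System.out.println(firstPresetValue(1));
        System.out.println(firstPresetValue(500));
        System.out.println(firstPresetValue(5000));
    }
}
```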
In some implementations, updating the fifth memory value may include: performing an OR operation on the shifted second character hash value of the j-th character and the fifth memory value, wherein the obtained operation result is the sixth memory value.
In the embodiment of the present application, the fifth memory value refers to the memory value before updating, and the sixth memory value refers to the memory value after updating. For example, when j=2, the fifth memory value refers to the memory value obtained by updating the initial memory value according to the second character hash value of the 1st character; when j=3, the fifth memory value refers to the memory value obtained by updating the memory value according to the shifted second character hash value of the 2nd character (i.e., the sixth memory value when j=2). And so on.
Illustratively, taking the data intervals and first preset values above as an example, the second character hash value of the 1st character is 0001, and this second character hash value does not need to be shifted; an OR operation is performed on the second character hash value 0001 of the 1st character and the initial memory value 0000, and the obtained memory value is 0001. The calculated shift bit number corresponding to the 1st character is 8. The second character hash value 0010 of the 2nd character is shifted left by 8 bits to obtain the shifted second character hash value 0010 0000 0000 of the 2nd character; an OR operation is performed on the shifted second character hash value of the 2nd character and the memory value 0001 (fifth memory value) to obtain the updated memory value 0010 0000 0001 (sixth memory value). The calculated shift bit number corresponding to the 2nd character is 16. And so on.
The above is merely an example of updating the memory value, and the method of calculating the hash value, the obtained hash value, and the like are not particularly limited.
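For reference, the two update steps of this example can be reproduced with a short sketch. The class and method names are hypothetical; the hash values 0001 and 0010 and the increment of 8 are those of the example, not values produced by any particular hash function.

```java
final class MemoryValueSketch {
    // Hypothetical helper: shifts the character hash left and ORs it into the memory value.
    static long update(long memoryValue, int charHash, int shift) {
        return memoryValue | ((long) charHash << shift);
    }

    public static void main(String[] args) {
        long memory = 0L;  // initial memory value
        int shift = 0;     // initial shift bit number
        memory = update(memory, 0b0001, shift); // 1st character, no shift needed
        shift += 8;                              // shift bit number of the 1st character
        memory = update(memory, 0b0010, shift); // 2nd character, shifted left by 8
        shift += 8;                              // shift bit number of the 2nd character
        System.out.println(Long.toBinaryString(memory) + ", shift=" + shift);
    }
}
```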
In some implementations, after obtaining the shift number corresponding to the jth character, the method may further include:
if the shift bit number corresponding to the jth character is smaller than or equal to the shift threshold value, continuing to shift the second character hash value of the j+1th character according to the shift bit number corresponding to the jth character;
if the shift bit number corresponding to the j-th character is larger than the shift threshold value, acquiring data from the sixth memory value according to the shift threshold value to obtain a seventh memory value; and reducing the shift bit number corresponding to the j-th character according to the shift threshold value to obtain the updated shift bit number corresponding to the j-th character.
In the embodiment of the application, when the shift bit number exceeds the shift threshold value, the shift bit number is reduced, so that the situation in which the shifted data has too many bits because the shift bit number is too large can be effectively avoided. In addition, only part of the data is acquired from the current memory value, so the length of the memory value can be effectively controlled and the memory value is ensured not to exceed the memory space.
Optionally, one way to obtain the data from the sixth memory value according to the shift threshold is to obtain the value on the M digits in the sixth memory value according to the order of the digits from high to low, so as to obtain the seventh memory value.
Wherein M is determined from a shift threshold. In some implementations, M may be equal to the shift threshold. For example, assuming that the shift threshold is 32 and m=32, and the current sixth memory value is 0101 0000 0100 0000 0011 0000 0010 0000 0001, the numerical value of the upper 32 digits is taken in the order of the upper digits from the upper digit to the lower digit, and the obtained seventh memory value is 0101 0000 0100 0000 0011 0000 0010 0000.
As is clear from the above embodiments, the higher-order digits correspond to the hash values of the character positions of characters located later in the character string, and the lower-order digits correspond to the hash values of the character positions of characters located earlier in the character string. In the above manner, the high-order digits are retained and the low-order digits are filtered out, so that the hash values of the character positions of the later characters in the character string are retained as far as possible. In some application scenarios, the characters located toward the end of the character string are mostly the keywords used for searching, so the memory value obtained in this way retains more of the characteristic information of the keywords in the character string.
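The selection of the upper M digits in the example above may be sketched as follows. The names are hypothetical, and the sketch assumes that the digit width of the memory value (36 in the example, including its leading zero) is tracked explicitly by the caller.

```java
final class HighDigitsSketch {
    // Hypothetical helper: keeps the upper m binary digits of a value that is
    // `width` digits wide, discarding the lower (width - m) digits.
    static long highDigits(long memoryValue, int width, int m) {
        if (width <= m) {
            return memoryValue; // nothing to filter
        }
        return memoryValue >>> (width - m);
    }

    public static void main(String[] args) {
        // 0101 0000 0100 0000 0011 0000 0010 0000 0001, 36 digits wide
        long sixth = 0x504030201L;
        long seventh = highDigits(sixth, 36, 32);
        System.out.println(Long.toHexString(seventh));
    }
}
```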
In some embodiments, extracting the second feature information according to the second character hash value in step II may include:
amplifying the eighth memory value to obtain a processed eighth memory value, wherein the eighth memory value is used for representing a shifted second character hash value of N characters, and N is the total number of the characters in the application data; and calculating second characteristic information according to the eighth memory value.
If the shift threshold is not exceeded, that is, the low-order digits in the memory value are not filtered, the eighth memory value includes the hash values of the character positions corresponding to the N characters in the application data. If the shift threshold is exceeded during the shift process, that is, the low-order digits in the memory value are filtered, the eighth memory value includes the hash values of the character positions corresponding to n characters in the application data, and the hash values of the character positions of these n characters may be used to represent the feature information of the character positions of the N characters in the application data after the shift. Wherein n is a positive integer less than N.
In some implementations, the amplifying the eighth memory value may include: and calculating the complement of the eighth memory value, and shifting the complement of the eighth memory value left by L bits to obtain the processed eighth memory value. Wherein L is a preset value. For example, assuming that the eighth memory value is 0010 0000 and l=8, the eighth memory value after processing is 0010 0000 0000 0000.
In the embodiment of the application, the number of bits of the processed memory value is increased through the amplification processing of the memory value, so that the data conflict in the subsequent calculation process can be reduced.
In some implementations, the second feature information is calculated according to the eighth memory value by:
performing exclusive-or operation according to the second preset value and the eighth memory value to obtain an exclusive-or value; and calculating the second characteristic information according to the exclusive OR value.
The second preset value may be preset by a developer. In some implementations, the second preset value may be set to a prime number. Because a prime number cannot be factored further, the reliability of the subsequent calculation is improved.
The exclusive-or operation is performed bit by bit: if the two corresponding bits are the same, the result bit is 0; if the two corresponding bits are different, the result bit is 1. For example, assuming that the second preset value is 0000 0111 and the eighth memory value is 0010 0000, the exclusive-or value obtained by the exclusive-or operation of the two is 0010 0111. Assuming that the second preset value is 0000 0111 and the eighth memory value is 0000 0111, the exclusive-or value of the two is 0000 0000.
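The amplification and exclusive-or examples above can be verified with a minimal sketch. The class and method names are hypothetical. The sketch assumes non-negative memory values, for which the complement representation equals the value itself, so the complement step is not shown separately.

```java
final class AmplifyXorSketch {
    // Hypothetical helper: amplification as a left shift by L bits.
    static long amplify(long memoryValue, int l) {
        return memoryValue << l;
    }

    // Hypothetical helper: bitwise exclusive-or of the second preset value and the memory value.
    static long xor(long secondPresetValue, long memoryValue) {
        return secondPresetValue ^ memoryValue;
    }

    public static void main(String[] args) {
        System.out.println(Long.toBinaryString(amplify(0b0010_0000L, 8)));
        System.out.println(Long.toBinaryString(xor(0b0000_0111L, 0b0010_0000L)));
        System.out.println(Long.toBinaryString(xor(0b0000_0111L, 0b0000_0111L)));
    }
}
```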
The exclusive-or operation can detect whether the eighth memory value is the same as the second preset value; if the two are the same, the exclusive-or value is 0. When two groups of application data are the same and their corresponding eighth memory values are both the same as the second preset value, the two groups of application data can be judged to be the same through the exclusive-or value, and accordingly the hash values and the characteristic information corresponding to the two groups of application data are the same. In this case, only one of the two groups of application data may be stored. In this way, the chance of duplicate storage can be reduced.
Since the second character hash value of each character in the application data can represent the characteristic information of the character position of the character in the character string, and the eighth memory value is equivalent to the characteristic information of the character positions of all the characters in the application data, in the embodiment of the application, the second characteristic information is calculated according to the exclusive or value, which is equivalent to the characteristic information extracted according to the character positions of all the characters in the application data, and thus the extracted characteristic information can represent the position relation among the characters in the application data, and thus the semantics corresponding to the application data are represented.
In some implementations, after obtaining the shift bit number corresponding to the jth character, if the shift bit number corresponding to the jth character is greater than a shift threshold, performing amplification processing on the second preset numerical value to obtain the processed second preset numerical value;
correspondingly, performing the exclusive-or operation according to the second preset value and the eighth memory value to obtain the exclusive-or value includes:
if the shift bit number corresponding to the j-th character is larger than the shift threshold value, performing exclusive-or operation according to the processed second preset numerical value and the eighth memory value to obtain an exclusive-or value;
and if the shift bit number corresponding to the j-th character is smaller than or equal to the shift threshold value, the second preset numerical value does not need to be amplified, and correspondingly, the exclusive-or operation is carried out according to the second preset numerical value (namely the second preset numerical value which is not amplified) and the eighth memory value, so that the exclusive-or value is obtained.
In this embodiment of the present application, when the number of shift bits is greater than the shift threshold, it indicates that the number of numerical bits in the current memory value is greater, and in this case, the second preset numerical value is subjected to amplification processing, which is equivalent to increasing the number of numerical bits of the second preset numerical value, and increasing the complexity of the second preset numerical value. In this way, collisions of data in subsequent computation can be reduced.
As an example of the method of extracting the second feature information at S602, the code implementation logic is as follows:
public String featureFun(CharSequence input, Integer seed) {
    int j = 0;
    int shift; // shift bit number
    int charHash; // character hash value
    Long buffer = 0L; // memory value
    // seed is the second preset value
    // Step I: shift processing
    for (shift = 0; j < input.length(); ++j)
    {
        charHash = HashCode.fromString(Character.toString(input.charAt(j))).asInt();
        // obtain the character hash value of the j-th character in the character string
        if (charHash < 128)
        {
            buffer |= charHash << shift; // shift left, update memory value
            shift += 8; // update shift bit number
        }
        else if (charHash < 2048)
        {
            buffer |= charHash << shift;
            shift += 16;
        } else
        {
            buffer |= charHash << shift;
            shift += 24;
        }
        // determine whether the shift bit number exceeds the shift threshold
        if (shift > 32)
        {
            seed = Integer.rotateLeft(seed, 13); // update the second preset value
            buffer >>= 32; // buffer only holds information of the higher-order characters
            shift -= 32; // reduce the shift bit number
        }
    }
charhash=inter
    seed ^= charHash; // exclusive OR of the second preset value and the eighth memory value
    return HashCode.fromInt(seed >>> 16).toString(); // calculate hash value
}
Based on the process of storing data by the filter described in fig. 6, the method of the first search in S502 is described below. Referring to fig. 9, a schematic process of the first search provided in the embodiment of the present application is shown. By way of example and not limitation, as shown in fig. 9, the first search may include the steps of:
S901, first feature information in the search information is extracted.
In some embodiments, step S901 may include:
performing shift processing according to the character position of each character in the search information in the character string corresponding to the search information to obtain a processed character position; and extracting first characteristic information according to the processed character position.
As an implementation of the shift process, it includes:
for the ith character in the search information, calculating a hash value of the character position of the ith character in a character string corresponding to the search information to obtain a first character hash value of the ith character;
performing shift processing on a first character hash value of an ith character according to a shift bit number corresponding to the ith-1 character to obtain a shifted first character hash value of the ith character, wherein i is an integer greater than 1;
updating a first memory value according to the shifted first character hash value of the ith character to obtain a second memory value, wherein the first memory value is used for representing the shifted first character hash value of the first i-1 characters, and the second memory value is used for representing the shifted first character hash value of the first i characters;
and updating the shift bit number corresponding to the ith-1 character according to a first preset value to obtain the shift bit number corresponding to the ith character, wherein the first preset value is determined according to a data interval to which a first character hash value of the ith character before shifting belongs.
In some embodiments, the method further comprises:
after the number of shift bits corresponding to the ith character is obtained, if the number of shift bits corresponding to the ith character is larger than a shift threshold value, acquiring data from the second memory value according to the shift threshold value to obtain a third memory value;
and reducing the shift bit number corresponding to the ith character according to the shift threshold value to obtain the updated shift bit number corresponding to the ith character.
In one implementation manner of obtaining the third memory value, the numerical value of the M digits in the second memory value is obtained according to the order from high digit to low digit, so as to obtain the third memory value, wherein the M is determined according to the shift threshold.
In some implementations, extracting the first feature information according to the first character hash value may include:
amplifying the fourth memory value to obtain a processed fourth memory value, wherein the fourth memory value is used for representing the first character hash value of the shifted N characters, and N is the total number of the characters in the search information; and calculating the first characteristic information according to the fourth memory value.
In some implementations, calculating the first characteristic information from the fourth memory value may include: performing exclusive-or operation according to the second preset value and the fourth memory value to obtain an exclusive-or value; the first characteristic information is calculated from the exclusive or value.
In some implementations, the method further comprises:
after the number of shift bits corresponding to the ith character is obtained, if the number of shift bits corresponding to the ith character is larger than a shift threshold, performing amplification processing on the second preset value to obtain the processed second preset value. Correspondingly, performing exclusive-or operation according to the second preset value and the fourth memory value to obtain an exclusive-or value, including: and performing exclusive-or operation according to the current second preset value and the fourth memory value to obtain an exclusive-or value.
The method for extracting the first feature information in the search information in step S901 is the same as the method for extracting the second feature information in the application data in step S602 in the embodiment of fig. 6, and the description in the embodiment of S602 may be referred to specifically, and will not be repeated here.
The method used by the filter to extract the second feature information of the application data during the data storage process is the same as the method used to extract the first feature information of the search information during the search process, and the parameters involved in the method are the same. For example, the parameters include the first preset value, the shift threshold value, the value of M and the second preset value involved in calculating the feature information.
S902, a first storage position of the first characteristic information in the storage space is acquired.
In some embodiments, S902 may include:
calculating a first hash value of the first characteristic information; and calculating the first storage position according to the first hash value and the storage amount of the storage space.
In one implementation, a remainder operation is performed according to the first hash value and the storage amount of the storage space, so as to obtain a first storage position.
The manner of acquiring the first storage location of the first feature information in the storage space in step S902 is the same as the manner of calculating the second storage location of the second feature information in the storage space in step S603 in the embodiment of fig. 6, and the description in the embodiment of S603 may be referred to, and will not be repeated here.
S903, if the first feature information is stored in the first storage location, the first result indicates that the content corresponding to the search information exists.
S904, if the first feature information is not stored in the first storage location, the first result indicates that the content corresponding to the search information does not exist.
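Steps S902 to S904 may be sketched as follows, by way of illustration only. The sketch makes simplifying assumptions that are not part of the embodiment: each storage location holds a single entry (main and sub-positions are not modelled), `String.hashCode()` stands in for the first hash value, and all names are hypothetical.

```java
final class FirstSearchSketch {
    // Hypothetical helper: storage location as the remainder of the hash value
    // with respect to the number of storage locations.
    static int position(String featureInfo, int storageAmount) {
        return Math.floorMod(featureInfo.hashCode(), storageAmount);
    }

    // Stores feature information at its calculated location (overwrites any occupant).
    static void store(String[] storage, String featureInfo) {
        storage[position(featureInfo, storage.length)] = featureInfo;
    }

    // Returns true if the first result indicates that corresponding content exists.
    static boolean firstSearch(String[] storage, String featureInfo) {
        return featureInfo.equals(storage[position(featureInfo, storage.length)]);
    }

    public static void main(String[] args) {
        String[] storage = new String[1024];
        store(storage, "feature-of-stored-data");
        System.out.println(firstSearch(storage, "feature-of-stored-data"));
        System.out.println(firstSearch(storage, "feature-of-unknown-query"));
    }
}
```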
Illustratively, referring to fig. 10, an interactive flow diagram of a search process provided in an embodiment of the present application is shown. As shown in fig. 10, the search process may include the steps of:
S1001, the client transmits the search information to the background server.
S1002, a filter in a background server receives search information and extracts first characteristic information in the search information.
S1003, a filter in the background server acquires a first storage location of the first feature information in the storage space.
Steps S1002-S1003 are the same as steps S901-S902 described above; for a specific description, refer to the embodiments of steps S901-S902 described above.
S1004, a filter in the background server judges whether the first storage location stores the first characteristic information.
S1005, if the first storage location does not store the first characteristic information, the filter in the background server returns preset information to the client.
The preset information is used for indicating that the content corresponding to the search information does not exist.
S1006, if the first storage location stores the first feature information, the filter in the background server sends the search information to the search server through the service module in the background server.
S1007, the search server receives the search information and performs a second search.
S1008, the search server returns a second result of the second search to the client through a service module in the background server.
In this embodiment of the present application, since the data size of the first feature information is generally smaller than the data size of the search information, the searching efficiency of the filter searching according to the first feature information is higher than that of the searching by the ES server according to the search information. The filter is utilized to perform the first search on the search information, so that most invalid requests can be intercepted, the search efficiency can be improved, the search pressure of the ES server can be reduced, and more efficient search service can be provided for users.
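The interaction of steps S1001 to S1008 may be sketched as follows. This is only an illustrative outline: the filter and the search server are modelled as plain functional interfaces, and the class name, method name and preset information string are hypothetical.

```java
import java.util.Collections;
import java.util.function.Function;
import java.util.function.Predicate;

final class TwoStageSearchSketch {
    // Hypothetical preset information returned when no corresponding content exists.
    static final String PRESET_INFO = "NO_CONTENT";

    // The first search (filter) decides whether the second search is performed at all.
    static String handle(String searchInfo,
                         Predicate<String> filter,
                         Function<String, String> searchServer) {
        if (!filter.test(searchInfo)) {
            return PRESET_INFO;                  // S1005: request intercepted by the filter
        }
        return searchServer.apply(searchInfo);   // S1006-S1008: forwarded for the second search
    }

    public static void main(String[] args) {
        Predicate<String> filter = Collections.singleton("weather")::contains;
        Function<String, String> searchServer = q -> "results for " + q;
        System.out.println(handle("weather", filter, searchServer));
        System.out.println(handle("unknown", filter, searchServer));
    }
}
```

In this outline the search server is only invoked for search information that passes the filter, which is the effect the embodiment relies on to reduce the search pressure on the search server.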
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation process of the embodiments of the present application in any way.
The embodiments of the present application also provide a computer readable storage medium storing a computer program, where the computer program can implement the steps in the above-mentioned method embodiments when executed by a processor.
The embodiments of the present application also provide a computer program product enabling a terminal device to carry out the steps of the above-described respective method embodiments when the computer program product is run on the terminal device.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the methods of the above embodiments may be implemented by a computer program instructing related hardware, where the computer program may be stored in a computer readable storage medium, and the computer program, when executed by a processor, may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, executable files or some intermediate form, etc. The computer readable medium may include at least: any entity or device capable of carrying computer program code to a first device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer readable media may not include electrical carrier signals and telecommunication signals.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the elements and method steps of the examples described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application. Finally, it should be noted that: the foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (17)

1. A search method, applied to a background server, the method comprising:
after receiving search information sent by a client, carrying out first search according to the search information to obtain a first result;
if the first result indicates that the content corresponding to the search information does not exist, preset information is returned to the client;
if the first result indicates that the content corresponding to the search information exists, the search information is sent to a search server, so that the search server performs second search according to the search information and returns a second result obtained by the second search;
and after receiving the second result returned by the search server, sending the second result to the client.
2. The method of claim 1, wherein performing a first search based on the search information to obtain a first result comprises:
extracting first characteristic information in the search information;
acquiring a first storage position of the first characteristic information in a storage space;
if the first characteristic information is stored in the first storage position, the first result indicates that the content corresponding to the search information exists;
and if the first characteristic information is not stored in the first storage position, the first result indicates that the content corresponding to the search information does not exist.
3. The method of claim 2, wherein extracting the first characteristic information from the search information comprises:
performing hash processing according to the character position of each character of the search information within the character string corresponding to the search information, to obtain a first character hash value;
and extracting the first characteristic information according to the first character hash value.
4. The method of claim 3, wherein performing the hash processing according to the character position of each character of the search information within the character string corresponding to the search information to obtain the first character hash value comprises:
for the ith character in the search information, calculating a hash value of the character position of the ith character within the character string corresponding to the search information, to obtain a first character hash value of the ith character;
shifting the first character hash value of the ith character according to a shift bit number corresponding to the (i-1)th character, to obtain a shifted first character hash value of the ith character, wherein i is an integer greater than 1;
updating a first memory value according to the shifted first character hash value of the ith character to obtain a second memory value, wherein the first memory value represents the shifted first character hash values of the first i-1 characters, and the second memory value represents the shifted first character hash values of the first i characters;
and updating the shift bit number corresponding to the (i-1)th character according to a first preset value to obtain a shift bit number corresponding to the ith character, wherein the first preset value is determined according to the data interval to which the first character hash value of the ith character before shifting belongs.
5. The method of claim 4, wherein the method further comprises:
after the shift bit number corresponding to the ith character is obtained, if the shift bit number corresponding to the ith character is greater than a shift threshold, acquiring data from the second memory value according to the shift threshold to obtain a third memory value;
and reducing the shift bit number corresponding to the ith character according to the shift threshold to obtain an updated shift bit number corresponding to the ith character.
6. The method of claim 5, wherein acquiring data from the second memory value according to the shift threshold to obtain the third memory value comprises:
acquiring the values of M digits of the second memory value, in order from the highest digit to the lowest, to obtain the third memory value, wherein M is determined according to the shift threshold.
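One possible reading of the shift-and-accumulate procedure in claims 4 to 6 is sketched below. Every concrete choice is an assumption made for illustration and is not taken from the claims: the per-character hash `char_position_hash`, the interval table in `shift_increment` (the "first preset value"), the XOR merge, the 32-bit `SHIFT_THRESHOLD`, and the interpretation of "M high-order digits" as dropping the low-order threshold bits.

```python
from typing import Tuple

SHIFT_THRESHOLD = 32  # assumed shift threshold, in bits


def char_position_hash(ch: str, position: int) -> int:
    """Assumed per-character hash mixing the code point with its position (16 bits)."""
    return (ord(ch) * 31 + position * 17) & 0xFFFF


def shift_increment(char_hash: int) -> int:
    """Assumed 'first preset value', chosen by the interval of the unshifted hash."""
    return 8 if char_hash < 0x100 else 16


def accumulate(search_info: str) -> Tuple[int, int]:
    """Fold all character hashes into one memory value; return (memory, shift)."""
    memory = 0
    shift = 0  # shift bit number carried over from the previous character
    for position, ch in enumerate(search_info):
        char_hash = char_position_hash(ch, position)
        # Shift by the previous character's shift bit number, then merge (XOR assumed).
        memory ^= char_hash << shift
        # Update the shift bit number for the current character.
        shift += shift_increment(char_hash)
        if shift > SHIFT_THRESHOLD:
            # Assumed reading of claim 6: keep the high-order part of the memory
            # value, then reduce the shift bit number by the threshold.
            memory >>= SHIFT_THRESHOLD
            shift -= SHIFT_THRESHOLD
    return memory, shift
```

Because the position enters the per-character hash, strings containing the same characters in a different order generally produce different memory values in this sketch.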
7. The method of claim 4, wherein extracting the first characteristic information according to the first character hash value comprises:
performing amplification processing on a fourth memory value to obtain a processed fourth memory value, wherein the fourth memory value represents the shifted first character hash values of N characters, and N is the total number of characters in the search information;
and calculating the first characteristic information according to the fourth memory value.
8. The method of claim 7, wherein calculating the first characteristic information according to the fourth memory value comprises:
performing an exclusive-or operation on a second preset value and the fourth memory value to obtain an exclusive-or value;
and calculating the first characteristic information according to the exclusive-or value.
9. The method of claim 8, wherein the method further comprises:
after the shift bit number corresponding to the ith character is obtained, if the shift bit number corresponding to the ith character is greater than a shift threshold, performing amplification processing on the second preset value to obtain a processed second preset value;
wherein performing the exclusive-or operation on the second preset value and the fourth memory value to obtain the exclusive-or value comprises:
if the shift bit number corresponding to the ith character is greater than the shift threshold, performing the exclusive-or operation on the processed second preset value and the fourth memory value to obtain the exclusive-or value.
10. The method of claim 2, wherein acquiring the first storage position of the first characteristic information in the storage space comprises:
calculating a first hash value of the first characteristic information;
and calculating the first storage position according to the first hash value and the storage capacity of the storage space.
11. The method of claim 10, wherein calculating the first storage position according to the first hash value and the storage capacity of the storage space comprises:
performing a remainder operation on the first hash value and the storage capacity of the storage space to obtain the first storage position.
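The position calculation in claims 10 and 11 reduces to a hash followed by a remainder. The sketch below assumes CRC-32 (via the standard `zlib.crc32`) as the first hash and a byte string as the characteristic information; the claims name neither.

```python
import zlib


def storage_position(feature: bytes, capacity: int) -> int:
    """Map characteristic information to a slot index in [0, capacity)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    # First hash value of the characteristic information (CRC-32 is an assumed choice).
    first_hash = zlib.crc32(feature)
    # Remainder with respect to the storage capacity gives the storage position.
    return first_hash % capacity
```

Since the position depends on the capacity, any change to the size of the storage space changes the position of previously stored values, which is why an expansion requires positions to be recalculated.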
12. The method of any one of claims 1 to 11, wherein the method further comprises:
receiving application data uploaded by a development terminal;
extracting second characteristic information from the application data;
calculating a second storage position of the second characteristic information in the storage space;
and storing the second characteristic information at the second storage position.
13. The method of claim 12, wherein storing the second characteristic information at the second storage position comprises:
if the second storage position is not empty, replacing third characteristic information with the second characteristic information, wherein the third characteristic information is the data stored at the second storage position.
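The store-with-replacement behaviour of claims 12 and 13 might look like the following sketch. The class name `FeatureStore`, the integer features, and the modulo position rule are hypothetical simplifications introduced here.

```python
from typing import List, Optional


class FeatureStore:
    """Fixed-size slot array in which a new feature overwrites whatever occupies its slot."""

    def __init__(self, capacity: int) -> None:
        self.slots: List[Optional[int]] = [None] * capacity

    def position(self, feature: int) -> int:
        # Assumed position rule: feature modulo the number of slots.
        return feature % len(self.slots)

    def store(self, feature: int) -> Optional[int]:
        """Store the feature; return the different value it displaced, if any."""
        pos = self.position(feature)
        displaced = self.slots[pos]
        # If the slot is occupied, the old (third) value is replaced by the new (second) one.
        self.slots[pos] = feature
        return displaced if displaced != feature else None
```

In this sketch a displaced value is simply dropped, so a later lookup for it reports that the content does not exist; the claims do not state whether a displaced value is relocated.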
14. The method of claim 13, wherein the second storage position comprises a main position and a plurality of sub-positions;
the method further comprises:
if the main position of the second storage position is not empty, determining whether the sub-positions of the second storage position are empty;
if every sub-position of the second storage position is not empty, determining that the second storage position is not empty;
and if any one of the sub-positions of the second storage position is empty, determining that the second storage position is empty.
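The emptiness test of claim 14 can be sketched as a storage position with one main slot and several sub-slots. The `StoragePosition` dataclass, the choice of three sub-positions, and the `place` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StoragePosition:
    """One storage position: a main slot plus a fixed number of sub-slots (3 assumed)."""
    main: Optional[int] = None
    subs: List[Optional[int]] = field(default_factory=lambda: [None, None, None])


def is_empty(position: StoragePosition) -> bool:
    """The position counts as empty while the main slot or any sub-slot is free."""
    if position.main is None:
        return True
    return any(sub is None for sub in position.subs)


def place(position: StoragePosition, feature: int) -> bool:
    """Put the feature in the first free slot; return False if no slot is free."""
    if position.main is None:
        position.main = feature
        return True
    for index, sub in enumerate(position.subs):
        if sub is None:
            position.subs[index] = feature
            return True
    return False
```

With this layout a replacement under claim 13 is only needed once the main slot and every sub-slot are occupied, which reduces how often stored values are overwritten.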
15. The method of claim 13, wherein the method further comprises:
after replacing the third characteristic information with the second characteristic information, incrementing a replacement count by one;
if the current replacement count reaches a preset count, expanding the storage space to obtain an expanded storage space;
and calculating, for the characteristic information stored in the storage space before the expansion, a third storage position in the expanded storage space.
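A minimal sketch of the replacement counter and expansion in claim 15 follows. Doubling the capacity, resetting the counter after expansion, and the modulo position rule are assumptions; the claim only requires that the space be expanded and that positions be recalculated.

```python
from typing import List, Optional


class ExpandingStore:
    """Slot array that grows once too many stored values have been replaced."""

    def __init__(self, capacity: int = 4, max_replacements: int = 2) -> None:
        self.slots: List[Optional[int]] = [None] * capacity
        self.max_replacements = max_replacements  # the "preset count"
        self.replacements = 0

    def store(self, feature: int) -> None:
        pos = feature % len(self.slots)
        if self.slots[pos] is not None and self.slots[pos] != feature:
            # An existing, different value is being replaced: count it.
            self.replacements += 1
        self.slots[pos] = feature
        if self.replacements >= self.max_replacements:
            self._expand()

    def _expand(self) -> None:
        survivors = [f for f in self.slots if f is not None]
        # Assumed growth policy: double the capacity.
        self.slots = [None] * (len(self.slots) * 2)
        for f in survivors:
            # Recompute each stored value's position in the larger space.
            # A collision here silently overwrites, a simplification of this sketch.
            self.slots[f % len(self.slots)] = f
        self.replacements = 0  # assumed: the counter restarts after expansion
```

For example, with a capacity of 4 the value 7 sits in slot 3, and after expansion to 8 slots it moves to slot 7.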
16. A background server, the background server comprising:
a filter, configured to: after receiving search information sent by a client, perform a first search according to the search information to obtain a first result; and if the first result indicates that content corresponding to the search information does not exist, return preset information to the client, wherein the preset information indicates that content corresponding to the search information does not exist;
and a service module, configured to: when the first result indicates that content corresponding to the search information exists, send the search information to a search server, so that the search server performs a second search according to the search information and returns a second result obtained by the second search; and after receiving the second result returned by the search server, send the second result to the client.
17. A search system, comprising:
a search server and the background server of claim 16.
CN202311744805.2A 2023-12-19 2023-12-19 Searching method, background server and searching system Pending CN117453986A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311744805.2A CN117453986A (en) 2023-12-19 2023-12-19 Searching method, background server and searching system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311744805.2A CN117453986A (en) 2023-12-19 2023-12-19 Searching method, background server and searching system

Publications (1)

Publication Number Publication Date
CN117453986A true CN117453986A (en) 2024-01-26

Family

ID=89589438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311744805.2A Pending CN117453986A (en) 2023-12-19 2023-12-19 Searching method, background server and searching system

Country Status (1)

Country Link
CN (1) CN117453986A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010166344A (en) * 2009-01-16 2010-07-29 Nippon Telegr & Teleph Corp <Ntt> Authorization method, authorization system, first server, and program
CN102063446A (en) * 2009-11-13 2011-05-18 中国移动通信集团四川有限公司 Method for creating inverted index and inverted indexing device
CN102479207A (en) * 2010-11-29 2012-05-30 阿里巴巴集团控股有限公司 Information search method, system and device
CN103136342A (en) * 2013-02-04 2013-06-05 百度在线网络技术(北京)有限公司 Searching method, system and searching server of application programs (APP)
CN104751050A (en) * 2015-04-13 2015-07-01 成都睿峰科技有限公司 Client application program management method
CN105516232A (en) * 2014-10-20 2016-04-20 中兴通讯股份有限公司 SAN storage system application software management method, management server, host and system
CN107145502A (en) * 2017-03-20 2017-09-08 中山大学 A kind of method of mass picture storage and search
CN107180042A (en) * 2016-03-09 2017-09-19 阿里巴巴集团控股有限公司 Flow statistical method, the apparatus and system of search engine
CN108268208A (en) * 2016-12-30 2018-07-10 清华大学 A kind of distributed memory file system based on RDMA
CN110134886A (en) * 2019-05-21 2019-08-16 Oppo广东移动通信有限公司 A kind of resource searching result presentation method, device and computer readable storage medium
CN112347355A (en) * 2020-11-11 2021-02-09 广州酷狗计算机科技有限公司 Data processing method, device, server and storage medium
CN113254767A (en) * 2021-05-24 2021-08-13 深圳和锐网络科技有限公司 Big data searching method and device, computer equipment and storage medium
CN115271861A (en) * 2022-07-22 2022-11-01 苏州浪潮智能科技有限公司 Request filtering method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
马文龙;朱妤晴;蒋德钧;熊劲;张立新;孟潇;包云岗;: "Research on Key-Value NoSQL local storage systems", Chinese Journal of Computers, no. 08, 1 June 2017 (2017-06-01) *

Similar Documents

Publication Publication Date Title
Fu et al. Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement
CN106815350B (en) Dynamic ciphertext multi-keyword fuzzy search method in cloud environment
US7454405B2 (en) File management program, file management process, and file management apparatus
US9547706B2 (en) Using colocation hints to facilitate accessing a distributed data storage system
US8762353B2 (en) Elimination of duplicate objects in storage clusters
US9690823B2 (en) Synchronizing copies of an extent in an append-only storage system
AU2013210018B2 (en) Location independent files
CN102741800A (en) Storage system for eliminating duplicated data
WO2017020576A1 (en) Method and apparatus for file compaction in key-value storage system
US9405643B2 (en) Multi-level lookup architecture to facilitate failure recovery
US20100306234A1 (en) Cache synchronization
US20160092125A1 (en) Constructing an index to facilitate accessing a closed extent in an append-only storage system
EP4231167A1 (en) Data storage method and apparatus based on blockchain network
CN110674247A (en) Barrage information intercepting method and device, storage medium and equipment
US20160092124A1 (en) Append-only storage system supporting open and closed extents
CN117453986A (en) Searching method, background server and searching system
CN115292737B (en) Multi-keyword fuzzy search encryption method and system and electronic equipment
CN115587114A (en) System and query method
US20130218851A1 (en) Storage system, data management device, method and program
CN112765169A (en) Data processing method, device, equipment and storage medium
CN110866144B (en) Song retrieval method and device
KR100319761B1 (en) Frame-partitioned parallel processing method for database retrieval using signature file
US20240078249A1 (en) Database synchronization using resizable invertible bloom filters with database snapshots
US11829398B2 (en) Three-dimensional probabilistic data structure
CN116756177B (en) Multi-table index maintenance method and system for mysql database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination