WO2018077138A1 - Data configuration method, index management method, related apparatus and computing device - Google Patents


Info

Publication number
WO2018077138A1
Authority
WO
WIPO (PCT)
Prior art keywords
search
domain
field
data file
data
Application number
PCT/CN2017/107343
Other languages
French (fr)
Chinese (zh)
Inventor
王楠楠 (Wang Nannan)
刘若曦 (Liu Ruoxi)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2018077138A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing

Definitions

  • The present application relates to the field of data search, and in particular to a data configuration method, an index management method, related apparatuses, and a computing device.
  • Full-text search is currently a general-purpose search technology: it uses a search field or similar input as the search entry point to find the required information.
  • Solr is a popular enterprise search system whose functions include full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich-text (e.g., Word, PDF) processing. This application uses Solr only as an example to describe the search system.
  • A search system generally stores information using the data file (document, abbreviated doc) as its basic unit.
  • The data in a doc is generally stored as domain + domain value pairs, where a domain (a field, in Solr terminology) indicates the type of the corresponding domain value, and the domain value records the specific value of that domain.
  • The domains of the docs in the search system are defined by the schema configuration file.
  • A search system holds a huge number of docs, which together cover a large number of domains.
  • Current search systems rely on technicians to set up multiple domains manually in the schema configuration file. Because a technician cannot know in advance which domains users will search frequently, the configured domains are quite likely to be searched rarely in practice. As a result, the domain search function is used infrequently, the speed and efficiency gains it offers are limited, and the search performance of the system does not meet user requirements.
  • The present application provides a data configuration method, an index management method, related apparatuses, and a computing device to improve the search performance of a search system.
  • A first aspect of the present application provides a data configuration method applicable to a search system.
  • The search system includes multiple data files, and each data file includes one or more domains and the domain values corresponding to those domains.
  • A domain indicates the type of the corresponding domain value.
  • A domain value records the specific value of the corresponding domain.
  • The search system further includes a schema configuration file that defines the domains of the data files in the search system.
  • The data configuration method includes: receiving a plurality of search requests issued by a user, where each search request includes one or more search fields and requests a search for data files that contain those search fields; determining the domain to which each search field in the search requests belongs; determining one or more hotspot domains among those domains; and adding the one or more hotspot domains to the schema configuration file and updating the data files in the search system according to the updated schema configuration file.
  • The domains in the schema configuration file are therefore not set by a technician; they are hotspot domains that the search system determines from the search requests users have just issued. Because a hotspot domain is searched frequently in those requests, it is highly likely to be searched again by users later.
  • The domain to which a search field belongs may be carried in the search requests themselves, or in the response messages to the search requests.
  • The search system determines, according to each search request and/or the response message corresponding to it, the domain to which each search field in the plurality of search requests belongs.
  • Specifically, the search system extracts the domain carried in each search request and/or its response message according to a delimiter (interval identifier) in that request or message, and thereby determines the domain to which each search field in the plurality of search requests belongs.
  • Among the domains to which the search fields belong, the search system takes the top one or more domains containing the most search fields as the hotspot domains.
  • The search system determines the domain value format of each hotspot domain according to the search fields included in that hotspot domain.
  • Specifically, the search system takes, as the domain value format of each hotspot domain, a format that all search fields included in that hotspot domain consistently match.
  • In every preset period, the search system collects search data for each domain defined in the schema configuration file.
  • The search data includes one or more of the number of times the domain was searched, its search frequency, and its search hit ratio in the current preset period.
  • The search system deletes from the schema configuration file one or more domains whose search data falls below a threshold, thereby dynamically eliminating non-hotspot domains from the schema configuration file.
  • A second aspect of the present application provides an index management method applicable to a search system.
  • The search system includes multiple data files, and each data file includes one or more domains and the domain values corresponding to those domains.
  • A domain indicates the type of the corresponding domain value.
  • A domain value records the specific value of the corresponding domain.
  • The search system further includes default indexes corresponding to the plurality of data files, where each default index includes the storage location of the corresponding data file in the search system. The method includes: receiving a plurality of search requests sent by a user, where each search request includes one or more search fields and requests a search for data files that contain those search fields.
  • The domain to which each search field belongs is determined, and the domain value format of each such domain is determined according to the search fields included in that domain.
  • Each data file in the search system that contains a field conforming to the domain value format is determined as a first data file.
  • A second data file is generated for each first data file; it includes the fields of the first data file that conform to the domain value format and the domains to which those fields belong.
  • A new index is generated for each second data file, and the new index includes the storage location of the corresponding second data file in the search system.
  • A search field in a subsequent search request is highly likely to hit a new index generated by the index management apparatus. Because a new index covers far less data than the default index, searching the new index greatly reduces the workload of the search system, increases its search speed and efficiency, and thus improves its search performance.
  • The domain to which a search field belongs may be carried in the search requests themselves, or in the response messages to the search requests.
  • The search system determines, according to each search request and/or the response message corresponding to it, the domain to which each search field in the plurality of search requests belongs.
  • Specifically, the search system extracts the domain carried in each search request and/or its response message according to a delimiter in that request or message, and thereby determines the domain to which each search field in the plurality of search requests belongs.
  • The search system determines the domain value format of each domain to which the search fields belong according to the search fields included in that domain.
  • The search system receives a target search request sent by the user, where the target search request includes a target search field and requests a search for data files that contain the target search field.
  • The search system looks up a new index corresponding to the target search field. If one is found, the search system obtains the data files containing the target search field according to that new index.
  • If the search system does not find a new index corresponding to the target search field, it searches the default index corresponding to the target search field to obtain the data files that contain the target search field.
  • When a new data file is imported into the search system, if the new data file includes a field that conforms to the domain value format, the search system generates a second data file corresponding to the new data file.
  • The second data file corresponding to the new data file includes the fields in the new data file that conform to the domain value format and the domains to which those fields belong.
  • The search system creates a new index for the second data file corresponding to the new data file; this new index includes the storage location of that second data file in the search system.
  • In every preset period, the search system collects the search parameters of each new index.
  • The search parameters of a new index include one or more of the number of times the domain corresponding to the new index was searched, its search frequency, and its search hit ratio in the current preset period.
  • The search system deletes one or more new indexes whose search parameters fall below a threshold, thereby dynamically retiring non-hotspot indexes among the new indexes.
  • A third aspect of the present application provides a data configuration apparatus applicable to a search system.
  • The search system includes multiple data files, and each data file includes one or more domains and the domain values corresponding to those domains.
  • A domain indicates the type of the corresponding domain value.
  • A domain value records the specific value of the corresponding domain.
  • The search system further includes a schema configuration file that defines the domains of the data files in the search system.
  • The data configuration apparatus includes an information receiving module, configured to receive a plurality of search requests issued by the user, where each search request includes a search field and requests a search for data files that contain the search field.
  • A domain determining module, configured to determine the domain to which each search field in the plurality of search requests belongs.
  • A hotspot determining module, configured to determine one or more hotspot domains among the domains to which the search fields belong.
  • A configuration modification module, configured to add the one or more hotspot domains to the schema configuration file and to update the data files in the search system according to the updated schema configuration file.
  • The domains in the schema configuration file of the present application are therefore not set by a technician; they are hotspot domains that the data configuration apparatus determines from the search requests users have just sent. Because a hotspot domain is searched frequently in these search requests, it is highly likely to be searched again by users later.
  • The domain determining module is specifically configured to determine, according to each search request and/or the search system's response message to each search request, the domain to which each search field in the plurality of search requests belongs.
  • The domain determining module is specifically configured to determine, according to a delimiter in each search request and/or in the response message to each search request, the domain to which each search field in the plurality of search requests belongs.
  • The hotspot determining module is specifically configured to take, among the domains to which the search fields belong, the top one or more domains containing the most search fields as the hotspot domains.
  • The data configuration apparatus further includes a format determining module, configured to determine the domain value format of each hotspot domain according to the search fields included in that hotspot domain.
  • The format determining module is specifically configured to take, as the domain value format of each hotspot domain, a format that all search fields included in that hotspot domain consistently match.
  • The configuration modification module is further configured to: collect, in every preset period, search data for each domain defined in the schema configuration file, where the search data includes one or more of the number of searches, the search frequency, and the search hit rate of that domain in the current preset period; and delete from the schema configuration file one or more domains whose search data is below a threshold.
  • A fourth aspect of the present application provides an index management apparatus applicable to a search system.
  • The search system includes multiple data files, and each data file includes one or more domains and the domain values corresponding to those domains.
  • A domain indicates the type of the corresponding domain value.
  • A domain value records the specific value of the corresponding domain.
  • The search system further includes default indexes corresponding to the plurality of data files, and each default index includes the storage location of the corresponding data file in the search system.
  • The index management apparatus includes a receiving information module, configured to receive a plurality of search requests issued by the user, where each search request includes a search field and requests a search for data files that contain the search field.
  • A determining domain module, configured to determine the domain to which each search field in the plurality of search requests belongs.
  • A determining format module, configured to determine the domain value format of each domain to which the search fields belong, according to the search fields included in that domain.
  • A file determining module, configured to determine, as a first data file, each data file in the search system that includes a field conforming to the domain value format determined by the determining format module.
  • A file generating module, configured to generate a second data file corresponding to each first data file, where each second data file includes the fields of the corresponding first data file that conform to the domain value format and the domains to which those fields belong.
  • An index management module, configured to generate a new index corresponding to each second data file, where the new index includes the storage location of the corresponding second data file in the search system.
  • The determining domain module is specifically configured to determine, according to each search request and/or the search system's response message to each search request, the domain to which each search field in the plurality of search requests belongs.
  • The determining domain module is specifically configured to determine, according to a delimiter in each search request and/or in the response message to each search request, the domain to which each search field in the plurality of search requests belongs.
  • The determining format module is specifically configured to take, as the domain value format of each domain to which the search fields belong, a format that all search fields included in that domain consistently match.
  • The receiving information module is further configured to receive a target search request sent by the user, where the target search request includes a target search field and requests a search for data files that contain the target search field.
  • The index management apparatus further includes a file search module, configured to look up a new index corresponding to the target search field and, if such a new index is found, obtain the data files containing the target search field according to that new index.
  • The file search module is further configured to search the default index corresponding to the target search field if no new index corresponding to the target search field is found.
  • The file generating module is further configured to: when a new data file is imported into the search system, if the new data file includes a field that conforms to the domain value format determined by the determining format module, generate a second data file corresponding to the new data file.
  • The second data file corresponding to the new data file includes the fields in the new data file that conform to the domain value format and the domains to which those fields belong.
  • The index management module is further configured to generate a new index for the second data file corresponding to the new data file, where this new index includes the storage location of that second data file in the search system.
  • The index management module is further configured to: collect, in every preset period, the search parameters of each new index, where the search parameters include one or more of the number of searches, the search frequency, and the search hit ratio of the new index in the current preset period; and delete one or more new indexes whose search parameters are below a threshold.
  • A fifth aspect of the present application provides a computing device including a processor, a memory, and a communication interface, where the processor executes the data configuration method provided by the first aspect of the present application by calling program code stored in the memory.
  • A sixth aspect of the present application provides a computing device including a processor, a memory, and a communication interface, where the processor executes the index management method provided by the second aspect of the present application by calling program code stored in the memory.
  • A seventh aspect of the present application provides a computer program product, which may be a software installation package; when the software installation package is executed by a computing device, the data configuration method provided by the first aspect or any implementation of the first aspect of the present application is performed.
  • An eighth aspect of the present application provides a computer program product, which may be a software installation package; when the software installation package is executed by a computing device, the index management method provided by the second aspect or any implementation of the second aspect of the present application is performed.
  • A ninth aspect of the present application provides a storage medium storing program code; when the program code is run by a computing device, the data configuration method provided by the first aspect of the present application is performed.
  • The storage medium includes, but is not limited to, a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD).
  • A tenth aspect of the present application provides a storage medium storing program code; when the program code is run by a computing device, the index management method provided by the second aspect of the present application is performed.
  • The storage medium includes, but is not limited to, a flash memory, an HDD, or an SSD.
  • FIG. 1(a) is a schematic diagram of one implementation of the search system.
  • FIG. 1(b) is a schematic diagram of another implementation of the search system.
  • FIG. 2 is a structural diagram of an embodiment of a computing device provided by the present application.
  • FIG. 3 is a flowchart of an embodiment of a data configuration method provided by the present application.
  • FIG. 4 is a flowchart of an embodiment of an index management method provided by the present application.
  • FIG. 5 is a structural diagram of an embodiment of a data configuration apparatus provided by the present application.
  • FIG. 6 is a structural diagram of an embodiment of an index management apparatus provided by the present application.
  • The present application provides a data configuration method, an index management method, related apparatuses, and a computing device, which are described separately below.
  • In one implementation, the search system is deployed on a search device and interacts with the user through that device, as shown in FIG. 1(a).
  • The search device 100 mainly includes a communication unit 101, a processing unit 102, and a storage unit 103.
  • The storage unit 103 is configured to store the data that the search system needs to save, for example, its data files and indexes.
  • The communication unit 101 is configured for information exchange between the search system and the user, for example, receiving a search request issued by the user and returning the response message to that search request to the user.
  • The processing unit 102 is configured to perform data processing operations, for example, performing a search operation according to the user's search request and generating the response message according to the search result.
  • FIG. 1(a) shows a scenario in which the search system is deployed on a single search device.
  • The search system can also be deployed on multiple search devices, as shown in FIG. 1(b).
  • The search system shown in FIG. 1(b) includes a plurality of search devices 100 as shown in FIG. 1(a). The data saved by the search system is distributed across the storage units of the search devices 100, and the search devices 100 exchange information with each other through their communication units 101.
  • The processing units of the plurality of search devices 100 may perform a distributed search operation according to the user's search request and return the search results to the user in a response message.
  • The search device in FIG. 1(a) and FIG. 1(b) can be implemented by the computing device 200 shown in FIG. 2, which includes a processor 201, a memory 202, a communication interface 203, and a bus 204.
  • The processor 201 can implement the processing unit 102, the memory 202 can implement the storage unit 103, and the communication interface 203 can implement the communication unit 101.
  • The processor 201, the memory 202, and the communication interface 203 can communicate with each other through the bus 204, or by other means such as wireless transmission.
  • The memory 202 may include a volatile memory, such as a random-access memory (RAM); it may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, an HDD, or an SSD; or it may include a combination of the above types of memory.
  • While the computing device 200 is running, the memory 202 holds the data files, indexes, and other data for use by the processor 201.
  • When the search system is implemented in software, its program code may be stored in the memory 202 and executed by the processor 201.
  • The processor 201 may be a central processing unit (CPU), or may be implemented by another component with data processing capability, such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA). While the computing device 200 is running, the processor 201 calls the program code in the memory 202 to perform data processing operations.
  • The communication interface 203 serves as the interface between the user and the search system: it delivers search requests issued by the user to the processor 201 and delivers the response messages generated by the processor 201 to the user.
  • A search system generally stores information with the doc as its basic unit. Once running, the search system creates a corresponding default index for each stored doc, and the default index records where that doc is stored in the search system.
  • When searching, the user sends a search request that includes a search field, asking the search system to find the docs that contain the search field.
  • The search system looks up the default index that matches the search field and retrieves the docs containing the search field from the locations recorded in that index.
  • The search system then returns the docs to the user in a response message.
  • The data in a doc is generally stored as domain + domain value pairs.
  • Each doc may include a single domain and its domain value, or multiple domains and their domain values.
  • A domain indicates the type of the corresponding domain value, and a domain value records the specific value of that domain.
  • The structure of the docs in the search system is managed by the schema configuration file.
  • The schema configuration file is an XML file, generally stored in the conf directory, that defines the structure of the docs in the search system; in particular, the domains in each doc are defined by the schema configuration file.
  • For example, suppose a content domain is defined in the schema configuration file to indicate the type "content". If doc 1 ("hostname: node 1, IP: 192.199.0.1") is stored in the search system, doc 1 includes the content domain, and the domain value of its content domain is "hostname: node 1, IP: 192.199.0.1".
  • If the schema configuration file also defines a hostname domain and an IP domain, and doc 2 ("hostname: node 1, IP: 192.199.0.1", "hostname: node 1", "IP: 192.199.0.1") is stored in the search system, then doc 2 includes the content domain, the hostname domain, and the IP domain.
  • The domain value of the content domain is "hostname: node 1, IP: 192.199.0.1".
  • The domain value of the hostname domain is node 1 (that is, the host name is node 1).
  • The domain value of the IP domain is 192.199.0.1 (that is, the IP address is 192.199.0.1).
  • The search system can thus implement domain search, which is often faster and more efficient than searching content directly. For example, suppose the user sends the search request "IP: 192.199.0.1". If the schema configuration file defines an IP domain in addition to the content domain, the search system directly searches for docs whose IP domain has the domain value 192.199.0.1. If the schema configuration file defines only the content domain and no IP domain, the search system has to check whether the content domain value of every doc contains the string "IP: 192.199.0.1". Defining multiple domains in the schema configuration file therefore lets the search system search by domain, which reduces the number of docs examined and the length of the strings compared, improving the speed and efficiency of the search operation.
  • The domains in the schema configuration file are generally set by technicians.
  • However, manually set domains do not necessarily match users' actual, current search needs, so the speed and efficiency gains of domain search are not realized.
  • the present application provides a data configuration method, and the search device 100 shown in FIG. 1(a) and FIG. 1(b) and the computing device 200 shown in FIG. 2 execute the method at runtime, which is basically See Figure 3 for the process:
  • N search requests sent by a user where the N search requests include M search fields.
  • the search system receives N search requests sent by the user, where N is a positive integer greater than 1.
  • the N search request is not specifically limited.
  • the N search requests may be search requests received by the search system within a preset time period.
  • the N search requests may be The latest N search requests sent by the user received by the search system.
  • Each of the N search requests includes a search field for searching for a doc containing the search field.
  • each search request may include one search field, and may also include multiple search fields.
  • when a search request includes multiple search fields, the search fields may be connected by "AND", "OR", or other logical connectives indicating "and", "or", and so on. This is not limited in this application.
  • the search fields included in different search requests may be the same or different.
  • the description uses the example in which the N search requests include a total of M different search fields, where M is a positive integer.
  • the search system determines the domain to which each of the M search fields belongs, that is, determines the type of each of the M search fields.
  • the search field may already carry the domain to which it belongs.
  • the search system can directly determine the domain to which the search field belongs based on the search field in the search request. For example: If the search request is "IP: 192.199.0.1", then it is obvious that IP is the domain to which the search field belongs.
  • the search field may not carry the domain it belongs to.
  • the search system cannot directly determine the domain to which the search field belongs based on the search field in the search request.
  • in this case, the search system can determine the domain to which the search field belongs from the response message. For example, if the search field sent by the user is only "192.199.0.1", and the response message to the search request carries doc 1: ("hostname: node 1, IP: 192.199.0.1"), then IP is clearly the domain to which the search field belongs.
  • the search system may determine the domain to which a search field belongs from the separators in the search request or response message. For example, if the search request is "IP: 192.199.0.1", the search system can determine that the field "IP" before the separator ":" is the domain to which the search field belongs. As another example, suppose the search request is "192.199.0.1" and the response message carries doc 1: ("hostname: node 1, IP: 192.199.0.1"). The search system can then determine that the field "IP" in the response message is the domain to which the search field belongs, because "IP" immediately precedes the ":" separator adjacent to 192.199.0.1 and follows the "," separator.
  • the search system determines the domain to which a search field belongs according to the doc format. This application uses the separators ":" and "," only as examples. Some search systems may use other separators to determine the domain to which a search field belongs, which is not limited in this application.
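The two cases can be sketched as follows. This is a hypothetical helper, not the patent's implementation. It assumes the ":" and "," separators from the examples and exposes them as parameters, because other search systems may use other separators. A bare value that itself contains ":" (for example a time) would be misread by this simple rule.

```python
def parse_doc(text, kv_sep=":", item_sep=","):
    """Split 'hostname: node 1, IP: 192.199.0.1' into a {domain: value} dict."""
    pairs = {}
    for item in text.split(item_sep):
        if kv_sep in item:
            domain, value = item.split(kv_sep, 1)
            pairs[domain.strip()] = value.strip()
    return pairs

def infer_domain(search_field, response_docs=()):
    """Return the domain of a search field, or None if it cannot be determined."""
    # Case 1: the field carries its domain, e.g. "IP: 192.199.0.1".
    if ":" in search_field:
        return search_field.split(":", 1)[0].strip()
    # Case 2: a bare value; find the domain whose value equals it in the
    # docs carried by the response message.
    value = search_field.strip()
    for doc_text in response_docs:
        for domain, domain_value in parse_doc(doc_text).items():
            if domain_value == value:
                return domain
    return None
```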
  • among the domains to which the M search fields belong, the search system may select the top K domains containing the most search fields as the hotspot domains.
  • in this way, the search system receives N search requests sent by the user and determines the domains to which the M search fields in those requests belong. It then selects K hotspot domains among them and adds the K hotspot domains to the schema configuration file. From then on, when the search system receives a search field belonging to a hotspot domain, it can perform a domain search directly.
  • the domains in the schema configuration file are therefore not domains set by a technician. They are hotspot domains determined by the search system from the search requests the user sends in real time.
  • because a hotspot domain is searched more frequently across the N search requests, it is highly likely to be searched again by the user later.
  • this increases how often domain search is used, so that its speed and efficiency benefits are fully exploited, further improving the search performance of the search system.
  • the search system then needs to update the saved docs and their default indexes according to the schema configuration file to which the hotspot domains have been added. After the docs and their default indexes have been updated, the search system can perform subsequent search operations.
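A short sketch of the top-K selection. It assumes the domain of each distinct search field has already been determined; fields whose domain could not be determined are mapped to `None` and ignored. `hotspot_domains` is a hypothetical name. `collections.Counter.most_common` returns the K most frequent domains.

```python
from collections import Counter

def hotspot_domains(field_domains, k):
    """field_domains maps each distinct search field to its domain (or None).
    Returns the k domains that contain the most distinct search fields."""
    counts = Counter(d for d in field_domains.values() if d is not None)
    return [domain for domain, _ in counts.most_common(k)]
```

For example, with the fields "192.199.0.1" and "192.199.0.2" in IP and "node 1" in hostname, `hotspot_domains(..., 1)` selects `["IP"]`. Adding the result to the schema could then be as simple as updating a set of defined domains.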
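A hypothetical sketch of the update step. Each raw doc is re-split so that every domain now defined in the schema becomes its own indexed field, and the per-domain index is rebuilt. The set-of-domains schema and the in-memory index are illustrative assumptions, not Solr's schema format.

```python
def apply_schema(doc_text, schema_domains, kv_sep=":", item_sep=","):
    """Re-split a raw doc according to the schema. Domains defined in the
    schema become separate fields; the full text is kept as 'content'."""
    doc = {"content": doc_text}
    for item in doc_text.split(item_sep):
        if kv_sep in item:
            domain, value = (part.strip() for part in item.split(kv_sep, 1))
            if domain in schema_domains:
                doc[domain] = value
    return doc

def reindex(raw_docs, schema_domains):
    """Rebuild the per-domain index (value -> doc ids) for every schema domain."""
    index = {domain: {} for domain in schema_domains}
    for doc_id, text in raw_docs.items():
        doc = apply_schema(text, schema_domains)
        for domain in schema_domains:
            if domain in doc:
                index[domain].setdefault(doc[domain], []).append(doc_id)
    return index
```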
  • the search system may further perform the steps:
  • the domain value format of the K hotspot domains is determined according to the search fields included in each of the K hotspot domains.
  • the domain value format of a hotspot domain indicates the format to which the domain values of that hotspot domain conform.
  • the search system may take a format to which all search fields included in a hotspot domain conform as the domain value format of that hotspot domain. For example, suppose the hotspot domain is "IP" and the IP domain includes the two search fields "192.199.0.1" and "192.199.0.2". The search system then determines "192.199.0.*" as the domain value format of the IP domain, where * indicates a fuzzy match.
  • the domain value format may be a regular expression or take another form, which is not limited in this application.
  • step 305 may also be performed before step 304.
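One possible way to derive such a format, as a sketch. It assumes a simple heuristic: keep the longest common prefix of the search fields and append `*`. It then compiles the wildcard form into a regular expression with Python's `re` module. The patent does not prescribe this heuristic. `os.path.commonprefix` compares strings character by character.

```python
import os.path
import re

def domain_value_format(search_fields):
    """Derive a wildcard format that all search fields of one domain conform to.
    Heuristic (assumption): the longest common prefix followed by '*'."""
    fields = list(search_fields)
    prefix = os.path.commonprefix(fields)
    if all(field == prefix for field in fields):
        return prefix  # all fields identical: use the exact value
    return prefix + "*"

def format_to_regex(fmt):
    """Compile a format in which '*' means a fuzzy match into a regex."""
    return re.compile(".*".join(re.escape(part) for part in fmt.split("*")))
```

With the fields from the example, `domain_value_format(["192.199.0.1", "192.199.0.2"])` yields `"192.199.0.*"`.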
  • the search system receives a first search request newly sent by the user, where the first search request includes a first search field. If the first search field does not carry the domain to which it belongs, the search system determines whether the first search field conforms to the domain value format of a hotspot domain. If the first search field conforms to the domain value format of a first domain among the hotspot domains, it is considered to belong to the first domain, and the search system may perform a domain search.
  • the search system may, every preset period, collect statistics on the search data of each domain defined in the schema configuration file.
  • the search data includes one or more of the number of times the corresponding domain was searched, its search frequency, and its search hit ratio in the current preset period.
  • the search system deletes from the schema configuration file one or more domains whose search data is below a threshold, thereby dynamically retiring non-hotspot domains from the schema configuration file.
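A sketch of how an untagged search field could be assigned to a hotspot domain. The hotspot formats are given here as precompiled regular expressions. The `HOTSPOT_FORMATS` table and the `match_hotspot_domain` name are hypothetical. The first domain whose format fully matches wins.

```python
import re

# Hypothetical hotspot domains and their domain value formats.
HOTSPOT_FORMATS = {
    "IP": re.compile(r"192\.199\.0\..*"),
    "hostname": re.compile(r"node \d+"),
}

def match_hotspot_domain(search_field, hotspot_formats):
    """Return the first hotspot domain whose domain value format the search
    field fully matches, or None (fall back to a non-domain search)."""
    for domain, pattern in hotspot_formats.items():
        if pattern.fullmatch(search_field.strip()):
            return domain
    return None
```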
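A minimal sketch of the periodic retirement. It assumes the search data is reduced to a single per-period search count, although the text also allows frequency or hit ratio. The `protected` parameter is a hypothetical addition that keeps base domains such as the content field from being dropped.

```python
def retire_domains(schema_domains, period_stats, min_searches, protected=("content",)):
    """Keep protected domains and domains searched at least min_searches
    times in the current period; all other domains are retired.
    period_stats maps domain -> number of searches this period."""
    return {
        domain
        for domain in schema_domains
        if domain in protected or period_stats.get(domain, 0) >= min_searches
    }
```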
  • the data configuration method shown in FIG. 3 achieves performance improvement of the search system by dynamically modifying the schema configuration file according to the user's search request.
  • the following describes an index management method to achieve the same purpose of improving the performance of the search system.
  • the search device 100 shown in FIG. 1(a) and FIG. 1(b) and the computing device 200 shown in FIG. 2 execute the method at runtime.
  • the basic flow is shown in FIG. 4:
  • the search system receives N search requests sent by the user, where N is a positive integer.
  • how the N search requests are selected is not specifically limited.
  • the N search requests may be search requests received by the search system within a preset time period.
  • alternatively, the N search requests may be the latest N search requests from the user received by the search system.
  • Each of the N search requests includes a search field for searching for a doc containing the search field.
  • each search request may include one search field, and may also include multiple search fields.
  • multiple search fields in a search request may be connected by "AND", "OR", or other logical connectives indicating "and", "or", and so on, which is not limited in the present application.
  • the search fields included in different search requests may be the same or different.
  • the description uses the example in which the N search requests include a total of M different search fields, where M is a positive integer.
  • the search system determines the domain to which each of the M search fields belongs, that is, determines the type of each of the M search fields.
  • the search field may already carry the domain to which it belongs.
  • the search system can directly determine the domain to which the search field belongs based on the search field in the search request. For example: If the search request is "IP: 192.199.0.1", then it is obvious that IP is the domain to which the search field belongs.
  • the search field may not carry the domain it belongs to.
  • the search system cannot directly determine the domain to which the search field belongs based on the search field in the search request.
  • in this case, the search system can determine the domain to which the search field belongs from the response message. For example, if the search field sent by the user is only "192.199.0.1", and the response message to the search request carries doc 1: ("hostname: node 1, IP: 192.199.0.1"), then IP is clearly the domain to which the search field belongs.
  • the search system may determine the domain to which a search field belongs from the separators in the search request or response message. For example, if the search request is "IP: 192.199.0.1", the field "IP" before the separator ":" is the domain to which the search field belongs. As another example, suppose the search request is "192.199.0.1" and the response message carries doc 1: ("hostname: node 1, IP: 192.199.0.1"). The field "IP" in the response message is then the domain to which the search field belongs, because it immediately precedes the ":" separator adjacent to 192.199.0.1 and follows the "," separator. The search system determines the domain to which a search field belongs according to the doc format. This application uses the separators ":" and "," only as examples. Some search systems may use other separators to determine the domain to which a search field belongs, which is not limited in this application.
  • different search fields may have the same or different domains.
  • the description uses the example in which the M search fields belong to L different domains.
  • the search system determines the domain value format of each of the L domains from the search fields included in that domain. Fields that conform to the domain value formats of the L domains can be considered more likely to be searched in subsequent operation of the search system.
  • the search system may take a format to which all search fields included in a domain conform as the domain value format of that domain. For example, suppose the L domains include the domain "IP", and the IP domain includes the two search fields "192.199.0.1" and "192.199.0.2". The search system then determines "192.199.0.*" as the domain value format of the IP domain, where * indicates a fuzzy match.
  • after determining the domain value format of each of the L domains, the search system determines first docs among the docs it has saved. A first doc is a doc that includes a field conforming to the domain value format of any one of the L domains. It can be understood that there may be one or more first docs.
  • for example, suppose the search system has determined that the domain value format of the IP domain is "192.199.0.*". Doc 1: ("hostname: node 1, IP: 192.199.0.1") includes the field "192.199.0.1", which conforms to the domain value format "192.199.0.*", so doc 1 is determined to be a first doc.
  • to determine the first docs, the search system may use the domain value formats of the L domains as search fields and directly search its default index to obtain the first docs.
  • a corresponding second doc is generated from each first doc, based on the field in the first doc that conforms to the domain value format.
  • the second doc includes the field in the corresponding first doc that conforms to the domain value format, and the domain to which that field belongs.
  • for example, suppose the search system has determined that the domain value format of the IP domain is "192.199.0.*", and the first doc is ("hostname: node 1, IP: 192.199.0.1"). The field "192.199.0.1" in the first doc conforms to the domain value format "192.199.0.*", so the search system generates the corresponding second doc ("IP: 192.199.0.1") from the first doc.
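A sketch of first-doc selection and second-doc generation. It makes two assumptions: a field is checked only against the domain value format of its own domain, and each second doc is represented as a `"domain: value"` string tagged with the id of the first doc it came from. Both choices are illustrative.

```python
import re

def parse_doc(text):
    """Split 'hostname: node 1, IP: 192.199.0.1' into (domain, value) pairs."""
    pairs = []
    for item in text.split(","):
        if ":" in item:
            domain, value = item.split(":", 1)
            pairs.append((domain.strip(), value.strip()))
    return pairs

def make_second_docs(docs, formats):
    """docs maps doc_id -> raw doc text; formats maps domain -> compiled regex.
    A doc is a first doc if any field fully matches its domain's value format;
    each matching field yields one (first doc id, 'domain: value') second doc."""
    second_docs = []
    for doc_id, text in docs.items():
        for domain, value in parse_doc(text):
            pattern = formats.get(domain)
            if pattern is not None and pattern.fullmatch(value):
                second_docs.append((doc_id, f"{domain}: {value}"))
    return second_docs
```

Each second doc holds a single domain and value. This is why an index over second docs stays much smaller than the default index over the original docs.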
  • the number of the first docs may be one or more
  • the number of the second docs may also be one or more.
  • the second docs are generated from only part of the search system's original docs (that is, the first docs), so there are far fewer second docs than original docs.
  • each second doc includes only one domain and one domain value, so it is also shorter than most of the search system's original docs.
  • therefore, the new index built for the second docs is much smaller than the default index built for the original docs.
  • in summary, the search system performs the following steps:
    - receives N search requests sent by the user;
    - determines the L domains to which the M search fields in the N search requests belong;
    - determines the domain value formats of the L domains;
    - determines, among the docs it has saved, first docs containing fields that conform to those domain value formats;
    - generates second docs from the first docs;
    - builds a new index for the second docs.
    Because the second docs conform to the domain value formats of the L domains, they are more likely to be searched in subsequent operation of the search system. A search field in a newly issued search request therefore has a high probability of hitting the new index. Because the new index is much smaller than the default index, searching it greatly reduces the workload of the search system and improves its search speed, efficiency, and overall search performance.
  • the search system receives the target search request newly sent by the user, where the target search request includes a target search field for requesting to search for a doc including the target search field.
  • the search system searches for a new index that matches the target search field. If one is found, the search system obtains the second doc corresponding to that new index and returns a response message carrying the second doc to the user. If no matching new index is found, the search system searches for a default index that matches the target search field.
  • when a new doc is imported into the search system, if the new doc includes a field that conforms to a domain value format determined in step 403, the search system generates a second doc corresponding to the new doc.
  • the specific method for generating the second doc corresponding to the new doc is similar to step 405, and is not described here.
  • after generating the second doc corresponding to the new doc, the search system creates a new index for that second doc and also generates a default index for the new doc.
  • the search system may, every preset period, collect statistics on the search parameters of each new index.
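The lookup order and the import path can be sketched with a small in-memory class. It rests on several assumptions. `TwoTierIndex` is hypothetical. Target search requests are written as `"domain: value"`; a target without ":" raises `ValueError`. The new index maps to the ids of the originating docs rather than to storage locations of second docs.

```python
import re

class TwoTierIndex:
    """Toy model: a small 'new' index over second docs is consulted before
    the full default index (hypothetical in-memory structures)."""

    def __init__(self, formats):
        self.formats = formats    # domain -> compiled regex (domain value formats)
        self.default_index = {}   # (domain, value) -> doc ids, every field
        self.new_index = {}       # (domain, value) -> doc ids, second docs only

    @staticmethod
    def _split(domain_and_value):
        """Normalize 'IP:192.199.0.1' and 'IP: 192.199.0.1' to ('IP', '192.199.0.1')."""
        domain, value = (part.strip() for part in domain_and_value.split(":", 1))
        return domain, value

    def add_doc(self, doc_id, text):
        """Import a doc: always update the default index, and also index a
        second doc for every field that conforms to its domain value format."""
        for item in text.split(","):
            if ":" not in item:
                continue
            domain, value = key = self._split(item)
            self.default_index.setdefault(key, set()).add(doc_id)
            pattern = self.formats.get(domain)
            if pattern is not None and pattern.fullmatch(value):
                self.new_index.setdefault(key, set()).add(doc_id)

    def search(self, target):
        """Return (tier, sorted doc ids): new index first, default index as fallback."""
        key = self._split(target)
        if key in self.new_index:
            return "new", sorted(self.new_index[key])
        return "default", sorted(self.default_index.get(key, ()))
```

In this sketch, a query such as `"IP: 192.199.0.1"` is served from the new index. `"hostname: node 1"` falls through to the default index.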
  • the search parameter of each new index includes one or more of the searched times, the searched frequency, and the search hit ratio of the domain corresponding to the new index in the current preset period.
  • the search system deletes one or more new indexes whose search parameters are below the threshold to implement dynamic retirement of non-hotspot indexes in the newly created index.
  • the data configuration device for implementing the data configuration method shown in FIG. 3 is introduced.
  • the basic structure of the data configuration device includes:
  • the information receiving module 501 is configured to receive a plurality of search requests sent by the user, where each of the plurality of search requests includes a search field for requesting to search for a data file that includes the search field.
  • the domain determining module 502 is configured to determine a domain to which each search field belongs in the search field included in the plurality of search requests.
  • the hotspot determining module 503 is configured to determine one or more hotspot domains in a domain to which each search field belongs.
  • the configuration modification module 504 is configured to add the one or more hotspot domains to the schema configuration file, and update the data files in the search system according to the schema configuration file to which the one or more hotspot domains are added.
  • the information receiving module 501 receives a plurality of search requests sent by the user, and the domain determining module 502 determines the domain to which each search field included in the plurality of search requests belongs.
  • the hotspot determining module 503 determines the hotspot domains among the domains to which the search fields belong, and the configuration modification module 504 adds the determined hotspot domains to the schema configuration file.
  • the domains in the schema configuration file are therefore not domains set by a technician. They are hotspot domains determined by the data configuration device from the search requests the user sends in real time.
  • the hotspot domain is a domain with a high search frequency among the N search requests, so the hotspot domain has a high probability to be searched again by the user in the subsequent time.
  • this increases how often domain search is used, so that its speed and efficiency benefits are fully exploited, further improving the search performance of the search system.
  • the domain determining module 502 is specifically configured to determine the domain to which each search field included in the multiple search requests belongs, according to each search request and/or the search system's response message to each search request.
  • the domain determining module 502 is specifically configured to determine the domain to which each search field included in the multiple search requests belongs, according to the separators in each search request and/or in the search system's response message to each search request.
  • the hotspot determining module 503 is specifically configured to select, among the domains to which the search fields belong, the top one or more domains containing the most search fields as the hotspot domains.
  • the data configuration apparatus further includes a format determining module 505, configured to determine a domain value format of each hotspot domain according to a search field included in each hotspot domain in the one or more hotspot domains.
  • the format determining module 505 is specifically configured to take a format to which all search fields included in each hotspot domain conform as the domain value format of that hotspot domain.
  • the configuration modification module 504 is further configured to collect, every preset period, statistics on the search data of each domain defined by the schema configuration file. The search data includes one or more of the number of times the domain was searched, its search frequency, and its search hit ratio in the current preset period. The module is also configured to delete from the schema configuration file one or more domains whose search data is below a threshold.
  • the basic structure of the index management apparatus includes:
  • the receiving information module 601 is configured to receive a plurality of search requests sent by the user, where each of the plurality of search requests includes a search field, configured to request to find a data file that includes the search field;
  • a determining domain module 602 configured to determine a domain to which each search field belongs in the search field included in the plurality of search requests;
  • a determining format module 603, configured to determine the domain value format of each domain to which the search fields belong, according to the search fields included in that domain;
  • a file determining module 604, configured to determine, as first data files, the data files in the search system that include a field conforming to a domain value format determined by the determining format module 603;
  • a file generating module 605, configured to generate a second data file corresponding to each first data file, where each second data file includes the field in the corresponding first data file that conforms to the domain value format, and the domain to which that field belongs;
  • the index management module 606 is configured to generate a new index corresponding to each second data file, where the new index includes a storage location of the corresponding second data file in the search system.
  • the index management apparatus operates as follows:
    - the receiving information module 601 receives a plurality of search requests sent by the user;
    - the determining domain module 602 determines the domain to which each search field included in the plurality of search requests belongs;
    - the determining format module 603 determines the domain value formats of these domains;
    - the file determining module 604 determines the data files in the search system that include fields conforming to the domain value formats as first data files;
    - the file generating module 605 generates the second data files corresponding to the first data files;
    - the index management module 606 generates new indexes for the second data files.
  • therefore, when the search system receives a newly issued search request from the user, the search field in the request has a high probability of hitting a new index. Because the new index is much smaller than the default index, searching it greatly reduces the workload of the search system and improves its search speed, efficiency, and search performance.
  • the determining domain module 602 is specifically configured to determine the domain to which each search field included in the multiple search requests belongs, according to each search request and/or the search system's response message to each search request.
  • the determining domain module 602 is specifically configured to determine the domain to which each search field included in the multiple search requests belongs, according to the separators in each search request and/or in the search system's response message to each search request.
  • the determining format module 603 is specifically configured to take a format to which all search fields included in each domain conform as the domain value format of that domain.
  • the receiving information module 601 is further configured to: receive a target search request that is sent by the user, where the target search request includes a target search field, and is used to request to find a data file that includes the target search field.
  • the index management device further includes a file search module 607, configured to search for a new index corresponding to the target search field. If that new index is found, the module obtains, according to it, the data file that includes the target search field.
  • the file search module 607 is further configured to: if the new index corresponding to the target search field is not found, look for a default index corresponding to the target search field.
  • the file generating module 605 is further configured to handle new data files imported into the search system. If a new data file includes a field that conforms to a domain value format determined by the determining format module 603, the module generates a second data file corresponding to the new data file. That second data file includes the field in the new data file that conforms to the domain value format, and the domain to which that field belongs.
  • the index management module 606 is further configured to generate a new index for the second data file corresponding to the new data file, where that new index includes the location at which the second data file is saved in the search system.
  • the index management module 606 is further configured to collect, every preset period, statistics on the search parameters of each new index. The search parameters include one or more of the number of times each new index was searched, its search frequency, and its search hit ratio in the current preset period. The module is also configured to delete one or more new indexes whose search parameters are below a threshold.
  • each module in the embodiments shown in FIG. 5 and FIG. 6 may be a software module, and is stored in the memory 202 of the computing device shown in FIG. 2 in the form of program code. And executed by the processor 201.
  • alternatively, the modules in the embodiments shown in FIG. 5 and FIG. 6 may be hardware modules, for example, a CPU, a hardware chip, or a combination of a CPU and a hardware chip.
  • the processor 201 of the computing device shown in FIG. 2 performs the methods provided herein.
  • the present application also provides a computer program product, which may be a software installation package that performs the method illustrated in FIG. 3 or FIG. 4 when executed by the computing device.
  • the docs, search requests, search fields, and response messages given in this specification are examples used only to illustrate the technical solutions of the present application. They do not restrict the actual formats of docs, search requests, search fields, or response messages.
  • doc1 is: ("hostname: node1, IP: 192.199.0.1").
  • doc 1 can also be in other formats that match the search system settings.
  • for example, a domain and its value may be separated by the ":" separator, a space, or another separator, and data of different domains may be separated by the "," separator, the ";" separator, or another separator.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the unit division is merely a logical function division.
  • in actual implementation, other division manners are possible. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the technical solutions of the present invention may be embodied in the form of a software product stored in a storage medium. This applies to the technical solutions in essence, to the part contributing to the prior art, or to all or part of the technical solutions.
  • the software product includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention.
  • the foregoing storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data configuration method for improving the search performance of a search system. The data configuration method comprises the following steps. First, a plurality of search requests issued by a user are received, where each search request comprises one or more search fields and requests a search for documents containing the search field(s). Next, the domain to which each search field in the plurality of search requests belongs is determined, and one or more hotspot domains are determined among these domains. Finally, the one or more hotspot domains are added to a schema configuration file, and the documents in the search system are updated according to the schema configuration file with the added hotspot domains. An index management method, a related apparatus, and a computing device are further provided.

Description

数据配置方法、索引管理方法、相关装置以及计算设备Data configuration method, index management method, related device, and computing device 技术领域Technical field
本申请涉及数据搜索领域,尤其涉及一种数据配置方法、索引管理方法、相关装置以及计算设备。The present application relates to the field of data search, and in particular, to a data configuration method, an index management method, a related device, and a computing device.
背景技术Background technique
全文搜索技术是一种现阶段通用的搜索技术,用于根据搜索字段等内容作为搜索入口来搜索得到所需信息。随着大数据技术的蓬勃发展,用户数据呈现爆发式的膨胀,故现阶段对全文搜索技术的高效性和快捷性的要求也越来越高。The full-text search technology is a general-purpose search technology at present, which is used to search for the required information according to the search field and the like as a search portal. With the rapid development of big data technology, user data is exploding, so the requirements for the efficiency and speed of full-text search technology are getting higher and higher.
依赖于全文搜索技术的搜索系统主要包括Solr、Elastic等,其中Solr是当前较为流行的企业级搜索系统,其功能包括全文搜索、命中标示、分面搜索、动态聚类、数据库集成,以及富文本(如Word、PDF)的处理等。本申请仅以Solr为例来对搜索系统进行说明。Search systems relying on full-text search technology mainly include Solr, Elastic, etc. Among them, Solr is a popular enterprise search system, and its functions include full-text search, hit mark, faceted search, dynamic clustering, database integration, and rich text. (such as Word, PDF) processing. This application only uses Solr as an example to describe the search system.
A search system generally stores information in units of documents (doc). The data in a doc is generally stored as field + field value pairs, where a field indicates the type of its field value and a field value records the specific value of that field.
The fields of the docs in a search system are defined by a schema configuration file. Defining different fields in the schema configuration file enables field-based search, which improves search speed and efficiency.
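As an illustration of field-based search, the following minimal Python sketch models a doc as a dict mapping field names to field values and the schema as a set of declared field names. These representations and the function names `full_text_search` and `field_search` are hypothetical stand-ins, not Solr's data structures or API; the point is only that a search restricted to one declared field examines far fewer values than a search over every field.

```python
from typing import Dict, List, Set

# Hypothetical model: a doc maps field name -> field value (both strings).
Doc = Dict[str, str]


def full_text_search(docs: List[Doc], term: str) -> List[int]:
    """Return positions of docs where any field value contains `term`."""
    return [i for i, doc in enumerate(docs) if any(term in v for v in doc.values())]


def field_search(docs: List[Doc], schema: Set[str], field: str, term: str) -> List[int]:
    """Return positions of docs whose value for `field` contains `term`.

    Field-scoped search is only possible for fields declared in the schema.
    """
    if field not in schema:
        raise KeyError(f"field {field!r} is not defined in the schema")
    return [i for i, doc in enumerate(docs) if term in doc.get(field, "")]


docs = [
    {"title": "Solr in practice", "author": "Wang"},
    {"title": "Search basics", "author": "Solr team"},
]
schema = {"title", "author"}
print(full_text_search(docs, "Solr"))               # [0, 1]
print(field_search(docs, schema, "title", "Solr"))  # [0]
```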
However, a search system holds a huge volume of doc data, and that data spans an even larger number of fields. Current search systems rely on technicians to define fields in the schema configuration file by hand. Because technicians cannot know in advance which fields users will search often, the fields they define may in practice rarely be searched. As a result, field-based search is used infrequently, the speed and efficiency gains it offers are limited, and the search performance of the system falls short of user requirements.
Summary
This application provides a data configuration method, an index management method, a related apparatus, and a computing device, to improve the search performance of a search system.
A first aspect of this application provides a data configuration method applicable to a search system. The search system includes multiple documents, each containing one or more fields and the field values corresponding to those fields. A field indicates the type of its field value, and a field value records the specific value of its field. The search system also includes a schema configuration file that defines the fields of the documents in the search system. The data configuration method includes: receiving multiple search requests issued by users, each of which includes one or more search terms and requests the documents that contain those search terms; determining the field to which each search term in the requests belongs, and determining one or more hotspot fields among those fields; and adding the one or more hotspot fields to the schema configuration file and updating the documents in the search system according to the updated schema configuration file. In this method, the fields in the schema configuration file are not set by hand by technicians; they are hotspot fields that the search system determines from the search requests users actually issue.
Because a hotspot field is searched frequently in the received requests, it is highly likely to be searched again later. Adding hotspot fields to the schema configuration file therefore increases how often field-based search is used, realizes its speed and efficiency gains, and further improves the search performance of the search system.
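The first aspect can be sketched end to end in Python as below, under assumptions that are ours rather than the application's: each search request has already been resolved into (field, search term) pairs, the hotspot fields are the `top_n` fields that receive the most search terms, and "updating the documents" is read as rebuilding a per-field value index over the docs once the schema has grown. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Assumed input shape: each request is a list of (field, search_term) pairs.
Request = List[Tuple[str, str]]


def find_hotspot_fields(requests: List[Request], top_n: int) -> List[str]:
    """Count search terms per field and return the top_n busiest fields."""
    counts: Counter = Counter()
    for request in requests:
        for field, _term in request:
            counts[field] += 1
    return [field for field, _ in counts.most_common(top_n)]


def update_schema_and_docs(
    schema: Set[str], hotspots: List[str], docs: List[Dict[str, str]]
) -> Dict[str, Dict[str, List[int]]]:
    """Add hotspot fields to `schema` (in place) and rebuild a per-field
    value -> doc-position index for every field the schema now declares."""
    schema.update(hotspots)
    field_index: Dict[str, Dict[str, List[int]]] = {f: {} for f in schema}
    for pos, doc in enumerate(docs):
        for field, value in doc.items():
            if field in schema:
                field_index[field].setdefault(value, []).append(pos)
    return field_index


requests = [
    [("city", "Shenzhen"), ("phone", "13800000000")],
    [("city", "Beijing")],
    [("city", "Shanghai"), ("brand", "Huawei")],
]
hot = find_hotspot_fields(requests, top_n=1)
schema = {"title"}
docs = [{"title": "store A", "city": "Shenzhen"}, {"title": "store B", "city": "Beijing"}]
index = update_schema_and_docs(schema, hot, docs)
print(sorted(schema))  # ['city', 'title']
print(index["city"])   # {'Shenzhen': [0], 'Beijing': [1]}
```

Choosing `top_n` by count is only one reading of "the top one or more fields"; a threshold on the share of all search terms would work the same way.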
Optionally, the search requests, and/or the response messages to them, may carry the field to which each search term belongs. The search system determines the field to which each search term in the requests belongs according to each search request and/or the response message corresponding to it.
Optionally, the search system extracts the field names carried in each search request and/or its response message according to the delimiters in them, and thereby determines the field to which each search term belongs.
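The application does not say which delimiter separates a field name from a search term. Assuming a Solr-style query string in which each clause is written `field:term` and clauses are separated by whitespace, the extraction step could look like the sketch below; the function name and the handling of clauses without a field are illustrative choices, not the application's.

```python
from typing import List, Optional, Tuple

BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}  # skipped: they are not search terms


def extract_field_terms(query: str, sep: str = ":") -> List[Tuple[Optional[str], str]]:
    """Split e.g. 'city:Shenzhen AND phone:13800000000' into (field, term) pairs.

    A clause without the delimiter yields (None, term); its field would have to
    be resolved by other means, for example from the response message.
    """
    pairs: List[Tuple[Optional[str], str]] = []
    for token in query.split():
        if token in BOOLEAN_OPERATORS:
            continue
        field, found, term = token.partition(sep)
        if found and field and term:
            pairs.append((field, term))
        else:
            pairs.append((None, token))
    return pairs


print(extract_field_terms("city:Shenzhen AND phone:13800000000 huawei"))
# [('city', 'Shenzhen'), ('phone', '13800000000'), (None, 'huawei')]
```

Quoted phrases and escaped delimiters are deliberately not handled in this sketch.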
Optionally, among the fields to which the search terms belong, the search system determines the top one or more fields containing the most search terms as the hotspot fields.
Optionally, after determining the one or more hotspot fields, the search system determines a field value format for each hotspot field according to the search terms that belong to it.
Optionally, the search system takes the format that all search terms belonging to a hotspot field have in common as that hotspot field's field value format.
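The application leaves "common format" undefined. One plausible heuristic, which is our assumption, is to reduce every search term to a character-class shape (decimal digit, ASCII letter, or the literal character, with runs collapsed into a repeat count) and accept that shape as the field value format only if all terms share it. A minimal Python sketch:

```python
import re
from itertools import groupby
from typing import List, Optional


def _char_class(ch: str) -> str:
    """Map one character to a regex atom: digit, ASCII letter, or the literal."""
    if ch.isdecimal():
        return r"\d"
    if ch.isascii() and ch.isalpha():
        return "[A-Za-z]"
    return re.escape(ch)


def value_shape(term: str) -> str:
    r"""Collapse a term into a regex shape, e.g. '13800000000' -> \d{11}."""
    parts = []
    for atom, run in groupby(_char_class(ch) for ch in term):
        count = sum(1 for _ in run)
        parts.append(atom if count == 1 else f"{atom}{{{count}}}")
    return "".join(parts)


def common_value_format(terms: List[str]) -> Optional[str]:
    """Return the shape shared by all terms, or None if they disagree (or none given)."""
    shapes = {value_shape(t) for t in terms}
    return shapes.pop() if len(shapes) == 1 else None


print(common_value_format(["13800000000", "13912345678"]))  # \d{11}
print(common_value_format(["abc", "123"]))                   # None
```

A stricter or looser notion of "format" (fixed prefixes, length ranges) would only change `value_shape`.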
Optionally, at every preset period, the search system collects search statistics for each field defined in the schema configuration file. The statistics include one or more of the number of times the field was searched, its search frequency, and its search hit rate in the current preset period. The search system deletes from the schema configuration file one or more fields whose statistics fall below a threshold, thereby dynamically evicting non-hotspot fields.
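A minimal sketch of the periodic eviction step, assuming in-memory counters per field and two thresholds (a minimum search count and a minimum hit rate; the application allows any one or more of count, frequency and hit rate). `FieldStats` and `evict_cold_fields` are hypothetical names, and how the function is triggered at the end of each preset period (a timer, a scheduled job) is left out.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class FieldStats:
    searches: int = 0  # times the field was searched in the current period
    hits: int = 0      # searches on the field that returned at least one doc

    @property
    def hit_rate(self) -> float:
        return self.hits / self.searches if self.searches else 0.0


def evict_cold_fields(schema: Set[str], stats: Dict[str, FieldStats],
                      min_searches: int, min_hit_rate: float) -> Set[str]:
    """Remove schema fields whose statistics fall below either threshold.

    Mutates `schema` in place, resets `stats` for the next preset period,
    and returns the evicted fields.
    """
    cold: Set[str] = set()
    for field in schema:
        s = stats.get(field, FieldStats())  # a field never searched counts as zero
        if s.searches < min_searches or s.hit_rate < min_hit_rate:
            cold.add(field)
    schema.difference_update(cold)
    stats.clear()  # start counting afresh for the next preset period
    return cold
```

For example, with thresholds of 10 searches and a 0.2 hit rate, a field searched 30 times with 3 hits is evicted even though it is searched often, because its hit rate is only 0.1.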
A second aspect of this application provides an index management method applicable to a search system. The search system includes multiple documents, each containing one or more fields and their corresponding field values; a field indicates the type of its field value, and a field value records the specific value of its field. The search system also includes default indexes corresponding to the documents, each of which records the storage location of its document in the search system. The method includes: receiving multiple search requests issued by users, each of which includes one or more search terms and requests the documents that contain those search terms; determining the field to which each search term belongs; determining, for each of those fields, a field value format according to the search terms belonging to it; and determining, as first documents, the documents in the search system that contain a value conforming to a field value format.
Then, for each first document, a corresponding second document is generated, containing the values in the first document that conform to the field value format together with the fields to which those values belong; and for each second document, a new index is generated that records the storage location of that second document in the search system. When the search system receives a new search request, the search terms in it are likely to hit a new index generated by the index management apparatus. Because a new index is far smaller than the default index, looking up the new index instead of the default index can greatly reduce the workload of the search system, increase its search speed and efficiency, and improve its search performance.
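The core of the second aspect, deriving second documents from first documents and indexing them, might look like the Python sketch below. Assumptions, none of which come from the application: docs live in a dict keyed by a storage-location string, each learned field value format is a regular expression, a value "conforms" when it fully matches the format of its own field, the new index is keyed by the conforming value itself, and the `#2` suffix used to name a second document's location is purely hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SecondDoc:
    source: str             # location of the first document it was derived from
    fields: Dict[str, str]  # only the field values that match a learned format


def build_new_index(
    docs: Dict[str, Dict[str, str]], formats: Dict[str, str]
) -> Tuple[Dict[str, SecondDoc], Dict[str, List[str]]]:
    """Return (second documents by location, new index: value -> second-doc locations)."""
    second_docs: Dict[str, SecondDoc] = {}
    new_index: Dict[str, List[str]] = {}
    for location, doc in docs.items():
        kept = {f: v for f, v in doc.items()
                if f in formats and re.fullmatch(formats[f], v)}
        if not kept:  # no conforming value: not a first document
            continue
        second_loc = f"{location}#2"  # hypothetical naming for the second doc's location
        second_docs[second_loc] = SecondDoc(source=location, fields=kept)
        for value in kept.values():
            new_index.setdefault(value, []).append(second_loc)
    return second_docs, new_index


docs = {
    "d1": {"phone": "13800000000", "note": "call back"},
    "d2": {"phone": "unknown"},
    "d3": {"phone": "13912345678"},
}
second, idx = build_new_index(docs, {"phone": r"\d{11}"})
print(sorted(second))  # ['d1#2', 'd3#2']
```

Each second document holds only the conforming values, which is why the new index stays much smaller than a default index over whole docs.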
Optionally, the search requests, and/or the response messages to them, may carry the field to which each search term belongs. The search system determines the field to which each search term in the requests belongs according to each search request and/or the response message corresponding to it.
Optionally, the search system extracts the field names carried in each search request and/or its response message according to the delimiters in them, and thereby determines the field to which each search term belongs.
Optionally, the search system determines the field value format of each field to which the search terms belong according to the search terms in that field.
Optionally, the search system receives a target search request issued by a user; the target search request includes a target search term and requests the documents that contain it. The search system looks up the new index corresponding to the target search term. If such a new index is found, the documents containing the target search term are obtained according to that new index.
Optionally, if the search system does not find a new index corresponding to the target search term, it looks up the default index corresponding to the target search term to obtain the documents that contain it.
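The lookup order described above (new index first, default index on a miss) can be sketched as follows. Plain dicts stand in for both indexes, and `second_doc_source` (a hypothetical mapping from a second document's location back to its original doc) is how a new-index hit is turned into the doc the user asked for; none of these names come from the application.

```python
from typing import Dict, List


def search(term: str,
           new_index: Dict[str, List[str]],
           second_doc_source: Dict[str, str],
           default_index: Dict[str, List[str]]) -> List[str]:
    """Return locations of the docs that contain `term`.

    Try the small new index first; fall back to the default index on a miss.
    """
    second_locs = new_index.get(term)
    if second_locs:
        # Hit: map each second document back to the doc it was derived from.
        return [second_doc_source[loc] for loc in second_locs]
    # Miss: the larger default index covers every term.
    return default_index.get(term, [])


new_index = {"13800000000": ["d1#2"]}
second_doc_source = {"d1#2": "d1"}
default_index = {"13800000000": ["d1"], "call": ["d1"], "back": ["d1"]}
print(search("13800000000", new_index, second_doc_source, default_index))  # ['d1']
print(search("call", new_index, second_doc_source, default_index))         # ['d1']
```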
Optionally, when a new document is imported into the search system and contains a value conforming to a field value format, the search system generates a second document for it. That second document includes the values in the new document that conform to the field value format and the fields to which those values belong. The search system then creates a new index for that second document, which records the storage location of the second document in the search system.
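For an imported doc, the same per-doc rule can be applied incrementally instead of rebuilding everything. In the sketch below the in-memory stores, the `#2` location suffix and the function name `ingest` are hypothetical, and updating the default index for the new doc is not shown.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical in-memory stores.
SecondDocs = Dict[str, Tuple[str, Dict[str, str]]]  # second-doc location -> (source location, fields)
NewIndex = Dict[str, List[str]]                     # conforming value -> second-doc locations


def ingest(location: str, doc: Dict[str, str], formats: Dict[str, str],
           second_docs: SecondDocs, new_index: NewIndex) -> bool:
    """Index a newly imported doc incrementally; return True if a second doc was created."""
    kept = {f: v for f, v in doc.items()
            if f in formats and re.fullmatch(formats[f], v)}
    if not kept:
        return False  # only the default index needs updating for this doc
    second_loc = f"{location}#2"
    second_docs[second_loc] = (location, kept)
    for value in kept.values():
        new_index.setdefault(value, []).append(second_loc)
    return True


formats = {"phone": r"\d{11}"}
second_docs: SecondDocs = {}
new_index: NewIndex = {}
print(ingest("d9", {"phone": "13700000000", "city": "Xi'an"}, formats, second_docs, new_index))  # True
print(ingest("d10", {"phone": "n/a"}, formats, second_docs, new_index))                          # False
```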
Optionally, at every preset period, the search system collects search parameters for each new index, including one or more of the number of times the field corresponding to the new index was searched, its search frequency, and its search hit rate in the current preset period. The search system deletes one or more new indexes whose search parameters fall below a threshold, thereby dynamically evicting non-hotspot indexes.
A third aspect of this application provides a data configuration apparatus applicable to a search system. The search system includes multiple documents, each containing one or more fields and their corresponding field values; a field indicates the type of its field value, and a field value records the specific value of its field. The search system also includes a schema configuration file that defines the fields of the documents in the search system. The data configuration apparatus includes: an information receiving module, configured to receive multiple search requests issued by users, each of which includes a search term and requests the documents that contain it; a field determining module, configured to determine the field to which each search term in the requests belongs; a hotspot determining module, configured to determine one or more hotspot fields among those fields; and a configuration modification module, configured to add the one or more hotspot fields to the schema configuration file and update the documents in the search system according to the updated schema configuration file.
With this apparatus, the fields in the schema configuration file are not set by hand by technicians; they are hotspot fields that the data configuration apparatus determines from the search requests users actually issue. Because a hotspot field is searched frequently among the received search requests, it is highly likely to be searched again later. Adding hotspot fields to the schema configuration file therefore increases how often field-based search is used, realizes its speed and efficiency gains, and further improves the search performance of the search system.
Optionally, the field determining module is specifically configured to determine the field to which each search term in the requests belongs according to each search request and/or the search system's response message to each search request.
Optionally, the field determining module is specifically configured to determine the field to which each search term in the requests belongs according to the delimiters in each search request and/or in the search system's response message to each search request.
Optionally, the hotspot determining module is specifically configured to determine, among the fields to which the search terms belong, the top one or more fields containing the most search terms as the hotspot fields.
Optionally, the data configuration apparatus further includes a format determining module, configured to determine the field value format of each hotspot field according to the search terms belonging to it.
Optionally, the format determining module is specifically configured to take the format that all search terms belonging to a hotspot field have in common as that hotspot field's field value format.
Optionally, the configuration modification module is further configured to: at every preset period, collect search statistics for each field defined in the schema configuration file, including one or more of the number of times the field was searched, its search frequency, and its search hit rate in the current preset period; and delete from the schema configuration file one or more fields whose statistics fall below a threshold.
A fourth aspect of this application provides an index management apparatus applicable to a search system. The search system includes multiple documents, each containing one or more fields and their corresponding field values; a field indicates the type of its field value, and a field value records the specific value of its field. The search system also includes default indexes corresponding to the documents, each recording the storage location of its document in the search system. The index management apparatus includes: an information receiving module, configured to receive multiple search requests issued by users, each of which includes a search term and requests the documents that contain it; a field determining module, configured to determine the field to which each search term in the requests belongs; and a format determining module, configured to determine the field value format of each of those fields according to the search terms belonging to it.
The apparatus further includes: a file determining module, configured to determine, as first documents, the documents in the search system that contain a value conforming to a field value format determined by the format determining module; a file generating module, configured to generate a second document for each first document, containing the values in the first document that conform to the field value format and the fields to which those values belong; and an index management module, configured to generate a new index for each second document, recording the storage location of that second document in the search system. When the search system receives a new search request, the search terms in it are likely to hit a new index generated by the index management apparatus. Because a new index is far smaller than the default index, looking up the new index instead of the default index can greatly reduce the workload of the search system, increase its search speed and efficiency, and improve its search performance.
Optionally, the field determining module is specifically configured to determine the field to which each search term in the requests belongs according to each search request and/or the search system's response message to each search request.
Optionally, the field determining module is specifically configured to determine the field to which each search term in the requests belongs according to the delimiters in each search request and/or in the search system's response message to each search request.
Optionally, the format determining module is specifically configured to take the format that all search terms in a field have in common as that field's field value format.
Optionally, the information receiving module is further configured to receive a target search request issued by a user, which includes a target search term and requests the documents that contain it.
The index management apparatus further includes a file search module, configured to look up the new index corresponding to the target search term and, if it is found, obtain the documents containing the target search term according to that new index.
Optionally, the file search module is further configured to look up the default index corresponding to the target search term if no corresponding new index is found.
Optionally, the file generating module is further configured to: when a new document is imported into the search system and contains a value conforming to a field value format determined by the format determining module, generate a second document for it, containing the values in the new document that conform to that format and the fields to which those values belong.
The index management module is further configured to generate a new index for the second document corresponding to the new document, recording the storage location of that second document in the search system.
Optionally, the index management module is further configured to: at every preset period, collect search parameters for each new index, including one or more of the number of times it was searched, its search frequency, and its search hit rate in the current preset period; and delete one or more new indexes whose search parameters fall below a threshold.
A fifth aspect of this application provides a computing device including a processor, a memory, and a communication interface, wherein the processor is configured to perform the data configuration method provided in the first aspect of this application by invoking program code stored in the memory.
A sixth aspect of this application provides a computing device including a processor, a memory, and a communication interface, wherein the processor is configured to perform the index management method provided in the second aspect of this application by invoking program code stored in the memory.
A seventh aspect of this application provides a computer program product, which may be a software installation package. When the software installation package is run by a computing device, it performs the data configuration method provided in the first aspect or any implementation of the first aspect of this application.
An eighth aspect of this application provides a computer program product, which may be a software installation package. When the software installation package is run by a computing device, it performs the index management method provided in the second aspect or any implementation of the second aspect of this application.
A ninth aspect of this application provides a storage medium storing program code. When the program code is run by a computing device, it performs the data configuration method provided in the first aspect of this application. The storage medium includes, but is not limited to, flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
A tenth aspect of this application provides a storage medium storing program code. When the program code is run by a computing device, it performs the index management method provided in the second aspect of this application. The storage medium includes, but is not limited to, flash memory, an HDD, or an SSD.
Brief Description of Drawings
FIG. 1(a) is a schematic diagram of one implementation of a search system;
FIG. 1(b) is a schematic diagram of another implementation of a search system;
FIG. 2 is a structural diagram of an embodiment of a computing device provided in this application;
FIG. 3 is a flowchart of an embodiment of the data configuration method provided in this application;
FIG. 4 is a flowchart of an embodiment of the index management method provided in this application;
FIG. 5 is a structural diagram of an embodiment of the data configuration apparatus provided in this application;
FIG. 6 is a structural diagram of an embodiment of the index management apparatus provided in this application.
Detailed Description
This application provides a data configuration method, an index management method, a related apparatus, and a computing device, which are described separately below.
The search system is deployed on a search device and interacts with users through that device, as shown in FIG. 1(a). The search device 100 mainly includes a communication unit 101, a processing unit 102, and a storage unit 103. The storage unit 103 stores the data the search system needs to keep, such as its documents and indexes. The communication unit 101 handles information exchange between the search system and users, for example receiving a search request issued by a user and returning the response message for that request. The processing unit 102 performs data processing, for example executing a search according to a user's search request and generating the response message from the search results.
FIG. 1(a) shows a search system deployed on a single search device. In practice, the search system can also be deployed across multiple search devices, as shown in FIG. 1(b). The search system in FIG. 1(b) includes multiple search devices 100 of the kind shown in FIG. 1(a): the data the search system keeps is distributed across the storage units of the search devices 100, and the devices exchange information through their communication units 101. When a user sends a search request to one of the search devices 100, the processing units of the multiple search devices 100 can perform a distributed search according to that request, aggregate the search results, and return a response message to the user.
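The distributed search of FIG. 1(b) follows a scatter-gather pattern: the request is fanned out to every device, each device searches its local data, and the partial results are merged. In this Python sketch, threads from `concurrent.futures` stand in for separate search devices and a dict per device stands in for its local index; a real deployment would send network requests between devices, so this only illustrates the control flow. The names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Shard = Dict[str, List[str]]  # hypothetical per-device index: term -> local doc ids


def search_shard(shard_name: str, shard: Shard, term: str) -> List[str]:
    # Qualify local ids with the device name so results stay unique after merging.
    return [f"{shard_name}/{doc_id}" for doc_id in shard.get(term, [])]


def distributed_search(shards: Dict[str, Shard], term: str) -> List[str]:
    """Fan the request out to every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards) or 1) as pool:
        futures = [pool.submit(search_shard, name, shard, term)
                   for name, shard in shards.items()]
        merged: List[str] = []
        for future in futures:
            merged.extend(future.result())
    return sorted(merged)


shards = {"dev1": {"solr": ["a", "b"]}, "dev2": {"solr": ["c"], "lucene": ["d"]}, "dev3": {}}
print(distributed_search(shards, "solr"))  # ['dev1/a', 'dev1/b', 'dev2/c']
```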
图1(a)与图1(b)中的搜索设备可以通过图2所示的计算设备200来实现,其组织结构包括:处理器201、存储器202、通信接口203,还可以包括总线204。其中,处理器201可以为处理单元102的一种实现方式,存储器可以为存储单元103的一种实现方式,通信接口203可以为通信单元101的一种实现方式。The search device in FIG. 1(a) and FIG. 1(b) can be implemented by the computing device 200 shown in FIG. 2, and its organizational structure includes a processor 201, a memory 202, a communication interface 203, and a bus 204. The processor 201 can be an implementation of the processing unit 102, the memory can be an implementation of the storage unit 103, and the communication interface 203 can be an implementation of the communication unit 101.
其中,处理器201、存储器202和通信接口203可以通过总线204实现彼此之间的通信连接,也可以通过无线传输等其他手段实现通信。The processor 201, the memory 202, and the communication interface 203 can implement communication connection with each other through the bus 204, and can also implement communication by other means such as wireless transmission.
存储器202可以包括易失性存储器(英文:volatile memory),例如随机存取存储器(英文:random-access memory,缩写:RAM);存储器也可以包括非易失性存储器(英文:non-volatile memory),例如只读存储器(英文:read-only memory,缩写:ROM),快闪存储器(英文:flash memory),HDD或SSD;存储器202还可以包括上述种类的存储器的组合。计算设备200在运行时,存储器202加载其中的数据文件、索引等数据以供处理器201使用。在通过软件来实现本申请提供的技术方案时,软件的程序代码可以保存在存储器202中,并由处理器201来调用执行。The memory 202 may include a volatile memory (English: volatile memory), such as random-access memory (abbreviation: RAM); the memory may also include non-volatile memory (English: non-volatile memory) For example, read-only memory (English: read-only memory, abbreviated: ROM), flash memory (English: flash memory), HDD or SSD; the memory 202 may also include a combination of the above types of memory. While the computing device 200 is running, the memory 202 loads data files, indexes, etc. therein for use by the processor 201. When the technical solution provided by the present application is implemented by software, the program code of the software may be saved in the memory 202 and executed by the processor 201.
The processor 201 may be a central processing unit (CPU). It may also be implemented by another component capable of data processing, such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA). While the computing device 200 is running, the processor 201 invokes the program code in the memory 202 to carry out data processing.
The communication interface 203 is the interface between the user and the search system. It passes search requests from the user to the processor 201 and returns the response messages generated by the processor 201 to the user.
A search system generally stores information in units of documents (doc). Once running, the search system builds a default index for the stored docs, and this index records where each doc is stored in the search system. To search, the user sends a search request that contains a search field (the string to be searched for), asking the search system for the docs that contain it. The search system looks up the default index entries that match the search field, retrieves the docs containing the search field from the recorded locations, and returns them to the user in a response message.
Data in a doc is generally stored as domain and domain-value pairs. A doc may contain one domain and its value, or several domains and their values. A domain indicates the type of its value, and the domain value records the concrete value for that domain.
The structure of the docs in a search system is managed through a schema configuration file. This is an XML file, typically stored in the conf directory, that defines the structure of the docs in the search system. In particular, the domains in a doc are defined by the schema configuration file.
For example, suppose the schema configuration file defines a content domain, which represents the type "content". If the search system stores doc 1: ("hostname:node 1,IP:192.199.0.1"), then doc 1 contains the content domain, and the value of that domain is "hostname:node 1,IP:192.199.0.1".
As another example, suppose the schema configuration file also defines a hostname domain and an IP domain. The hostname domain represents the type "host name", and the IP domain represents the type "IP address". If the search system stores doc 2: ("hostname:node 1,IP:192.199.0.1", "hostname:node 1", "IP:192.199.0.1"), then doc 2 contains the content, hostname, and IP domains. The content domain has the value "hostname:node 1,IP:192.199.0.1". The hostname domain has the value node 1, meaning the host name is node 1. The IP domain has the value 192.199.0.1, meaning the IP address is 192.199.0.1.
Once the schema configuration file defines domains, the search system can search by domain. This is often faster and more efficient than searching directly. For example, suppose the user sends the search request "IP:192.199.0.1". If the schema configuration file defines an IP domain as well as the content domain, the search system only needs to find the docs whose IP domain has the value 192.199.0.1. If the schema configuration file defines only the content domain, the search system must check whether the content value of every doc contains the string "IP:192.199.0.1". Defining more domains in the schema configuration file therefore allows domain search, which reduces both the number of docs examined and the length of the strings compared. This speeds up search operations.
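The difference between the two lookups can be sketched in Python. This is a minimal illustration, not the search system's implementation. It assumes each doc is a dictionary from domain name to domain value. It also uses linear scans, whereas a real search system would consult its index.

```python
# Assumption: a doc is a dict mapping domain name -> domain value.
doc1 = {"content": "hostname:node 1,IP:192.199.0.1",
        "hostname": "node 1", "IP": "192.199.0.1"}
doc2 = {"content": "hostname:node 2,IP:192.199.0.2",
        "hostname": "node 2", "IP": "192.199.0.2"}
docs = [doc1, doc2]


def search_by_domain(docs, domain, value):
    """Domain search: compare only the short value stored under one domain."""
    return [d for d in docs if d.get(domain) == value]


def search_content(docs, text):
    """Direct search: substring scan over the whole content value of every doc."""
    return [d for d in docs if text in d.get("content", "")]
```

Both calls return doc 1 for the IP example. However, the domain search compares one short value per doc, while the content search scans the whole content string.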
In current practice, the domains in the schema configuration file are generally set manually by technical staff. Manually defined domains do not necessarily match users' current search needs, so the speed and efficiency gains of domain search often go unrealized.
To solve this problem, this application provides a data configuration method. The search device 100 shown in Figure 1(a) and Figure 1(b) and the computing device 200 shown in Figure 2 execute this method at runtime. The basic flow is shown in Figure 3:
301. Receive N search requests sent by the user, where the N search requests include M search fields.
The search system receives N search requests sent by the user, where N is an integer greater than 1. This application does not restrict which N search requests are used. For example, they may be the search requests the search system received within a preset time period, or the latest N search requests it received from the user.
Each of the N search requests includes at least one search field and asks for the docs that contain that search field. A search request may include one search field or several.
If a search request includes several search fields, they may be joined by "AND", "OR", or other logical connectives that express relationships such as "and" and "or". This application does not limit this.
Different search requests among the N search requests may include the same search fields or different ones. This application uses as its example the case in which the N search requests include M distinct search fields in total, where M is a positive integer.
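The following hedged sketch shows how the M distinct search fields might be collected from N requests. It assumes a hypothetical query syntax in which search fields are separated by the uppercase words AND or OR with whitespace on both sides. Real query parsers also handle parentheses, quoting, and NOT, which are omitted here.

```python
import re


def extract_search_fields(request):
    """Split one request on the logical connectives AND / OR (assumed syntax)."""
    parts = re.split(r"\s+(?:AND|OR)\s+", request.strip())
    return [p.strip() for p in parts if p.strip()]


def distinct_search_fields(requests):
    """Collect the M distinct search fields across N requests, in first-seen order."""
    seen = {}
    for req in requests:
        for field in extract_search_fields(req):
            seen.setdefault(field, None)  # dict keys keep insertion order
    return list(seen)
```

For the requests "IP:192.199.0.1 AND hostname:node 1" and "IP:192.199.0.2 OR IP:192.199.0.1", this yields three distinct search fields, because the repeated "IP:192.199.0.1" is counted once.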
302. Determine the domains to which the M search fields belong.
The search system determines the domain of each of the M search fields, that is, the type of each search field.
In some scenarios, a search field already carries its domain. The search system can then determine the domain directly from the search field in the search request. For example, if the search request is "IP:192.199.0.1", IP is clearly the domain of the search field.
In other scenarios, a search field does not carry its domain, so the search system cannot determine the domain from the search request alone. However, the response message to the search request carries the complete matching docs, and those docs carry their domains. The search system can therefore determine the domain of the search field from the response. For example, if the user's search field is only "192.199.0.1" and the response message carries doc 1: ("hostname:node 1,IP:192.199.0.1"), then IP is clearly the domain of the search field.
Optionally, the search system can determine the domain of a search field from the separators in the search request or the response message. For example, if the search request is "IP:192.199.0.1", the search system can determine that the string "IP" before the separator ":" is the domain of the search field. As another example, suppose the search request is "192.199.0.1" and the response message carries doc 1: ("hostname:node 1,IP:192.199.0.1"). The search system can then determine that the string "IP", which follows the separator "," and precedes the ":" immediately before 192.199.0.1, is the domain of the search field. The separators to use depend on the doc format. This application uses ":" and "," only as examples. Other search systems may use other separators to determine the domain of a search field, and this application does not limit this.
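The separator rules above can be sketched as follows. The sketch assumes, as in the examples, that a tagged search field has the form domain:value and that a response doc's content has the form domain:value,domain:value. The function names are hypothetical. One known limitation is that a bare value which itself contains ":" (for example a time or an IPv6 address) would be misread as tagged.

```python
def domain_from_request(search_field, sep=":"):
    """Return the text before ':' if the search field carries its domain, else None."""
    if sep in search_field:
        domain, _, _ = search_field.partition(sep)
        return domain.strip() or None
    return None


def domain_from_response(search_field, content, pair_sep=",", kv_sep=":"):
    """Find the domain of an untagged search field in one response doc's content,
    assuming the content has the form 'domain:value,domain:value'."""
    for pair in content.split(pair_sep):
        domain, sep, value = pair.partition(kv_sep)
        if sep and value.strip() == search_field:
            return domain.strip()
    return None


def determine_domain(search_field, response_contents):
    """Try the request first, then each doc content carried in the response."""
    domain = domain_from_request(search_field)
    if domain is not None:
        return domain
    for content in response_contents:
        domain = domain_from_response(search_field, content)
        if domain is not None:
            return domain
    return None  # domain could not be determined
```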
303. Determine K hotspot domains.
After determining the domains of the M search fields, the search system selects K hotspot domains from among them, where K is a positive integer.
Different search fields among the M search fields may belong to the same domain or to different domains. As a result, some of these domains may contain several search fields while others contain only one. Optionally, the search system may designate as hotspot domains the K domains that contain the most search fields.
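One way to pick the K hotspot domains is to count how many of the M distinct search fields each domain contains and keep the K largest counts. The sketch assumes the output of step 302 is available as a dictionary from search field to domain, with None for fields whose domain could not be determined. Ties are broken by first occurrence (the behavior of Counter.most_common), which is an arbitrary choice.

```python
from collections import Counter


def hotspot_domains(field_to_domain, k):
    """Return the K domains containing the most distinct search fields.

    field_to_domain maps each of the M distinct search fields to its domain
    (the result of step 302); fields with an unknown domain map to None."""
    counts = Counter(d for d in field_to_domain.values() if d is not None)
    return [domain for domain, _ in counts.most_common(k)]
```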
304. Add the K hotspot domains to the schema configuration file.
After determining the hotspot domains, the search system adds them to the schema configuration file.
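The schema layout is not specified here. Assuming a Solr-style schema in which each domain is declared as a field element with name and type attributes, step 304 could be sketched with Python's standard xml.etree.ElementTree module. The attribute values (type "string", indexed and stored set to true) are illustrative assumptions. Rebuilding the docs and default indexes afterwards is not shown.

```python
import xml.etree.ElementTree as ET


def add_hotspot_domains(schema_xml, hotspot_domains, field_type="string"):
    """Return schema XML with a <field> element appended for each new hotspot domain.

    Assumes a Solr-style schema whose fields are <field name=... .../> elements;
    new fields are appended directly under the root element."""
    root = ET.fromstring(schema_xml)
    existing = {f.get("name") for f in root.iter("field")}
    for domain in hotspot_domains:
        if domain in existing:
            continue  # already defined; nothing to add
        ET.SubElement(root, "field", {
            "name": domain, "type": field_type,
            "indexed": "true", "stored": "true",
        })
        existing.add(domain)
    return ET.tostring(root, encoding="unicode")
```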
In the data configuration method of this embodiment, the search system receives N search requests sent by the user and determines the domains of the M search fields in those requests. It then selects K hotspot domains from among those domains and adds them to the schema configuration file. When the search system later receives a search field that belongs to a hotspot domain, it can search by domain directly. In this embodiment, the domains in the schema configuration file are not set manually by technical staff. Instead, they are hotspot domains that the search system derives from the search requests users have recently sent. Because the hotspot domains are the most frequently searched domains in the N search requests, they are likely to be searched again in the near future. Adding them to the schema configuration file therefore lets domain search be used more often, which captures more of its speed and efficiency gains and further improves the search performance of the search system.
Note that the schema configuration file defines the doc format in the search system. After the hotspot domains are added to it, the search system must therefore update the stored docs and their default indexes according to the updated schema configuration file. The search system can carry out subsequent search operations only after this update is complete.
Optionally, after step 303, the search system may also perform the following step:
305. Determine the domain value formats of the K hotspot domains.
After determining the K hotspot domains, the search system determines the domain value format of each hotspot domain from the search fields that domain contains. The domain value format of a hotspot domain describes the format that values of that domain follow.
Specifically, the search system may take the format shared by all search fields in a hotspot domain as that domain's value format. For example, suppose the hotspot domain is "IP" and contains two search fields, "192.199.0.1" and "192.199.0.2". The search system then determines "192.199.0.*" as the domain value format of the IP domain, where * denotes a wildcard (fuzzy match).
The domain value format may be a regular expression or take another form. This application does not limit this.
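One simple way to derive a shared format, consistent with the IP example, is to take the longest common prefix of a domain's search fields and append the wildcard *. This rule is an assumption for illustration. The wildcard format can also be converted into a regular expression with the standard fnmatch.translate function, which matches the note that the format may take regular-expression form.

```python
import os
import re
from fnmatch import translate


def domain_value_format(values):
    """Derive a wildcard format shared by all search fields of one domain.

    Assumed rule: longest common prefix + '*'; identical values need no wildcard."""
    values = list(values)
    if not values:
        raise ValueError("a domain needs at least one search field")
    prefix = os.path.commonprefix(values)  # character-wise common prefix
    if all(v == prefix for v in values):
        return prefix
    return prefix + "*"


def format_as_regex(fmt):
    """Express a wildcard format as a compiled regular expression."""
    return re.compile(translate(fmt))
```

If the values share no prefix, this rule yields a bare *, which matches every value. A practical system would probably discard such a format.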
This application does not restrict the order of steps 304 and 305. Step 305 may also be performed before step 304.
Optionally, after the domain value formats of the K hotspot domains have been determined, the search system may receive a new first search request from the user containing a first search field. If the first search field does not carry its domain, the search system checks whether it matches the format of any hotspot domain. If it matches the format of a first domain among the hotspot domains, the first search field is treated as belonging to the first domain, and the search system can search by domain.
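Matching a new, untagged search field against the hotspot formats might look like the following sketch. It assumes the formats are stored as shell-style wildcards in a dictionary keyed by domain. The sketch uses fnmatch.fnmatchcase because it is case-sensitive and platform-independent. Note that it also interprets ? and [...] as wildcards. Returning the first matching domain is an assumption for the case where several formats match.

```python
from fnmatch import fnmatchcase


def match_hotspot_domain(search_field, formats):
    """Return the first hotspot domain whose value format matches search_field.

    formats maps domain -> wildcard format such as "192.199.0.*"."""
    for domain, fmt in formats.items():
        if fnmatchcase(search_field, fmt):
            return domain
    return None  # no hotspot domain: fall back to ordinary search
```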
Optionally, at the end of each preset period, the search system may collect search statistics for each domain defined in the schema configuration file. These statistics include one or more of the following for the current period: the number of times the domain was searched, its search frequency, and its search hit rate. The search system then deletes from the schema configuration file one or more domains whose statistics fall below a threshold. In this way, non-hotspot domains are dynamically retired from the schema configuration file.
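The following is a minimal sketch of the periodic retirement, using the search count as the statistic and a fixed threshold. Keeping the base content domain regardless of its count is an assumption. It is added so that full-text search still has a fallback; the text above does not say whether any domain is exempt.

```python
def retire_cold_domains(schema_domains, search_counts, threshold, protected=("content",)):
    """Return the domains to keep after one statistics period.

    search_counts maps domain -> times searched in the current period (missing = 0).
    Domains below threshold are dropped unless listed in protected (assumption)."""
    return [d for d in schema_domains
            if d in protected or search_counts.get(d, 0) >= threshold]
```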
The data configuration method shown in Figure 3 improves the performance of the search system by dynamically modifying the schema configuration file according to users' search requests. The following describes an index management method that serves the same purpose. The search device 100 shown in Figure 1(a) and Figure 1(b) and the computing device 200 shown in Figure 2 execute this method at runtime. The basic flow is shown in Figure 4:
401. Receive N search requests sent by the user, where the N search requests include M search fields.
The search system receives N search requests sent by the user, where N is a positive integer. This application does not restrict which N search requests are used. For example, they may be the search requests the search system received within a preset time period, or the latest N search requests it received from the user.
Each of the N search requests includes at least one search field and asks for the docs that contain that search field. A search request may include one search field or several. Several search fields in one request may be joined by "AND", "OR", or other logical connectives that express relationships such as "and" and "or"; this application does not limit this. Different search requests among the N search requests may include the same search fields or different ones. This application uses as its example the case in which the N search requests include M distinct search fields in total, where M is a positive integer.
402. Determine the L domains to which the M search fields belong.
The search system determines the domain of each of the M search fields, that is, the type of each search field.
In some scenarios, a search field already carries its domain. The search system can then determine the domain directly from the search field in the search request. For example, if the search request is "IP:192.199.0.1", IP is clearly the domain of the search field.
In other scenarios, a search field does not carry its domain, so the search system cannot determine the domain from the search request alone. However, the response message to the search request carries the complete matching docs, and those docs carry their domains. The search system can therefore determine the domain of the search field from the response. For example, if the user's search field is only "192.199.0.1" and the response message carries doc 1: ("hostname:node 1,IP:192.199.0.1"), then IP is clearly the domain of the search field.
Optionally, the search system can determine the domain of a search field from the separators in the search request or the response message. For example, if the search request is "IP:192.199.0.1", the string "IP" before the separator ":" is the domain of the search field. As another example, suppose the search request is "192.199.0.1" and the response message carries doc 1: ("hostname:node 1,IP:192.199.0.1"). The string "IP", which follows the separator "," and precedes the ":" immediately before 192.199.0.1, is then the domain of the search field. The separators to use depend on the doc format. This application uses ":" and "," only as examples. Other search systems may use other separators to determine the domain of a search field, and this application does not limit this.
Different search fields among the M search fields may belong to the same domain or to different domains. This embodiment uses as its example the case in which the M search fields belong to L distinct domains.
403. Determine the domain value formats of the L domains.
The inventors of this application found through research that users' search behavior shows temporal locality. If a user searches for a certain search field at some moment, that search field and similar fields are likely to be searched again in the period that follows. In this embodiment, therefore, after determining the L domains of the M search fields, the search system determines each domain's value format from the search fields that domain contains. Fields that match the domain value formats of the L domains can be regarded as likely to be searched during subsequent operation of the search system.
Specifically, the search system may take the format shared by all search fields in each of the L domains as that domain's value format. For example, suppose the L domains include the domain "IP", which contains two search fields, "192.199.0.1" and "192.199.0.2". The search system then determines "192.199.0.*" as the domain value format of the IP domain, where * denotes a wildcard (fuzzy match).
404. Identify the first docs, which contain fields that match a domain value format.
After determining the domain value format of each of the L domains, the search system identifies first docs among the docs it stores. A first doc is a doc that contains a field matching the domain value format of any of the L domains. There may be one first doc or several.
For example, suppose the search system has determined that the domain value format of the IP domain is "192.199.0.*". Doc 1: ("hostname:node 1,IP:192.199.0.1") contains the field "192.199.0.1", which matches "192.199.0.*", so doc 1 is identified as a first doc.
The search system can identify first docs in many ways. For example, it can use the domain value formats of the L domains as search fields and query its default index directly to obtain the first docs.
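For illustration only, the Python below finds first docs by a linear scan rather than by querying the default index, and assumes doc content has the form domain:value,domain:value. It also assumes that a field counts as a match only when it appears under the domain whose format it satisfies. That is one reading of the step above, not a requirement stated in it.

```python
from fnmatch import fnmatchcase


def parse_doc(content):
    """Split 'domain:value,domain:value' content into (domain, value) pairs (assumed layout)."""
    pairs = []
    for item in content.split(","):
        domain, sep, value = item.partition(":")
        if sep:
            pairs.append((domain.strip(), value.strip()))
    return pairs


def find_first_docs(docs, formats):
    """Return ids of docs containing a field whose value matches its domain's format.

    docs maps doc id -> content string; formats maps domain -> wildcard format.
    A linear scan stands in for querying the default index with the formats."""
    first = []
    for doc_id, content in docs.items():
        if any(domain in formats and fnmatchcase(value, formats[domain])
               for domain, value in parse_doc(content)):
            first.append(doc_id)
    return first
```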
405. Generate second docs.
After identifying the first docs, the search system generates a second doc from each field in a first doc that matches a domain value format. A second doc contains the matching field from the corresponding first doc together with that field's domain.
For example, suppose the search system has determined that the domain value format of the IP domain is "192.199.0.*", and a first doc is ("hostname:node 1,IP:192.199.0.1"). This first doc contains the field "192.199.0.1", which matches "192.199.0.*". The search system therefore generates from it the second doc ("IP:192.199.0.1").
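Generating second docs from a first doc can be sketched using the same assumed content layout. Each second doc is represented here simply as the string domain:value, mirroring the example ("IP:192.199.0.1"). The helper name is hypothetical.

```python
from fnmatch import fnmatchcase


def make_second_docs(content, formats):
    """Return one second doc ('domain:value') per field of a first doc whose
    value matches the wildcard format of its own domain."""
    second = []
    for item in content.split(","):
        domain, sep, value = item.partition(":")
        domain, value = domain.strip(), value.strip()
        if sep and domain in formats and fnmatchcase(value, formats[domain]):
            second.append(f"{domain}:{value}")
    return second
```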
Because there may be one first doc or several, there may likewise be one second doc or several. However, the second docs are generated from only a subset of the search system's existing docs (the first docs), so there are far fewer second docs than existing docs. Moreover, each second doc contains only one domain and one domain value, so it is shorter than most of the search system's existing docs.
406. Build a new index for the second docs.
After generating the second docs, the search system builds a new index for them.
Because the second docs are few and short, the new index built for them is far smaller than the default index of the search system's existing docs.
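A new index over second docs can be as simple as an in-memory inverted index. In this hypothetical sketch, each second doc is keyed both by its full domain:value text and by its bare value. This lets both a tagged search field and an untagged one be looked up directly. The dictionary-of-sets layout is an assumption, not a description of any particular search engine.

```python
from collections import defaultdict


def build_new_index(second_docs):
    """Build a small inverted index over second docs.

    second_docs maps second-doc id -> 'domain:value' text. Each doc is indexed
    under the full text (for tagged search fields) and under the bare value
    (for untagged search fields)."""
    index = defaultdict(set)
    for sid, text in second_docs.items():
        _, _, value = text.partition(":")
        index[text].add(sid)
        index[value].add(sid)
    return dict(index)
```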
In the index management method of this embodiment, the search system receives N search requests sent by the user. It determines the L domains to which the M search fields in those requests belong and determines the domain value formats of those L domains. Among the docs it stores, it identifies first docs that contain fields matching those formats. It then generates second docs from the first docs and builds a new index for the second docs. Because the second docs match the domain value formats of the L domains, they are likely to be searched during subsequent operation of the search system. As a result, the search field in a new search request is likely to hit the new index. The new index is far smaller than the default index, so looking up the new index instead of the default index can substantially reduce the search system's workload and improve its search speed, efficiency, and overall performance.
Optionally, after step 406, the search system may receive a new target search request from the user. This request contains a target search field and asks for the docs that contain it. The search system first looks for a new index entry that matches the target search field. If it finds one, it retrieves the corresponding second docs and returns them to the user in a response message. If it finds none, it looks up the default index for an entry that matches the target search field.
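The lookup order above (new index first, default index on a miss) can be sketched as follows. Both indexes are assumed to be dictionaries from a search term to a set of doc ids. The function also reports which index answered, a convenience added for the example.

```python
def search(term, new_index, default_index):
    """Look up the new index first; fall back to the default index on a miss.

    Returns (source, ids), where source is "new" or "default"."""
    hits = new_index.get(term)
    if hits:
        return "new", set(hits)
    return "default", set(default_index.get(term, ()))
```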
Optionally, when a new doc is imported into the search system and contains a field that matches a domain value format determined in step 403, the search system generates the second doc corresponding to the new doc. This is done in a manner similar to step 405 and is not repeated here. The search system then builds a new index for that second doc and also generates a default index for the new doc.
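Incremental import can be sketched with a small hypothetical class that keeps both indexes side by side. It assumes the same domain:value,domain:value content layout and wildcard formats as in the earlier steps. The second-doc ids, built from the source doc id and a counter, are an arbitrary choice.

```python
from fnmatch import fnmatchcase


class IndexStore:
    """Toy store holding a default index and a new index (hypothetical structure)."""

    def __init__(self, formats):
        self.formats = formats      # domain -> wildcard format from step 403
        self.default_index = {}     # 'domain:value' -> set of doc ids
        self.new_index = {}         # 'domain:value' -> set of second-doc ids
        self.second_docs = {}       # second-doc id -> 'domain:value'

    def import_doc(self, doc_id, content):
        """Index a newly imported doc whose content is 'domain:value,...' (assumed layout)."""
        for item in content.split(","):
            domain, sep, value = item.partition(":")
            if not sep:
                continue  # no domain separator: skipped in this sketch
            domain, value = domain.strip(), value.strip()
            key = f"{domain}:{value}"
            # Every imported doc gets default-index entries.
            self.default_index.setdefault(key, set()).add(doc_id)
            fmt = self.formats.get(domain)
            if fmt is not None and fnmatchcase(value, fmt):
                # Matching field: create a second doc and add it to the new index.
                sid = f"{doc_id}#{len(self.second_docs)}"
                self.second_docs[sid] = key
                self.new_index.setdefault(key, set()).add(sid)
```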
Optionally, at the end of each preset period, the search system may collect search parameters for each new index. These parameters include one or more of the following for the current period: the number of times the domain corresponding to the new index was searched, its search frequency, and its search hit rate. The search system then deletes one or more new indexes whose search parameters fall below a threshold. In this way, non-hotspot indexes are dynamically retired from the new indexes.
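The following sketch shows the periodic clean-up using the hit rate as the search parameter. It assumes there is one new index per domain and that per-period statistics are available as (hits, lookups) pairs. Treating an index with no lookups in the period as cold is also an assumption.

```python
def retire_cold_indexes(new_indexes, stats, min_hit_rate):
    """Keep only new indexes whose hit rate in the current period reaches min_hit_rate.

    new_indexes maps domain -> index; stats maps domain -> (hits, lookups).
    An index with no recorded lookups gets a hit rate of 0."""
    kept = {}
    for name, index in new_indexes.items():
        hits, lookups = stats.get(name, (0, 0))
        rate = hits / lookups if lookups else 0.0
        if rate >= min_hit_rate:
            kept[name] = index
    return kept
```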
The foregoing embodiments describe the data configuration method and the index management method provided by this application. The apparatus that implements these methods is described below.
First, the data configuration apparatus that implements the data configuration method shown in Figure 3 is described. As shown in Figure 5, its basic structure includes:
An information receiving module 501, configured to receive multiple search requests sent by the user. Each search request includes a search field and asks for the data files that contain that search field.
A domain determining module 502, configured to determine the domain of each search field included in the multiple search requests.
A hotspot determining module 503, configured to determine one or more hotspot domains from among the domains of the search fields.
A configuration modification module 504, configured to add the one or more hotspot domains to the schema configuration file and to update the data files in the search system according to the updated schema configuration file.
图5所示的数据配置装置的具体介绍可以参考图3所示的数据配置方法中的相关描述,此处不做赘述。For a detailed description of the data configuration apparatus shown in FIG. 5, reference may be made to the related description in the data configuration method shown in FIG. 3, and details are not described herein.
The data configuration apparatus provided in this embodiment works as follows:
- The information receiving module 501 receives a plurality of search requests sent by the user.
- The domain determining module 502 determines the domain to which each search field in those requests belongs.
- The hotspot determining module 503 determines hotspot domains among those domains.
- The configuration modification module 504 adds the determined hotspot domains to the schema configuration file.

As a result, when the search system later receives a search field that belongs to a hotspot domain, it can search by domain directly. In this embodiment, the domains in the schema configuration file are not set manually by a technician. Instead, they are hotspot domains that the data configuration apparatus determines from the search requests the user has actually sent. A hotspot domain is one that is searched frequently across the N search requests, so the user is likely to search it again later. Adding hotspot domains to the schema configuration file therefore lets search by domain be used more often. This realizes the full speed and efficiency gains of search by domain and further improves the search performance of the search system.
Optionally, the domain determining module 502 is specifically configured to determine the domain to which each search field belongs. It does so according to each search request and/or the response message returned by the search system for that request.
Optionally, the domain determining module 502 is specifically configured to determine that domain according to the separators in each search request and/or in the response message returned by the search system for that request.
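Separator-based domain determination can be sketched roughly as shown below. This is a hypothetical helper, not an API of any search product. It assumes one of two cases for a search field. Either the field already names its domain in `domain:value` form, or it is a bare value that contains no key/value separator. In the second case, the domain is read from the docs returned in the response message, using the `:` and `,` separators of the doc1 example.

```python
from typing import Iterable, Optional


def domain_of(
    search_field: str,
    response_docs: Iterable[str],
    kv_sep: str = ":",
    item_sep: str = ",",
) -> Optional[str]:
    """Return the domain a search field belongs to, or None if it cannot be determined."""
    field = search_field.strip()
    # Case 1: the request itself names the domain, e.g. "IP:192.199.0.1".
    if kv_sep in field:
        return field.split(kv_sep, 1)[0].strip()
    # Case 2: find the field in a doc returned by the search system and read the
    # domain that precedes the key/value separator, e.g. "IP" in "IP:192.199.0.1".
    for doc in response_docs:
        for item in doc.split(item_sep):
            domain, sep, value = item.partition(kv_sep)
            if sep and value.strip() == field:
                return domain.strip()
    return None
```

For example, the bare field "192.199.0.1" resolves to "IP" when the response contains doc1, "hostname:node1,IP:192.199.0.1".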
Optionally, the hotspot determining module 503 is specifically configured to rank the domains to which the search fields belong by how many search fields each contains. It then determines the top one or more domains as hotspot domains.
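The "top one or more domains with the most search fields" rule amounts to a frequency count. Below is a minimal sketch. It assumes `top_n` is a configurable cutoff, which the text leaves open, and it models the schema configuration file as a simple set of domain names. Both helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set


def hot_domains(field_domains: Iterable[Optional[str]], top_n: int) -> List[str]:
    """Rank domains by how many search fields fell into them and keep the top_n."""
    counts = Counter(d for d in field_domains if d is not None)  # skip unresolved fields
    return [domain for domain, _ in counts.most_common(top_n)]


def add_to_schema(schema_domains: Set[str], hotspots: Iterable[str]) -> Set[str]:
    """Add hotspot domains to a schema modelled, by assumption, as a set of names."""
    return schema_domains | set(hotspots)
```

Suppose the resolved domains are IP ×3, hostname ×2, and port ×1, and `top_n=2`. The hotspot domains are then "IP" and "hostname".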
Optionally, the data configuration apparatus further includes a format determining module 505. This module determines the domain value format of each of the one or more hotspot domains according to the search fields included in that hotspot domain.
Optionally, the format determining module 505 is specifically configured to take a format to which all the search fields in a hotspot domain conform. It uses that format as the domain value format of the hotspot domain.
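The text does not say how the common format is derived. One plausible sketch assumes a fixed, hypothetical catalogue of candidate formats ordered from most to least specific. It then picks the first candidate that every search field in the domain fully matches.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical candidate formats, ordered from most to least specific.
CANDIDATE_FORMATS: List[Tuple[str, str]] = [
    ("ipv4", r"\d{1,3}(?:\.\d{1,3}){3}"),
    ("integer", r"\d+"),
    ("hostname", r"[A-Za-z][A-Za-z0-9-]*"),
    ("any", r".+"),
]


def common_format(fields: List[str]) -> Optional[str]:
    """Return the most specific candidate format matched by every field, or None."""
    if not fields:
        return None
    for name, pattern in CANDIDATE_FORMATS:
        if all(re.fullmatch(pattern, f) for f in fields):
            return name
    return None
```

Ordering the candidates from specific to general means a domain holding only IP-like values gets the narrow "ipv4" format. A domain with mixed values falls through to the catch-all.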
Optionally, the configuration modification module 504 is further configured to do the following:
- Once every preset period, collect search data for each domain defined in the schema configuration file. The search data covers the current preset period and includes one or more of the number of times the domain was searched, its search frequency, and its search hit ratio.
- Delete from the schema configuration file one or more domains whose search data is below a threshold.
The index management apparatus for implementing the index management method shown in FIG. 4 is described below. Referring to FIG. 6, the index management apparatus basically includes:
A receiving information module 601, configured to receive a plurality of search requests sent by a user. Each search request includes a search field and requests a search for data files that contain that search field;
A determining domain module 602, configured to determine, for each search field included in the plurality of search requests, the domain to which the search field belongs;
A determining format module 603, configured to determine the domain value format of each domain to which the search fields belong, according to the search fields included in that domain;
A file determining module 604, configured to identify the data files in the search system that contain a field conforming to a domain value format determined by the determining format module 603. These data files are the first data files;
A file generating module 605, configured to generate a second data file for each first data file. Each second data file includes the fields of the corresponding first data file that conform to the domain value format, together with the domains to which those fields belong;
An index management module 606, configured to generate a new index for each second data file. The new index includes the storage location of the corresponding second data file in the search system.
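The pipeline of modules 604 to 606 can be sketched in memory as follows. The sketch makes several assumptions. Docs use the `:`/`,` layout of the doc1 example. Each domain's learned format is a compiled regex. A field counts as conforming when its value fully matches the format of its own domain. The position of a second data file in a list stands in for its storage location. All function names are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple


def parse_doc(text: str, kv_sep: str = ":", item_sep: str = ",") -> List[Tuple[str, str]]:
    """Split 'hostname:node1,IP:192.199.0.1' into (domain, domain value) pairs."""
    pairs = []
    for item in text.split(item_sep):
        domain, sep, value = item.partition(kv_sep)
        if sep:
            pairs.append((domain.strip(), value.strip()))
    return pairs


def build_new_indexes(
    docs: List[str], formats: Dict[str, re.Pattern]
) -> Tuple[List[Tuple[int, Dict[str, str]]], Dict[Tuple[str, str], Set[int]]]:
    """Find first data files, derive their second data files, and index those."""
    second_docs: List[Tuple[int, Dict[str, str]]] = []  # (first-doc position, second doc)
    new_index: Dict[Tuple[str, str], Set[int]] = {}     # (domain, value) -> second-doc positions
    for doc_pos, text in enumerate(docs):
        # Keep only fields whose value matches the learned format of their domain.
        second = {d: v for d, v in parse_doc(text) if d in formats and formats[d].fullmatch(v)}
        if not second:
            continue  # no conforming field, so this is not a first data file
        second_pos = len(second_docs)  # stands in for the second data file's storage location
        second_docs.append((doc_pos, second))
        for domain, value in second.items():
            new_index.setdefault((domain, value), set()).add(second_pos)
    return second_docs, new_index
```

Because a second data file keeps only the conforming fields, the new index stays much smaller than the default index. The same routine can be run on a single newly imported data file, which is the incremental case the text describes for the file generating module 605.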
For details of the index management apparatus shown in FIG. 6, refer to the related description of the index management method shown in FIG. 4; the details are not repeated here.
The index management apparatus provided in this embodiment works as follows:
- The receiving information module 601 receives a plurality of search requests sent by the user.
- The determining domain module 602 determines the domain to which each search field in those requests belongs.
- The determining format module 603 determines the domain value formats of those domains.
- The file determining module 604 identifies, as first data files, the data files in the search system that contain a field conforming to a domain value format.
- The file generating module 605 generates the second data files corresponding to the first data files.
- The index management module 606 generates new indexes for the second data files.

When the search system later receives a new search request from the user, the search field in that request is likely to hit a new index. A new index holds far less data than the default index, so searching the new index instead of the default index can greatly reduce the workload of the search system. This improves its search speed and efficiency, and therefore its overall search performance.
Optionally, the determining domain module 602 is specifically configured to determine the domain to which each search field belongs. It does so according to each search request and/or the response message returned by the search system for that request.
Optionally, the determining domain module 602 is specifically configured to determine that domain according to the separators in each search request and/or in the response message returned by the search system for that request.
Optionally, the determining format module 603 is specifically configured to take a format to which all the search fields in a domain conform. It uses that format as the domain value format of that domain.
Optionally, the receiving information module 601 is further configured to receive a target search request sent by the user. The target search request includes a target search field and requests a search for data files that contain the target search field.
In this case, the index management apparatus further includes a file search module 607. This module looks up the new index corresponding to the target search field. If that new index is found, the module uses it to obtain the data files that contain the target search field.
Optionally, the file search module 607 is further configured to look up the default index corresponding to the target search field if no corresponding new index is found.
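The lookup order of the file search module 607 can be sketched as below. The sketch assumes that both indexes are in-memory dicts keyed by field value. It also assumes that each second data file remembers the position of the first data file it came from, so a new-index hit maps back to the original data files. `find_docs` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple


def find_docs(
    target: str,
    new_index: Dict[str, Set[int]],
    second_docs: List[Tuple[int, Dict[str, str]]],
    default_index: Dict[str, Set[int]],
) -> Tuple[str, Set[int]]:
    """Return which index answered and the positions of the matching data files."""
    second_positions = new_index.get(target)
    if second_positions:
        # Hit in the small new index: map each second data file back to its first data file.
        return "new", {second_docs[p][0] for p in second_positions}
    # Miss: fall back to the (much larger) default index.
    return "default", set(default_index.get(target, set()))
```

Here a query for "192.199.0.1" is answered by the new index. A query for "node1", whose domain has no new index, falls back to the default index.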
Optionally, the file generating module 605 is further configured to handle new data files imported into the search system. If a new data file contains a field that conforms to the domain value format determined by the determining format module 603, the module generates a second data file for it. That second data file includes the fields of the new data file that conform to the domain value format, together with the domains to which those fields belong.
The index management module 606 is further configured to generate a new index for the second data file of the new data file. That new index includes the storage location of the second data file in the search system.
Optionally, the index management module 606 is further configured to do the following:
- Once every preset period, collect the search parameters of each new index. These cover the current preset period and include one or more of the number of times the new index was searched, its search frequency, and its search hit ratio.
- Delete one or more new indexes whose search parameters are below a threshold.
In one implementation, each module in the embodiments shown in FIG. 5 and FIG. 6 may be a software module. It is stored as program code in the memory 202 of the computing device shown in FIG. 2, and the processor 201 invokes and executes it.
In another implementation, each module in the embodiments shown in FIG. 5 and FIG. 6 may be a hardware module, such as a CPU, a hardware chip, or a combination of a CPU and a hardware chip. The hardware module serves as the processor 201 of the computing device shown in FIG. 2 and performs the methods provided in this application.
This application further provides a computer program product. The computer program product may be a software installation package that performs the method shown in FIG. 3 or FIG. 4 when a computing device runs it.
The docs, search requests, search fields, and response messages given as examples in this specification only illustrate the technical solutions of this application. They do not limit the actual format of docs, search requests, search fields, or response messages. For example, doc1 in this specification is ("hostname:node1,IP:192.199.0.1"). In practice, doc1 may use any other format supported by the search system:
- A domain and its domain value may be separated by the ":" separator, a space, or another separator.
- Data belonging to different domains may be separated by the "," separator, the ";" separator, or another separator.
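To illustrate that the separators are a configuration choice rather than a fixed format, here is a small parser sketch. It assumes that the key/value separators and item separators are each supplied as a list. It also assumes that the first key/value separator present in an item, in list order, splits that item. The function name is hypothetical.

```python
import re
from typing import List, Sequence, Tuple


def parse_doc(
    text: str,
    kv_seps: Sequence[str] = (":",),
    item_seps: Sequence[str] = (",",),
) -> List[Tuple[str, str]]:
    """Split a doc into (domain, domain value) pairs using configurable separators."""
    # One regex that splits on any of the item separators, e.g. "," or ";".
    item_pattern = "|".join(re.escape(s) for s in item_seps)
    pairs = []
    for item in re.split(item_pattern, text):
        item = item.strip()
        for sep in kv_seps:  # try key/value separators in order; use the first one present
            domain, found, value = item.partition(sep)
            if found:
                pairs.append((domain.strip(), value.strip()))
                break
    return pairs
```

With the defaults, doc1 parses into ("hostname", "node1") and ("IP", "192.199.0.1"). The same pairs come out of "hostname node1;IP 192.199.0.1" when a space and ";" are configured as the separators.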
A person skilled in the art can clearly understand that, for convenience and brevity, the specific working processes of the system, apparatus, and units described above are not repeated here. Refer to the corresponding processes in the foregoing method embodiments.
It should be understood that the system, apparatus, and method disclosed in the several embodiments provided in this application may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a logical functional division, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces. The indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate. Likewise, the components shown as units may or may not be physical units: they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit. Alternatively, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present invention may be embodied in the form of a software product. This covers the solutions in essence, the part that contributes to the prior art, or all or part of the solutions. The computer software product is stored in a storage medium. It includes several instructions that cause a computer device, such as a personal computer, a server, or a network device, to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
In conclusion, the foregoing embodiments are merely intended to describe the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand the following. The technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced with equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (32)

  1. A data configuration method, applicable to a search system, wherein the search system includes a plurality of data files; each data file includes one or more domains and the domain values corresponding to the one or more domains; a domain indicates the type of its corresponding domain value, and a domain value records the specific value of its corresponding domain; the search system further includes a schema configuration file for field configuration, and the schema configuration file defines the domains of the data files in the search system; characterized in that the method includes:
    receiving a plurality of search requests sent by a user, wherein each of the plurality of search requests includes a search field and is used to request a search for data files that contain the search field;
    determining, for each search field included in the plurality of search requests, the domain to which the search field belongs;
    determining one or more hotspot domains among the domains to which the search fields belong; and
    adding the one or more hotspot domains to the schema configuration file, and updating the data files in the search system according to the schema configuration file to which the one or more hotspot domains have been added.
  2. The data configuration method according to claim 1, characterized in that determining, for each search field included in the plurality of search requests, the domain to which the search field belongs includes:
    determining the domain to which each search field included in the plurality of search requests belongs according to each search request and/or the response message returned by the search system for each search request.
  3. The data configuration method according to claim 2, characterized in that determining the domain to which each search field belongs according to each search request and/or the response message returned by the search system for each search request includes:
    determining the domain to which each search field included in the plurality of search requests belongs according to the separators in each search request and/or in the response message returned by the search system for each search request.
  4. The data configuration method according to any one of claims 1 to 3, characterized in that determining one or more hotspot domains among the domains to which the search fields belong includes:
    determining, as hotspot domains, the top one or more domains that contain the most search fields among the domains to which the search fields belong.
  5. The data configuration method according to any one of claims 1 to 4, characterized in that, after determining one or more hotspot domains among the domains to which the search fields belong, the method further includes:
    determining the domain value format of each of the one or more hotspot domains according to the search fields included in that hotspot domain.
  6. The data configuration method according to claim 5, characterized in that determining the domain value format of each hotspot domain according to the search fields included in that hotspot domain includes:
    determining, as the domain value format of each hotspot domain, a format to which all the search fields included in that hotspot domain conform.
  7. The data configuration method according to any one of claims 1 to 6, characterized in that the method further includes:
    collecting, once every preset period, search data for each domain defined in the schema configuration file, wherein the search data includes one or more of the number of times the domain was searched, its search frequency, and its search hit ratio in the current preset period; and
    deleting, from the schema configuration file, one or more domains whose search data is below a threshold.
  8. An index management method, applicable to a search system, wherein the search system includes a plurality of data files; each data file includes one or more domains and the domain values corresponding to the one or more domains; a domain indicates the type of its corresponding domain value, and a domain value records the specific value of its corresponding domain; the search system further includes default indexes corresponding to the plurality of data files, each default index including the storage location of its corresponding data file in the search system; the method includes:
    receiving a plurality of search requests sent by a user, wherein each of the plurality of search requests includes a search field and is used to request a search for data files that contain the search field;
    determining, for each search field included in the plurality of search requests, the domain to which the search field belongs;
    determining the domain value format of each domain to which the search fields belong, according to the search fields included in that domain;
    determining, as first data files, the data files in the search system that contain a field conforming to the domain value format;
    generating a second data file corresponding to each first data file, wherein each second data file includes the fields of the corresponding first data file that conform to the domain value format, together with the domains to which those fields belong; and
    generating a new index corresponding to each second data file, wherein the new index includes the storage location of the corresponding second data file in the search system.
  9. The index management method according to claim 8, characterized in that determining, for each search field included in the plurality of search requests, the domain to which the search field belongs includes:
    determining the domain to which each search field included in the plurality of search requests belongs according to each search request and/or the response message returned by the search system for each search request.
  10. The index management method according to claim 9, characterized in that determining the domain to which each search field belongs according to each search request and/or the response message returned by the search system for each search request includes:
    determining the domain to which each search field included in the plurality of search requests belongs according to the separators in each search request and/or in the response message returned by the search system for each search request.
  11. The index management method according to any one of claims 8 to 10, characterized in that determining the domain value format of each domain to which the search fields belong, according to the search fields included in that domain, includes:
    determining, as the domain value format of each domain to which the search fields belong, a format to which all the search fields included in that domain conform.
  12. The index management method according to any one of claims 8 to 11, characterized in that the method further includes:
    receiving a target search request sent by the user, wherein the target search request includes a target search field and is used to request a search for data files that contain the target search field;
    looking up the new index corresponding to the target search field; and
    if the new index corresponding to the target search field is found, obtaining the data files that contain the target search field according to that new index.
  13. The index management method according to claim 12, characterized in that the method further includes:
    if no new index corresponding to the target search field is found, looking up the default index corresponding to the target search field.
  14. The index management method according to any one of claims 8 to 13, characterized in that the method further includes:
    when a new data file is imported into the search system and the new data file contains a field conforming to the domain value format, generating a second data file corresponding to the new data file, wherein that second data file includes the fields of the new data file that conform to the domain value format, together with the domains to which those fields belong; and
    generating a new index corresponding to the second data file of the new data file, wherein that new index includes the storage location of that second data file in the search system.
  15. The index management method according to any one of claims 8 to 14, characterized in that the method further includes:
    collecting, once every preset period, the search parameters of each new index, wherein the search parameters include one or more of the number of times the new index was searched, its search frequency, and its search hit ratio in the current preset period; and
    deleting one or more new indexes whose search parameters are below a threshold.
  16. A data configuration apparatus, applicable to a search system, wherein the search system includes a plurality of data files; each data file includes one or more domains and the domain values corresponding to the one or more domains; a domain indicates the type of its corresponding domain value, and a domain value records the specific value of its corresponding domain; the search system further includes a schema configuration file for field configuration, and the schema configuration file defines the domains of the data files in the search system; characterized in that the data configuration apparatus includes:
    an information receiving module, configured to receive a plurality of search requests sent by a user, wherein each of the plurality of search requests includes a search field and is used to request a search for data files that contain the search field;
    a domain determining module, configured to determine, for each search field included in the plurality of search requests, the domain to which the search field belongs;
    a hotspot determining module, configured to determine one or more hotspot domains among the domains to which the search fields belong; and
    a configuration modification module, configured to add the one or more hotspot domains to the schema configuration file, and to update the data files in the search system according to the schema configuration file to which the one or more hotspot domains have been added.
  17. The data configuration apparatus according to claim 16, wherein the domain determining module is specifically configured to:
    determine, according to each search request and/or the search system's response message to each search request, the domain to which each search field included in the plurality of search requests belongs.
  18. The data configuration apparatus according to claim 17, wherein the domain determining module is specifically configured to:
    determine, according to delimiters in each search request and/or in the search system's response message to each search request, the domain to which each search field included in the plurality of search requests belongs.
  19. The data configuration apparatus according to any one of claims 16 to 18, wherein the hotspot determining module is specifically configured to:
    determine, as hotspot domains, the top one or more domains that contain the most search fields among the domains to which the search fields belong.
  20. The data configuration apparatus according to any one of claims 16 to 19, wherein the apparatus further comprises:
    a format determining module, configured to determine the domain value format of each of the one or more hotspot domains according to the search fields included in that hotspot domain.
  21. The data configuration apparatus according to claim 20, wherein the format determining module is specifically configured to:
    determine, as the domain value format of each of the one or more hotspot domains, a format to which all search fields included in that hotspot domain conform.
  22. The data configuration apparatus according to any one of claims 16 to 21, wherein the configuration modification module is further configured to:
    count, in every preset period, search data of each domain defined by the schema configuration file, where the search data includes one or more of the following for the domain within the current preset period: the number of times it was searched, its search frequency, and its search hit rate; and
    delete, from the schema configuration file, one or more domains whose search data is below a threshold.
  23. An index management apparatus, applicable to a search system, where the search system includes a plurality of data files, each data file including one or more domains and the domain values corresponding to the one or more domains, a domain indicating the type of its corresponding domain value and a domain value recording the specific value of its corresponding domain; the search system further includes default indexes corresponding to the plurality of data files, each default index including the storage location of its corresponding data file in the search system; and the index management apparatus comprises:
    an information receiving module, configured to receive a plurality of search requests sent by a user, where each of the search requests includes a search field and requests a search for data files containing that search field;
    a domain determining module, configured to determine the domain to which each search field included in the plurality of search requests belongs;
    a format determining module, configured to determine the domain value format of each domain to which the search fields belong, according to the search fields included in that domain;
    a file determining module, configured to determine, as first data files, the data files in the search system that contain a field conforming to the domain value format;
    a file generating module, configured to generate a second data file corresponding to each first data file, where each second data file includes the fields in the corresponding first data file that conform to the domain value format, together with the domains to which those fields belong; and
    an index management module, configured to generate a new index corresponding to each second data file, where the new index includes the storage location of the corresponding second data file in the search system.
  24. The index management apparatus according to claim 23, wherein the domain determining module is specifically configured to:
    determine, according to each search request and/or the search system's response message to each search request, the domain to which each search field included in the plurality of search requests belongs.
  25. The index management apparatus according to claim 24, wherein the domain determining module is specifically configured to:
    determine, according to delimiters in each search request and/or in the search system's response message to each search request, the domain to which each search field included in the plurality of search requests belongs.
  26. The index management apparatus according to any one of claims 23 to 25, wherein the format determining module is specifically configured to:
    determine, as the domain value format of each domain to which the search fields belong, a format to which all search fields included in that domain conform.
  27. The index management apparatus according to any one of claims 23 to 26, wherein:
    the information receiving module is further configured to receive a target search request sent by a user, where the target search request includes a target search field and requests a search for data files containing the target search field; and
    the index management apparatus further comprises a file search module, configured to:
    search for a new index corresponding to the target search field; and
    if a new index corresponding to the target search field is found, obtain the data files containing the target search field according to that new index.
  28. The index management apparatus according to claim 27, wherein the file search module is further configured to:
    if no new index corresponding to the target search field is found, search the default index corresponding to the target search field.
  29. The index management apparatus according to any one of claims 23 to 28, wherein the file generating module is further configured to:
    when a new data file is imported into the search system and the new data file contains a field conforming to the domain value format, generate a second data file corresponding to the new data file, where that second data file includes the fields in the new data file that conform to the domain value format and the domains to which those fields belong; and
    the index management module is further configured to generate a new index corresponding to the second data file of the new data file, where that new index includes the storage location of this second data file in the search system.
  30. The index management apparatus according to any one of claims 23 to 29, wherein the index management module is further configured to:
    count, in every preset period, a search parameter of each new index, where the search parameter includes one or more of the following for the new index within the current preset period: the number of times it was searched, its search frequency, and its search hit rate; and
    delete one or more new indexes whose search parameters are below a threshold.
  31. A computing device, comprising a processor and a memory, wherein the processor is configured to perform the data configuration method according to any one of claims 1 to 7 by invoking program code stored in the memory.
  32. A computing device, comprising a processor and a memory, wherein the processor is configured to perform the index management method according to any one of claims 8 to 15 by invoking program code stored in the memory.
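Claims 17 to 19 determine each search field's domain from delimiters in the search requests and then select as hotspot domains the top domains that contain the most search fields. Claim 16 then adds those domains to the schema. The sketch below illustrates that flow under stated assumptions and is not the applicant's implementation. The query syntax `domain:value` with `:` as the delimiter is an assumption, and so is modelling the schema as a plain set of domain names. The names `domain_of`, `hotspot_domains` and `add_hotspots_to_schema` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set

# Assumption: a search field is written as "domain:value", and ":" is the
# delimiter that separates the domain name from the value being searched for.
DELIMITER = ":"


def domain_of(search_field: str) -> Optional[str]:
    """Return the domain named before the delimiter, or None if there is none."""
    domain, sep, _value = search_field.partition(DELIMITER)
    if not sep or not domain.strip():
        return None
    return domain.strip()


def hotspot_domains(search_fields: Iterable[str], top_n: int) -> List[str]:
    """Return the top_n domains that contain the most search fields."""
    counts = Counter(d for d in map(domain_of, search_fields) if d is not None)
    return [domain for domain, _count in counts.most_common(top_n)]


def add_hotspots_to_schema(schema_domains: Set[str], hotspots: List[str]) -> Set[str]:
    """Return the schema's domain set with the hotspot domains added."""
    return schema_domains | set(hotspots)


if __name__ == "__main__":
    fields = ["phone:13800000000", "phone:13900000001", "ip:10.0.0.1", "plain words"]
    hot = hotspot_domains(fields, top_n=1)
    print(hot)  # ['phone']
    print(sorted(add_hotspots_to_schema({"title"}, hot)))  # ['phone', 'title']
```

Search fields without a recognizable delimiter are simply ignored here. A real system might instead infer their domain from the search system's response messages, as claim 17 also allows.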
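Claims 21 and 26 take as a domain's value format "a format to which all search fields in that domain conform" but leave open how a format is represented. The following is a minimal sketch that assumes a format is a regular expression. It builds that expression by collapsing runs of ASCII digits to `[0-9]+` and runs of ASCII letters to `[A-Za-z]+`, and keeps every other character literally. The shape rule and the names `shape` and `common_format` are illustrative assumptions, not the patent's definition.

```python
import re
from typing import List, Optional

# Assumption: a value's "format" is its shape after collapsing runs of ASCII
# digits and runs of ASCII letters; all other characters are kept literally.
TOKEN = re.compile(r"(?P<digits>[0-9]+)|(?P<letters>[A-Za-z]+)|(?P<other>.)", re.DOTALL)


def shape(value: str) -> str:
    """Turn one search-field value into a regular expression describing its shape."""
    parts = []
    for match in TOKEN.finditer(value):
        if match.lastgroup == "digits":
            parts.append(r"[0-9]+")
        elif match.lastgroup == "letters":
            parts.append(r"[A-Za-z]+")
        else:
            parts.append(re.escape(match.group()))
    return "".join(parts)


def common_format(values: List[str]) -> Optional[str]:
    """Return the format shared by all values of a domain, or None if they differ."""
    shapes = {shape(v) for v in values}
    return shapes.pop() if len(shapes) == 1 else None


if __name__ == "__main__":
    fmt = common_format(["10.0.0.1", "192.168.1.20"])
    print(fmt)  # [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+
    print(re.fullmatch(fmt, "172.16.0.9") is not None)  # True
    print(common_format(["abc123", "123abc"]))  # None
```

Returning `None` when the values disagree is a deliberately strict choice. A more forgiving variant could generalize the differing shapes, but that goes beyond what the claims specify.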
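Claims 23, 27 and 28 describe three steps: extract the fields that conform to a learned domain value format into a "second data file", index those files in a new index, and answer a target search from the new index first, falling back to the default index on a miss. The sketch below shows that flow using in-memory dictionaries. It assumes that a data file is plain text, that a field is a whitespace-separated token, and that a storage location is just a string key. All function names and paths are hypothetical.

```python
import re
from typing import Dict, List

# Assumptions for this sketch: a data file is plain text, a "field" is a
# whitespace-separated token, and domain_formats maps each domain to the
# regular expression learned for its domain values.
SecondFile = Dict[str, List[str]]  # domain -> matching field values
Index = Dict[str, List[str]]       # field value -> storage locations


def build_second_file(text: str, domain_formats: Dict[str, str]) -> SecondFile:
    """Keep the tokens that fully match a domain format, grouped by that domain.

    An empty result means the file is not a "first data file"."""
    second: SecondFile = {}
    for token in text.split():
        for domain, fmt in domain_formats.items():
            if re.fullmatch(fmt, token):
                second.setdefault(domain, []).append(token)
    return second


def build_new_index(second_files: Dict[str, SecondFile]) -> Index:
    """Map every extracted value to the locations of the second files holding it."""
    index: Index = {}
    for location, fields in second_files.items():
        for values in fields.values():
            for value in values:
                locations = index.setdefault(value, [])
                if location not in locations:
                    locations.append(location)
    return index


def lookup(term: str, new_index: Index, default_index: Index) -> List[str]:
    """Search the new index first; fall back to the default index on a miss."""
    hits = new_index.get(term)
    return hits if hits else default_index.get(term, [])


if __name__ == "__main__":
    formats = {"ip": r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"}
    second = build_second_file("login from 10.0.0.1 failed", formats)
    print(second)  # {'ip': ['10.0.0.1']}
    new_index = build_new_index({"/second/log-1": second})
    default_index = {"login": ["/data/log-1"]}
    print(lookup("10.0.0.1", new_index, default_index))  # ['/second/log-1']
    print(lookup("login", new_index, default_index))     # ['/data/log-1']
```

Claim 29's handling of a newly imported data file fits this sketch naturally: call `build_second_file` on the new file and, if the result is non-empty, add its location to the new index in the same way.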
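Claims 15, 22 and 30 share one pruning rule. In every preset period, count how often each new index (or each schema domain) is searched, then delete those whose search parameter falls below a threshold. The sketch below implements that rule and assumes the raw search count is the parameter, although the claims also allow search frequency or hit rate. The `UsageTracker` class and the index names in the example are hypothetical, and scheduling the period itself (for example with a timer) is left out.

```python
from collections import Counter
from typing import Set


class UsageTracker:
    """Counts searches per item (a new index or a schema domain) within one
    preset period and drops items whose count falls below a threshold.

    Assumption: the search parameter used here is the raw search count."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.counts: Counter = Counter()

    def record(self, name: str) -> None:
        """Call once each time the named index or domain is searched."""
        self.counts[name] += 1

    def end_period(self, active: Set[str]) -> Set[str]:
        """Return the items to keep, then reset the counts for the next period."""
        kept = {name for name in active if self.counts[name] >= self.threshold}
        self.counts.clear()
        return kept


if __name__ == "__main__":
    tracker = UsageTracker(threshold=2)
    for name in ["idx_ip", "idx_ip", "idx_phone"]:
        tracker.record(name)
    print(sorted(tracker.end_period({"idx_ip", "idx_phone", "idx_mail"})))  # ['idx_ip']
```

Resetting the counts at the end of every period means each decision reflects only the current period's searches, matching the claims' wording "in the current preset period".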
PCT/CN2017/107343 2016-10-24 2017-10-23 Data configuration method, index management method, related apparatus and computing device WO2018077138A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610939364.5A CN107977381B (en) 2016-10-24 2016-10-24 Data configuration method, index management method, related device and computing equipment
CN201610939364.5 2016-10-24

Publications (1)

Publication Number Publication Date
WO2018077138A1 true WO2018077138A1 (en) 2018-05-03

Family

ID=62004877

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/107343 WO2018077138A1 (en) 2016-10-24 2017-10-23 Data configuration method, index management method, related apparatus and computing device

Country Status (2)

Country Link
CN (1) CN107977381B (en)
WO (1) WO2018077138A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108829880A (en) * 2018-06-27 2018-11-16 烽火通信科技股份有限公司 A kind of method of the configuration management of optical network terminal

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN112231356A (en) * 2020-10-20 2021-01-15 中国建设银行股份有限公司 Data processing method and device, electronic equipment and computer readable storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102217413A (en) * 2009-06-10 2011-10-12 华为技术有限公司 Method adapting to mobile search interface, search server and system thereof
US20130290319A1 (en) * 2012-04-27 2013-10-31 Eric Glover Performing application searches
CN104361005A (en) * 2014-10-11 2015-02-18 北京中搜网络技术股份有限公司 Scheduling method for information units in vertical search engine
CN104823169A (en) * 2012-10-12 2015-08-05 A9.com股份有限公司 Index configuration for searchable data in network

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US6898592B2 (en) * 2000-12-27 2005-05-24 Microsoft Corporation Scoping queries in a search engine
US8886628B1 (en) * 2009-03-12 2014-11-11 Akeakamai, Inc. Management of multilevel metadata in the PORTAL-DOORS system with bootstrapping
CN102317917B (en) * 2011-06-30 2013-09-11 华为技术有限公司 Hot field virtual machine cpu dispatching method and virtual machine system (vms)
US10776375B2 (en) * 2013-07-15 2020-09-15 Microsoft Technology Licensing, Llc Retrieval of attribute values based upon identified entities

Also Published As

Publication number Publication date
CN107977381A (en) 2018-05-01
CN107977381B (en) 2021-08-27

Similar Documents

Publication Publication Date Title
US20220335214A1 (en) Device Identifier Dependent Operation Processing of Packet Based Data Communication
CN105138592B (en) A kind of daily record data storage and search method based on distributed structure/architecture
EP2579167A1 (en) Method for active information push and server therefor
US20120197928A1 (en) Real time searching and reporting
US10915500B2 (en) Method and system for historical call lookup in distributed file systems
WO2013091346A1 (en) Webpage content preloading method, device and system
CN111611225A (en) Data storage management method, query method, device, electronic equipment and medium
CN107147748B (en) File uploading method and device
CN109756528B (en) Frequency control method and device, equipment, storage medium and server
CN107103011B (en) Method and device for realizing terminal data search
WO2017174013A1 (en) Data storage management method and apparatus, and data storage system
US11841893B2 (en) Coordination of parallel processing of audio queries across multiple devices
CN111159219B (en) Data management method, device, server and storage medium
WO2015024476A1 (en) A method, server, and computer program product for managing ip address attributions
CN112527504A (en) Multi-tenant resource quota management method and device, and computer equipment
WO2018077138A1 (en) Data configuration method, index management method, related apparatus and computing device
WO2017000592A1 (en) Data processing method, apparatus and system
CN106156258B (en) Method, device and system for counting data in distributed storage system
CN113312355A (en) Data management method and device
US11599673B2 (en) Ascertaining network devices used with anonymous identifiers
CN112035413B (en) Metadata information query method, device and storage medium
US20180046656A1 (en) Constructing filterable hierarchy based on multidimensional key
CN112148925B (en) User identification association query method, device, equipment and readable storage medium
CN112817980A (en) Data index processing method, device, equipment and storage medium
US20140372361A1 (en) Apparatus and method for providing subscriber big data information in cloud computing environment

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17865303

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 17865303

Country of ref document: EP

Kind code of ref document: A1