WO2019137365A1 - Method and device for creating index and performing search in cloud search platform - Google Patents

Info

Publication number
WO2019137365A1
Authority
WO
WIPO (PCT)
Prior art keywords
tenant
search
document
platform
dictionary
Prior art date
Application number
PCT/CN2019/070820
Other languages
French (fr)
Chinese (zh)
Inventor
葛俊
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2019137365A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/316Indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374Thesaurus
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities

Definitions

  • the present invention relates to the field of cloud search platforms, and more particularly to a method and apparatus for constructing an index and performing a search in a cloud search platform.
  • the search function is sold externally.
  • Each user who purchases the service is called a tenant.
  • the number of tenants is on the order of 10,000. Different tenants have different dictionary customization requirements, and different tenants may require indexing based on different dictionaries.
  • the traditional multi-tenant cloud search service is implemented by managing each tenant's search as a separate instance; for example, 100 tenants require 100 instances.
  • the schema (field definition table) of each instance has the same structure, but has its own corresponding word segment dictionary and configuration. Therefore, there is a need for a more efficient solution for building indexes and searching in a cloud search platform.
  • the embodiments of the present specification aim to provide a more efficient solution for building an index and performing a search in a cloud search platform to solve the deficiencies in the prior art.
  • An aspect of the present specification provides a method for constructing an index in a cloud search platform, the cloud search platform including a search instance for a plurality of tenants, the search instance including a unified field definition table applicable to each of the plurality of tenants.
  • The search instance is assigned a tenant identifier for each tenant, the tenant identifier is used to uniquely identify the corresponding tenant, and the field definition table includes a description of a tenant identification field.
  • The tenant identification field is associated with the tenant identifier.
  • The method is performed by the cloud search platform and includes the following steps: acquiring a tenant document, where the content of the tenant document includes a tenant identification field row, and the tenant identification field row shows the tenant identifier of the tenant to which the tenant document belongs; obtaining a tenant dictionary of the tenant by using the tenant identifier; segmenting the tenant document according to the tenant dictionary to obtain a word segmentation document corresponding to the tenant document; and, in the search instance, indexing the tenant document based on the field definition table and the word segmentation document, including establishing, according to the description of the tenant identification field in the field definition table, an index relationship between the tenant identification field and the tenant document.
  • the cloud search platform further includes a dictionary unit, the dictionary unit is separate from the search instance, and the dictionary unit includes the respective tenant dictionaries of the plurality of tenants; the method further comprises, after acquiring the tenant document, transmitting the tenant document to the dictionary unit.
  • segmenting the tenant document according to the tenant dictionary, so as to obtain the word segmentation document corresponding to the tenant document, includes: in the dictionary unit, segmenting the tenant document according to the tenant dictionary to generate the word segmentation document corresponding to the tenant document; and receiving the word segmentation document from the dictionary unit.
  • the method for constructing an index in the cloud search platform further includes: after segmenting the tenant document according to the tenant dictionary and thereby obtaining the word segmentation document corresponding to the tenant document, sending the tenant document and its corresponding word segmentation document to the search instance.
  • the cloud search platform further includes a unified service interface, the service interface is connected to the tenant platforms of the plurality of tenants, and acquiring the tenant document includes: receiving, through the service interface, the tenant original document from the tenant platform, obtaining the tenant identifier according to the tenant platform, and adding the tenant identification field row to the content of the tenant original document, thereby obtaining the tenant document.
  • the cloud search platform further includes a unified service interface, the service interface is connected to the tenant platforms of the plurality of tenants, the method is performed offline, and the method further includes: before acquiring the tenant document, receiving, through the service interface, a tenant original document from the tenant platform, obtaining the tenant identifier according to the tenant platform, adding the tenant identification field row to the content of the tenant original document, thereby generating the tenant document, and storing the tenant document in the cloud search platform.
  • the cloud search platform further includes a unified service interface, the service interface is connected to the tenant platforms of the plurality of tenants, and the method further includes: before acquiring the tenant document, receiving a tenant dictionary from the tenant platform through the service interface, obtaining the tenant identifier according to the tenant platform, and storing the tenant dictionary in the dictionary unit in association with the tenant identifier.
  • Another aspect of the present specification provides an apparatus for constructing an index in a cloud search platform, the cloud search platform including a search instance for a plurality of tenants, the search instance including a unified field definition table applicable to each of the plurality of tenants, where the search instance is assigned a tenant identifier for each tenant, the tenant identifier is used to uniquely identify the corresponding tenant, the field definition table includes a description of the tenant identification field, and the tenant identification field is associated with the tenant identifier.
  • the apparatus is implemented by the cloud search platform and includes the following units: a first obtaining unit configured to acquire a tenant document, where the content of the tenant document includes a tenant identification field row, and the tenant identification field row shows the tenant identifier of the tenant to which the tenant document belongs; a second obtaining unit configured to obtain the tenant dictionary of the tenant by using the tenant identifier; a word segmentation unit configured to segment the tenant document according to the tenant dictionary to obtain a word segmentation document corresponding to the tenant document; and an establishing unit configured to index the tenant document, in the search instance, according to the field definition table and the word segmentation document, including establishing an index relationship between the tenant identification field and the tenant document according to the description of the tenant identification field in the field definition table.
  • Another aspect of the present specification provides a method of performing a search in a cloud search platform, the cloud search platform including a search instance for a plurality of tenants, the search instance including a unified field definition table applicable to the plurality of tenants.
  • the search instance is configured with a tenant identifier for each of the plurality of tenants, the tenant identifier is used to uniquely identify the corresponding tenant, the field definition table includes a description of the tenant identification field, and the tenant identification field is associated with the tenant identifier.
  • the cloud search platform further includes a unified service interface, the service interface being connected to the tenant platforms of the plurality of tenants, the method being executed by the cloud search platform and comprising the steps of: receiving a search statement from the tenant platform; obtaining a tenant identifier of the tenant from the tenant platform; obtaining a tenant dictionary of the tenant by using the tenant identifier; segmenting the search statement according to the tenant dictionary to obtain a word segmentation statement corresponding to the search statement; and searching the tenant identification field and the word segmentation statement in the search instance to the tenant
  • the cloud search platform further includes a dictionary unit, the dictionary unit is separate from the search instance, and the dictionary unit includes the respective tenant dictionaries of the plurality of tenants; the method further comprises, after obtaining the tenant identifier of the tenant according to the tenant platform, transmitting the search statement and the tenant identifier to the dictionary unit.
  • segmenting the search statement according to the tenant dictionary, so as to obtain the word segmentation statement corresponding to the search statement, includes: in the dictionary unit, segmenting the search statement using the tenant dictionary to generate the word segmentation statement; and receiving the word segmentation statement from the dictionary unit.
  • the method for searching in the cloud search platform further includes: after the search statement is segmented according to the tenant dictionary to obtain the corresponding word segmentation statement, sending the word segmentation statement and the tenant identifier to the search instance.
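The search steps summarised above can be pictured with a minimal sketch. It assumes the shared search instance keeps an inverted index keyed by `(field, value)` pairs and that a document matches only when it contains every query segment; the key layout, the `search` function name and the all-segments-must-match rule are illustrative assumptions, not details given in the specification.

```python
def search(inverted, tenant_id, segmented_query):
    """Return sorted IDs of documents that belong to tenant_id and contain every segment.

    inverted: mapping of (field, value) -> set of document IDs (assumed layout).
    segmented_query: the search statement after word segmentation, space-separated.
    """
    # Start from the tenant's own documents so other tenants' data is never returned.
    matches = set(inverted.get(("user_id", tenant_id), set()))
    for segment in segmented_query.split():
        # Assumed AND semantics: keep only documents that also contain this segment.
        matches &= inverted.get(("title", segment), set())
    return sorted(matches)


# Hypothetical index shared by two tenants.
inverted = {
    ("user_id", "001"): {1, 2},
    ("user_id", "002"): {3},
    ("title", "beijing"): {1, 3},
    ("title", "groupbuy"): {1, 2, 3},
}
print(search(inverted, "001", "beijing groupbuy"))  # [1]
print(search(inverted, "002", "beijing groupbuy"))  # [3]
```

Because the tenant identification field is part of every lookup, a single index can serve all tenants while each tenant sees only its own documents.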
  • Another aspect of the present specification provides an apparatus for performing a search in a cloud search platform, the cloud search platform including a search instance for a plurality of tenants, the search instance including a unified field definition table applicable to each of the plurality of tenants, where the search instance is assigned a tenant identifier for each tenant, the tenant identifier is used to uniquely identify the corresponding tenant, the field definition table includes a description of the tenant identification field, and the tenant identification field is associated with the tenant identifier.
  • the cloud search platform further includes a unified service interface, the service interface is connected to the tenant platforms of the plurality of tenants, and the apparatus is implemented by the cloud search platform and includes the following units: a receiving unit configured to receive a search statement from the tenant platform; a first obtaining unit configured to obtain a tenant identifier of the tenant from the tenant platform; a second obtaining unit configured to obtain the tenant dictionary of the tenant by using the tenant identifier; a word segmentation unit configured to segment the search statement according to the tenant dictionary, thereby
  • a plurality of tenants are processed with unified logic in a single search instance, and the tenant dictionaries are provided as an independent service external to the search instance.
  • this simplifies the overall architecture, reduces development costs, and reduces the storage space the architecture requires.
  • the solution meets the requirement for multi-tenant custom dictionaries, and can support linear growth in the number of tenants by adding computing resources such as servers; that is, the system is "linearly scalable".
  • FIG. 1 is a schematic view showing an application scenario of an embodiment of the present specification
  • FIG. 2 illustrates a method of building an index in a cloud search platform in accordance with one embodiment of the present specification
  • FIG. 3 illustrates an apparatus 300 for building an index in a cloud search platform in accordance with an embodiment of the present specification
  • FIG. 4 illustrates a method of performing a search in a cloud search platform in accordance with an embodiment of the present specification
  • FIG. 5 illustrates an apparatus 500 for performing a search in a cloud search platform in accordance with an embodiment of the present specification.
  • FIG. 1 schematically shows an application scenario of an embodiment of the present specification.
  • the application scenarios of the embodiments of the present specification include a cloud search platform 101 and tenant platforms 102, 103, 104, and the like of a plurality of tenants.
  • the cloud search platform 101 includes a unified service interface for connecting to the various tenant platforms, such as a RESTful service interface.
  • the cloud search platform 101 can receive a tenant's document, a tenant's custom dictionary, etc. from the tenant platform 103 through a service interface, thereby building an index from the tenant's document and a custom dictionary.
  • when a user of the tenant platform 103 searches on the tenant platform 103 using the search engine, the tenant platform 103 sends that user's search statement to the cloud search platform 101 through the service interface.
  • the cloud search platform includes a search instance that serves a plurality of tenants, the search instance including a unified field definition table (schema) applicable to each of the plurality of tenants. That is, multiple tenants share a single search instance and use the same field definition table.
  • the search instance here is a stand-alone application in the cloud search platform that implements the search function, and different search instances are logically isolated from each other.
  • the field definition table can for example be in the form of a schema.xml configuration file containing all the fields that the document may contain and all the information about how these fields will be processed when creating the document index and query.
  • the document of the original data (ie, raw doc) is generated in the following format:
  • Table 1 schematically shows a field definition table for a search instance.
  • Table 1 lists the fields included in the documents processed in the search instance: title (title after word segmentation), content (body), cat_id (category id), user_id (tenant id), origin_title (original title). Table 1 also records how these fields are processed when document indexes are created and when queries are run. For example, in the title row, the entry "inverted, forward" in the engine field column indicates that both an inverted index and a forward index will be created for the title, and the column indicating whether word segmentation is required shows that the title is segmented on spaces. As another example, in the content row, the entry "summary field" in the engine field column indicates that the body is displayed in the form of a digest when query results are shown.
  • the field definition table in Table 1 is merely exemplary and serves only to illustrate the concept; a field definition table in an actual application can contain more fields and different field definitions.
  • the search instance reads the above field definition table when building an index for each tenant's document and builds an index based on the description in the table. In addition, when returning search results based on a tenant search request, the search instance also displays the fields according to the description in the field definition table.
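As a rough illustration of how such a shared field definition table might be held in memory, the sketch below encodes the five fields named for Table 1. The `FieldDef` class, its attribute names and the settings chosen for `origin_title` are assumptions made for illustration; the text only states the settings for `title`, `content`, `cat_id` and `user_id`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    name: str
    description: str
    indexes: tuple = ()      # index types to build: "inverted" and/or "forward"
    segmented: bool = False  # True if the value arrives as space-separated segments
    summary: bool = False    # True if the field is shown as a digest in results


# One table shared by every tenant of the search instance.
FIELD_DEFINITIONS = {
    f.name: f
    for f in (
        FieldDef("title", "title after word segmentation",
                 ("inverted", "forward"), segmented=True),
        FieldDef("content", "body", summary=True),
        FieldDef("cat_id", "category id", ("inverted", "forward")),
        FieldDef("user_id", "tenant id", ("inverted", "forward")),
        FieldDef("origin_title", "original title"),  # settings assumed, not stated
    )
}


def fields_with_index(kind):
    """Names of the fields for which an index of the given kind is built."""
    return [name for name, f in FIELD_DEFINITIONS.items() if kind in f.indexes]


print(fields_with_index("inverted"))  # ['title', 'cat_id', 'user_id']
```

The point of the sketch is that `user_id` is an ordinary indexed field in the shared table, so no per-tenant schema is needed.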
  • the tenant ID can also be used to distinguish tenant's documents, tenant's dictionary, tenant platform, and so on. For example, by associating a tenant's dictionary with a tenant ID, when using a tenant dictionary, the tenant dictionary can be obtained through the tenant ID.
  • the field definition tables used by multiple tenants in a search instance are consistent, but the dictionaries of different tenants are different.
  • for example, tenants with sports content need a sports-specific lexicon, while tenants with medical content need a medical dictionary.
  • when building a tenant's index, the search instance segments the tenant's original document according to that tenant's own dictionary.
  • the tenant-related word segmentation logic is implemented outside the search instance, so that the search instance only performs general storage, indexing, retrieval, and the like, without complicated tenant dictionary customization.
  • the cloud search platform 101 further includes a dictionary unit 12, a service agent unit 13, and a storage unit 14.
  • the dictionary unit 12 is separate from the search instance 11, and the dictionary unit includes the respective tenant dictionaries of the plurality of tenants, so that it can provide word segmentation services to individual tenants outside the search instance.
  • the dictionary unit 12 uses the tenant's tenant dictionary to segment the tenant's document.
  • the service agent unit 13 is configured to proxy the services of the cloud search platform 101 and to provide an HTTP service to callers. It is connected to the tenant platforms (102, 103, 104, etc.), the search instance 11, the dictionary unit 12, the storage unit 14, etc., in order to transfer data between them and perform appropriate data preprocessing.
  • the service agent unit 13 receives the original document of the tenant from the tenant platform 103, adds the tenant identification to the original document, obtains the tenant document, and stores the tenant document in the storage unit 14.
  • the service agent unit 13 acquires the tenant document from the storage unit 14, transmits the document to the dictionary unit 12 for word segmentation, and receives the word segmentation document from the dictionary unit 12.
  • the service agent unit 13 then sends the tenant document and the word segmentation document to the search instance 11, which indexes the tenant document according to the field definition table and the tenant's word segmentation document, wherein the index includes a field index for the tenant ID.
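The data flow through the service agent unit described above can be sketched as follows. This is only an outline under stated assumptions: documents are modelled as Python dicts, the storage unit is a plain list, and the dictionary unit and search instance are injected as callables. The class and method names are hypothetical.

```python
class ServiceAgent:
    """Stand-in for service agent unit 13; collaborators are injected."""

    def __init__(self, storage, segment, index):
        self.storage = storage  # list standing in for storage unit 14
        self.segment = segment  # dictionary unit 12: tenant_doc -> segmented doc
        self.index = index      # search instance 11: (tenant_doc, segmented_doc)

    def receive_original(self, tenant_id, original_doc):
        """Add the tenant ID to the original document and store the result."""
        tenant_doc = {**original_doc, "user_id": tenant_id}
        self.storage.append(tenant_doc)
        return tenant_doc

    def build_indexes(self):
        """Fetch stored tenant documents, have them segmented, then index them."""
        for tenant_doc in self.storage:
            segmented_doc = self.segment(tenant_doc)
            self.index(tenant_doc, segmented_doc)


# Usage with placeholder collaborators (the "|" splitter is not a real segmenter).
indexed = []
agent = ServiceAgent(
    storage=[],
    segment=lambda doc: {**doc, "title": " ".join(doc["title"].split("|"))},
    index=lambda doc, seg: indexed.append((doc, seg)),
)
agent.receive_original("xxx", {"title": "group|buy", "content": "body text"})
agent.build_indexes()
```

Keeping the segmenter and indexer behind narrow interfaces mirrors the separation the text describes: the search instance never needs to know about tenant dictionaries.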
  • the configuration of the cloud search platform 101 shown in FIG. 1 is only one embodiment of the present specification, and does not limit the embodiments of the present specification.
  • the dictionary word segmentation function is set inside the search instance.
  • the tenant document is only sent to the search instance, and the tenant document is segmented by the search instance according to the tenant dictionary.
  • the search instance calls the service of the external dictionary unit directly (i.e., not through the service agent unit); that is, it obtains the word segmentation document and the tenant document directly from the dictionary unit in order to build the index.
  • FIG. 2 illustrates a method of building an index in a cloud search platform in accordance with one embodiment of the present specification.
  • a tenant document is acquired.
  • the content of the tenant document includes a tenant identification field row, and the tenant identification field row indicates a tenant identifier of a tenant to which the tenant document belongs.
  • the cloud search platform includes a unified service interface, and the service interface is connected to the tenant platforms of the plurality of tenants.
  • the tenant original document is received from the tenant platform through the service interface.
  • the tenant original document is received from the tenant platform through the service interface, the tenant identifier "xxx" is obtained according to the tenant platform, and a tenant identification field row (the "tenant ID xxx" row) is added to the content of the tenant original document, thereby generating the tenant document, which is stored in the cloud search platform.
  • the tenant document is obtained from the storage unit of the cloud search platform.
  • a service agent unit is included in the cloud search platform for proxying business logic in the platform.
  • the unified service interface is set on the service agent unit, and the service interface is connected to tenant platforms of multiple tenants.
  • the service agent unit receives the tenant original document from the tenant platform through the service interface, obtains the tenant identifier "xxx" according to the tenant platform, and adds the tenant identification field row to the content of the tenant original document.
  • the service agent unit acquires the tenant document from the storage unit 14.
  • the tenant dictionary of the tenant is obtained by the tenant identifier.
  • the cloud search platform further receives a tenant dictionary from the tenant platform through the service interface, acquires a tenant identifier according to the tenant platform, and stores the tenant dictionary in the dictionary unit in association with the tenant identifier.
  • the tenant dictionary associated with the tenant identification can then be retrieved from the platform by means of the tenant identification.
  • In step S23, the tenant document is segmented according to the tenant dictionary, thereby obtaining a word segmentation document corresponding to the tenant document.
  • for example, the title of the original document is "Beijing Group Buying Website Long" (a Chinese character string written without spaces, given here in translation).
  • the tenant dictionary of tenant 001 includes the terms "group purchase" and "website", so segmenting the title according to the tenant dictionary of tenant 001 yields the segments "Beijing", "group purchase", "website", "long".
  • the tenant dictionary of tenant 002 includes the terms "Beijing Group Buying Network" and "Webmaster", so segmenting the same title according to the tenant dictionary of tenant 002 yields the segments "Beijing Group Buying Network" and "Webmaster".
  • spaces are used as the separation between the word segments, which is merely exemplary, and may be separated by other characters or expressed in other forms, for example, by structuring each word segmentation.
  • the tenant document may be further segmented using the default dictionary in the dictionary unit.
  • the terms in the tenant dictionary are used preferentially.
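The specification does not name a segmentation algorithm. One simple way to picture tenant-specific segmentation is forward maximum matching, sketched below as an assumption: tenant terms are matched first, and only the stretches they leave unmatched are passed to the default dictionary. The ASCII strings stand in for an unspaced title and are hypothetical, as are the function names.

```python
def _max_match(text, terms):
    """Forward maximum matching; returns (piece, matched) pairs in order."""
    longest = max((len(t) for t in terms), default=0)
    pieces, pending, i = [], "", 0
    while i < len(text):
        # Try the longest candidate first so longer dictionary terms win.
        hit = next(
            (text[i:i + n]
             for n in range(min(longest, len(text) - i), 0, -1)
             if text[i:i + n] in terms),
            None,
        )
        if hit is None:
            pending += text[i]  # collect characters no term covers
            i += 1
            continue
        if pending:
            pieces.append((pending, False))
            pending = ""
        pieces.append((hit, True))
        i += len(hit)
    if pending:
        pieces.append((pending, False))
    return pieces


def segment(text, tenant_terms, default_terms=frozenset()):
    """Segment text, preferring tenant terms over the default dictionary."""
    segments = []
    for piece, matched in _max_match(text, tenant_terms):
        if matched:
            segments.append(piece)
        else:
            # Only text the tenant dictionary left unmatched sees the default terms.
            segments.extend(p for p, _ in _max_match(piece, default_terms))
    return " ".join(segments)


title = "beijinggroupbuywebsitemaster"  # hypothetical unspaced title
print(segment(title, {"groupbuy", "website"}, {"beijing"}))
# beijing groupbuy website master
print(segment(title, {"beijinggroupbuyweb", "sitemaster"}))
# beijinggroupbuyweb sitemaster
```

The same title yields different segmentations for the two tenants, which is the behaviour the example with tenants 001 and 002 describes.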
  • a dictionary unit is further included in the cloud search platform, and the dictionary unit is an application on the platform separate from the search instance, and is used to provide a word segmentation service.
  • the service agent unit receives the tenant dictionary from the tenant platform through the service interface, acquires the tenant ID according to the tenant platform, and stores the tenant dictionary in the dictionary unit in association with the tenant identification. For example, a startup configuration file can be set up for the tenant dictionaries.
  • One option of the configuration file has the form key: dict_path, for example, user_id_001:/home/admin/local_dict_1.txt, so that the tenant dictionary is associated with the tenant ID through the configuration file.
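A small sketch of reading such `key: dict_path` entries into a lookup table is given below. Only the entry format comes from the text; the function name, the skipping of blank lines and `#` comment lines, and the error handling are assumptions.

```python
def parse_dictionary_config(lines):
    """Map tenant IDs to dictionary paths from 'tenant_id:dict_path' entries."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed: blank and comment lines are skipped
            continue
        # Split on the first colon only, so paths may themselves contain colons.
        tenant_id, sep, dict_path = line.partition(":")
        tenant_id, dict_path = tenant_id.strip(), dict_path.strip()
        if not sep or not tenant_id or not dict_path:
            raise ValueError(f"malformed dictionary entry: {raw!r}")
        mapping[tenant_id] = dict_path
    return mapping


config = parse_dictionary_config(["user_id_001:/home/admin/local_dict_1.txt"])
print(config["user_id_001"])  # /home/admin/local_dict_1.txt
```

With this mapping loaded at startup, the dictionary unit can locate a tenant's dictionary file from the tenant ID alone.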
  • the word segmentation service of the dictionary unit can be called by the service agent unit.
  • the service agent unit sends the tenant document to the dictionary unit after acquiring the tenant document.
  • the tenant document is segmented according to the tenant dictionary to generate a word segmentation document.
  • the dictionary unit transmits the tenant document and the segmented document to the service agent unit.
  • In step S24, in the search instance, the tenant document is indexed according to the field definition table and the word segmentation document, including: establishing an index relationship between the tenant identification field and the tenant document according to the description of the tenant identification field in the field definition table.
  • the field definition table in the search instance is a table as shown in Table 1, which specifies that an inverted index and a forward index are built for the segmented title (title), the category ID (cat_id), and the tenant ID (user_id).
  • the indexer in the search instance generates an index relationship table between the tenant ID and the tenant document, including an inverted list and a forward list.
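To make the inverted and forward lists concrete, here is a minimal in-memory sketch. It assumes documents are dicts, that the segmented title is space-separated, and that the inverted list is keyed by `(field, value)`; the `SearchInstance` class and its layout are illustrative assumptions rather than the indexer the specification has in mind.

```python
from collections import defaultdict


class SearchInstance:
    """Toy index holding a forward list and an inverted list for all tenants."""

    def __init__(self):
        self.forward = {}                 # document ID -> stored tenant document
        self.inverted = defaultdict(set)  # (field, value) -> document IDs

    def add(self, doc_id, tenant_doc, segmented_doc):
        self.forward[doc_id] = dict(tenant_doc)
        # The title is already segmented, so splitting on spaces is sufficient.
        for term in segmented_doc["title"].split(" "):
            if term:
                self.inverted[("title", term)].add(doc_id)
        # The tenant ID is indexed like any other field, alongside the category ID.
        for field in ("cat_id", "user_id"):
            self.inverted[(field, tenant_doc[field])].add(doc_id)


instance = SearchInstance()
doc = {"title": "beijinggroupbuy", "content": "body", "cat_id": "7", "user_id": "001"}
instance.add(1, doc, {**doc, "title": "beijing groupbuy"})
print(sorted(instance.inverted[("user_id", "001")]))  # [1]
```

The entry under `("user_id", "001")` is the index relationship between the tenant identification field and the tenant's documents.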
  • the service agent unit 13 sends the tenant document and the word segmentation document to the search instance 11.
  • the content of the word segmentation can be separated by a pre-agreed separator (for example, a space).
  • the word segmentation document can be further processed in the service agent unit so that its segments are delimited by the agreed separator.
  • the search instance performs word segmentation according to the completed word segmentation field after receiving the word segmentation document, for example, word segmentation according to spaces, without additional word segmentation processing.
  • FIG. 3 illustrates an apparatus 300 for building an index in a cloud search platform in accordance with an embodiment of the present specification.
  • the cloud search platform includes a search instance for a plurality of tenants, the search instance including a unified field definition table applicable to the plurality of tenants, the search instance being assigned a tenant identifier for each of the tenants, the tenant identifier being used to uniquely identify the corresponding tenant, the field definition table including a description of the tenant identification field, and the tenant identification field being associated with the tenant identifier.
  • the apparatus 300 for constructing an index in the cloud search platform is implemented by the cloud search platform and includes the following units: a first obtaining unit 31 configured to acquire a tenant document, where the content of the tenant document includes a tenant identification field row, the tenant identification field row showing the tenant identifier of the tenant to which the tenant document belongs; a second obtaining unit 32 configured to obtain a tenant dictionary by using the tenant identifier; a word segmentation unit 33 configured to segment the tenant document according to the tenant dictionary to obtain a word segmentation document corresponding to the tenant document; and an establishing unit 34 configured to index the tenant document according to the field definition table and the word segmentation document, including establishing an index relationship between the tenant identification field and the tenant document according to the description of the tenant identification field in the field definition table.
  • the cloud search platform further includes a dictionary unit, the dictionary unit is separate from the search instance, and the dictionary unit includes the respective tenant dictionaries of the plurality of tenants; the apparatus 300 for constructing an index in the cloud search platform further includes a first sending unit configured to send the tenant document to the dictionary unit after the tenant document is acquired.
  • segmenting the tenant document according to the tenant dictionary includes: in the dictionary unit, segmenting the tenant document according to the tenant dictionary to generate a word segmentation document corresponding to the tenant document; and receiving the word segmentation document from the dictionary unit.
  • the apparatus 300 for constructing an index in the cloud search platform further includes a second sending unit configured to, after the tenant document is segmented according to the tenant dictionary and the corresponding word segmentation document is obtained, send the tenant document and its corresponding word segmentation document to the search instance.
  • the cloud search platform further includes a unified service interface, the service interface is connected to the tenant platforms of the plurality of tenants, and acquiring the tenant document includes: receiving, through the service interface, the tenant original document from the tenant platform, obtaining the tenant identifier according to the tenant platform, and adding the tenant identification field row to the content of the tenant original document, thereby obtaining the tenant document.
  • the apparatus 300 for constructing an index in the cloud search platform is implemented offline, and the apparatus 300 further includes a first storage unit configured to, before the tenant document is acquired, receive the tenant original document from the tenant platform through the service interface, obtain the tenant identifier according to the tenant platform, add the tenant identification field row to the content of the tenant original document, thereby generating the tenant document, and store the tenant document in the cloud search platform.
  • the apparatus 300 for constructing an index in the cloud search platform further includes a second storage unit configured to, before the tenant document is acquired, receive a tenant dictionary from the tenant platform through the service interface, obtain the tenant identification according to the tenant platform, and store the tenant dictionary in the dictionary unit in association with the tenant identification.
  • FIG. 4 illustrates a method of performing a search in a cloud search platform in accordance with an embodiment of the present specification.
  • the cloud search platform includes a search instance for a plurality of tenants, the search instance including a unified field definition table applicable to the plurality of tenants, the search instance being assigned a tenant identifier for each of the tenants, the tenant identifier being used to uniquely identify the corresponding tenant, the field definition table including a description of the tenant identification field, the tenant identification field being associated with the tenant identifier, and the cloud search platform including a unified service interface that is connected to the tenant platforms of the plurality of tenants.
  • a search sentence is received from the tenant platform.
  • the cloud search platform receives the tenant's search request from the tenant platform through the unified service interface described above.
  • the search request is the search sentence "Beijing Group Purchase".
  • the cloud search platform 101 includes a service agent unit 13 that includes the service interface for connecting with the tenant platform.
  • the service agent unit 13 receives the tenant's search request from the tenant platform through the unified service interface described above.
  • the tenant's tenant identification is obtained from the tenant platform.
  • each tenant platform may include a tenant identification parameter corresponding to the tenant platform in the request string it sends.
  • the tenant's tenant identification "xxx" is obtained from the tenant platform via the service agent unit 13.
  • in step S43, the tenant dictionary of the tenant is obtained by means of the tenant identifier.
  • the tenant dictionary is stored in the cloud search platform in association with the tenant identification, so that the platform can look up the tenant dictionary associated with the tenant identification and thereby obtain it.
  • cloud search platform 101 also includes a dictionary unit 12.
  • the service agent unit 13 transmits the search sentence and the tenant identification to the dictionary unit 12 after acquiring the tenant's tenant identification "xxx" from the tenant platform.
  • the dictionary unit 12 looks up, among the dictionaries it holds, the tenant dictionary associated with the tenant identification, and thereby acquires the tenant dictionary.
  • in step S44, the search sentence is segmented according to the tenant dictionary, thereby acquiring a segmented sentence corresponding to the search sentence.
  • the tenant dictionary of the tenant 001 includes the terms "Group Purchase" and "Website", and the sentence is segmented according to the tenant dictionary of the tenant 001.
  • the result is "Beijing Group Purchase".
  • the cloud search platform receives the search sentence "Beijing Group Purchase" from the tenant platform of the tenant 002.
  • the tenant dictionary of the tenant 002 includes the term "Beijing Group Purchase Network", and the sentence is segmented according to the tenant dictionary of the tenant 002.
  • the result is still "Beijing Group Purchase".
  • the search sentence is segmented by the tenant dictionary to generate a word segmentation statement.
  • the dictionary unit 12 sends the word segmentation statement and the tenant identification back to the service agent unit 13.
  • the service agent unit 13 transmits the word segmentation statement and the tenant identification to the search instance 11.
  • in step S45, a search is performed in the search instance on the tenant identification field and the segmented sentence, so that the segmented sentence is searched for within the tenant documents of that tenant.
  • the result of the dictionary segmentation of the title is “Beijing Group Buying Website Long”, that is, the index entry corresponding to the document of tenant 001 includes “Beijing”, “Group Purchase”, “Website” and “Long”.
  • the segmented sentence produced with the tenant dictionary of the tenant 001 is "Beijing Group Purchase".
  • when the search sentence "Beijing Group Purchase" is received from the tenant platform of the tenant 002, the segmented sentence produced with the tenant dictionary of the tenant 002 is still "Beijing Group Purchase".
  • "Beijing Group Purchase" is not associated in the index with the document titled "Beijing Group Buying Website Long", so in this case, when the search instance searches for the search sentence "Beijing Group Purchase", the title will not be returned.
  • in step S46, the tenant platform is located according to the tenant identification.
  • the cloud search platform is connected to the tenant platforms of multiple tenants through a unified service interface, so when a result is returned, the platform uses the tenant identification to locate the tenant platform that previously sent the search request.
  • the service agent unit 13 uses the tenant identification received from the search instance 11 to locate the tenant platform that previously sent the search request.
  • in step S47, the search result is returned to the tenant platform according to the field definition table.
  • the "engine field” column in the table defines the content line as "summary field", that is, when the search result is returned, the content is displayed in the summary field.
  • cat_id category id
  • origin_title original title
  • the service agent unit 13 returns the search result displayed according to the field definition table received from the search instance 11 to the tenant platform after the tenant platform is located.
  • FIG. 5 illustrates an apparatus 500 for performing a search in a cloud search platform in accordance with an embodiment of the present specification.
  • the cloud search platform includes a search instance for a plurality of tenants, the search instance including a unified field definition table applicable to each of the plurality of tenants and assigning a tenant identifier to each of the plurality of tenants; the tenant identifier is used to uniquely identify the corresponding tenant, the field definition table includes a description of the tenant identification field, the tenant identification field is associated with the tenant identifier, and the cloud search platform further includes a unified service interface.
  • the service interface is connected to the tenant platforms of the plurality of tenants.
  • the device 500 for searching in the cloud search platform is implemented by the cloud search platform and includes the following units: a first receiving unit 51 configured to receive a search sentence from the tenant platform;
  • a first obtaining unit 52 configured to acquire the tenant identifier of the tenant from the tenant platform;
  • a second obtaining unit 53 configured to acquire the tenant dictionary of the tenant by means of the tenant identifier; a word segmentation unit 54 configured to segment the search sentence according to the tenant dictionary to obtain a segmented sentence corresponding to the search sentence; a retrieval unit 55 configured to search on the tenant identification field and the segmented sentence in the search instance, so that the segmented sentence is searched for in the tenant documents of the tenant; a positioning unit 56 configured to locate the tenant platform according to the tenant identifier; and a returning unit 57 configured to return the search result to the tenant platform according to the field definition table.
  • the cloud search platform further includes a dictionary unit that is separate from the search instance and contains the respective tenant dictionaries of the plurality of tenants, and the device 500 for searching in the cloud search platform further includes a first sending unit configured to send the search sentence and the tenant identifier to the dictionary unit after the tenant identifier of the tenant is acquired according to the tenant platform.
  • segmenting the search sentence according to the tenant dictionary to obtain the segmented sentence corresponding to the search sentence includes:
  • segmenting the search sentence with the tenant dictionary in the dictionary unit to generate the segmented sentence; and receiving the segmented sentence from the dictionary unit.
  • the device 500 for searching in the cloud search platform further includes a second sending unit configured to send the segmented sentence and the tenant identifier to the search instance after they are received from the dictionary unit.
  • the embodiments of the present specification further include a computer readable storage medium storing instruction code which, when executed in a computer, causes the computer to perform the methods of constructing an index and performing a search in a cloud search platform according to the embodiments of the present specification.
  • a plurality of tenants are processed with unified logic in a single search instance, and the tenant dictionaries are separated out into a service external to the search instance, which simplifies the overall architecture, reduces development costs, and reduces the storage space the architecture requires.
  • the solution supports per-tenant custom dictionaries, and linear growth in the number of tenants can be supported by adding computing resources such as servers; that is, the system is "linearly scalable".
  • the steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two.
  • the software module can be placed in random access memory (RAM), memory, read only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a method and device for creating an index and performing a search in a cloud search platform, where the cloud search platform comprises a search instance for use by multiple subscribers, and the search instance assigns subscriber identifiers respectively to the multiple subscribers. The method for creating an index comprises the following steps: acquiring a subscriber file, the content of the subscriber file comprising a subscriber identifier field row, and the subscriber identifier field row showing a subscriber identifier of a subscriber to whom the subscriber file belongs; acquiring a subscriber dictionary of the subscriber via the subscriber identifier; performing word segmentation with respect to the subscriber file on the basis of the subscriber dictionary so as to acquire a word-segmented file corresponding to the subscriber file; and, in the search instance, creating an index with respect to the subscriber file on the basis of the field definition table and of the word-segmented file, comprising creating an index relation between the subscriber identifier field and the subscriber file on the basis of the description in the field definition table with respect to the subscriber identifier field.

Description

Method and device for constructing an index and performing a search in a cloud search platform
Technical field
The present invention relates to the field of cloud search platforms, and more particularly to a method and apparatus for constructing an index and performing a search in a cloud search platform.
Background
A cloud search platform sells search functionality to external customers. Each user who purchases the service is called a tenant, and the number of tenants is on the order of tens of thousands. Different tenants have different dictionary customization needs and may require their indexes to be built from different dictionaries. The traditional multi-tenant cloud search service manages each tenant's search as a separate instance, so that, for example, 100 tenants require 100 instances. Every instance has the same schema (field definition table) structure but its own word segmentation dictionary and configuration. A more efficient scheme for building indexes and searching in a cloud search platform is therefore needed.
Summary of the invention
The embodiments of the present specification aim to provide a more efficient scheme for building an index and performing a search in a cloud search platform, so as to remedy the deficiencies of the prior art.
To achieve the above object, one aspect of the present specification provides a method for constructing an index in a cloud search platform. The cloud search platform includes a search instance for a plurality of tenants; the search instance includes a unified field definition table applicable to each of the plurality of tenants and assigns a tenant identifier to each of the plurality of tenants, the tenant identifier uniquely identifying the corresponding tenant; the field definition table includes a description of a tenant identification field, and the tenant identification field is associated with the tenant identifier. The method is performed by the cloud search platform and includes the following steps: acquiring a tenant document, the content of which includes a tenant identification field row showing the tenant identifier of the tenant to which the tenant document belongs; acquiring the tenant dictionary of the tenant by means of the tenant identifier; segmenting the tenant document according to the tenant dictionary to obtain a segmented document corresponding to the tenant document; and, in the search instance, building an index for the tenant document according to the field definition table and the segmented document, which includes establishing an index relationship between the tenant identification field and the tenant document according to the description of the tenant identification field in the field definition table.
In one embodiment of the above method for constructing an index in a cloud search platform, the cloud search platform further includes a dictionary unit that is separate from the search instance and contains the respective tenant dictionaries of the plurality of tenants, and the method further includes sending the tenant document to the dictionary unit after the tenant document is acquired.
In one embodiment of the above method for constructing an index in a cloud search platform, segmenting the tenant document according to the tenant dictionary to obtain the segmented document corresponding to the tenant document includes: in the dictionary unit, segmenting the tenant document according to the tenant dictionary to generate the segmented document corresponding to the tenant document; and receiving the segmented document from the dictionary unit.
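The per-tenant segmentation performed in the dictionary unit can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the specification does not name a segmentation algorithm, so forward maximum matching is swapped in here, and the class name `DictionaryUnit`, the shared base vocabulary, and the ASCII stand-in strings are all hypothetical.

```python
from typing import Dict, List, Set


class DictionaryUnit:
    """Hypothetical dictionary unit: one custom dictionary per tenant identifier."""

    def __init__(self, base_terms: Set[str]) -> None:
        # Assumption: a vocabulary shared by all tenants exists alongside tenant dictionaries.
        self._base = set(base_terms)
        self._by_tenant: Dict[str, Set[str]] = {}

    def store(self, tenant_id: str, terms: Set[str]) -> None:
        """Store a tenant dictionary in association with its tenant identifier."""
        self._by_tenant[tenant_id] = set(terms)

    def segment(self, tenant_id: str, text: str) -> List[str]:
        """Forward maximum matching (an assumed algorithm) over the tenant's vocabulary."""
        vocab = self._base | self._by_tenant.get(tenant_id, set())
        longest = max((len(term) for term in vocab), default=1)
        tokens: List[str] = []
        i = 0
        while i < len(text):
            # Try the longest candidate first; a single character always matches.
            for size in range(min(longest, len(text) - i), 0, -1):
                piece = text[i:i + size]
                if size == 1 or piece in vocab:
                    tokens.append(piece)
                    i += size
                    break
        return tokens


unit = DictionaryUnit({"beijing", "website"})
unit.store("001", {"groupbuy"})
unit.store("002", {"beijinggroupbuy"})
print(unit.segment("001", "beijinggroupbuywebsite"))
print(unit.segment("002", "beijinggroupbuywebsite"))
```

The same input text is split differently for the two tenants, which is the behaviour the tenant dictionaries are meant to provide.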
In one embodiment, the above method for constructing an index in a cloud search platform further includes, after the tenant document has been segmented according to the tenant dictionary to obtain the corresponding segmented document, sending the tenant document and its corresponding segmented document to the search instance.
In one embodiment of the above method for constructing an index in a cloud search platform, the cloud search platform further includes a unified service interface connected to the tenant platforms of the plurality of tenants, and acquiring the tenant document includes: receiving a tenant original document from the tenant platform through the service interface, obtaining the tenant identifier according to the tenant platform, and adding the tenant identification field row to the content of the tenant original document, thereby obtaining the tenant document.
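Adding the tenant identification field row can be sketched as below, assuming the tenant original document is plain text with one `name=value` row per field and that the tenant field is called `user_id`. The function name `add_tenant_field_row` is hypothetical, and discarding a caller-supplied tenant row is a design choice of this sketch rather than something the specification states.

```python
def add_tenant_field_row(raw_doc: str, tenant_id: str, field: str = "user_id") -> str:
    """Return the tenant document: the original rows plus one tenant identifier row."""
    rows = [row for row in raw_doc.splitlines() if row.strip()]
    # Discard any tenant row already present so the platform-derived identifier wins.
    rows = [row for row in rows if row.partition("=")[0].strip() != field]
    rows.append(f"{field}={tenant_id}")
    return "\n".join(rows)


print(add_tenant_field_row("id=1\ntitle=hello world", "001"))
```

Deriving the identifier on the platform side, rather than trusting the uploaded content, keeps one tenant from writing documents into another tenant's slice of the shared index.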
In one embodiment of the above method for constructing an index in a cloud search platform, the cloud search platform further includes a unified service interface connected to the tenant platforms of the plurality of tenants, the method is performed offline, and the method further includes, before the tenant document is acquired: receiving a tenant original document from the tenant platform through the service interface, obtaining the tenant identifier according to the tenant platform, adding the tenant identification field row to the content of the tenant original document to generate the tenant document, and storing the tenant document in the cloud search platform.
In one embodiment of the above method for constructing an index in a cloud search platform, the cloud search platform further includes a unified service interface connected to the tenant platforms of the plurality of tenants, and the method further includes, before the tenant document is acquired: receiving a tenant dictionary from the tenant platform through the service interface, obtaining the tenant identifier according to the tenant platform, and storing the tenant dictionary in the dictionary unit in association with the tenant identifier.
Another aspect of the present specification provides an apparatus for constructing an index in a cloud search platform. The cloud search platform includes a search instance for a plurality of tenants; the search instance includes a unified field definition table applicable to each of the plurality of tenants and assigns a tenant identifier to each of the plurality of tenants, the tenant identifier uniquely identifying the corresponding tenant; the field definition table includes a description of a tenant identification field, and the tenant identification field is associated with the tenant identifier. The apparatus is implemented by the cloud search platform and includes the following units: a first obtaining unit configured to acquire a tenant document, the content of which includes a tenant identification field row showing the tenant identifier of the tenant to which the tenant document belongs; a second obtaining unit configured to acquire the tenant dictionary of the tenant by means of the tenant identifier; a word segmentation unit configured to segment the tenant document according to the tenant dictionary to obtain a segmented document corresponding to the tenant document; and an establishing unit configured to build, in the search instance, an index for the tenant document according to the field definition table and the segmented document, which includes establishing an index relationship between the tenant identification field and the tenant document according to the description of the tenant identification field in the field definition table.
Another aspect of the present specification provides a method of performing a search in a cloud search platform. The cloud search platform includes a search instance for a plurality of tenants; the search instance includes a unified field definition table applicable to the plurality of tenants and assigns a tenant identifier to each of the plurality of tenants, the tenant identifier uniquely identifying the corresponding tenant; the field definition table includes a description of a tenant identification field, and the tenant identification field is associated with the tenant identifier; the cloud search platform further includes a unified service interface connected to the tenant platforms of the plurality of tenants. The method is performed by the cloud search platform and includes the following steps: receiving a search sentence from a tenant platform; obtaining the tenant identifier of the tenant from the tenant platform; acquiring the tenant dictionary of the tenant by means of the tenant identifier; segmenting the search sentence according to the tenant dictionary to obtain a segmented sentence corresponding to the search sentence; searching on the tenant identification field and the segmented sentence in the search instance, so that the segmented sentence is searched for in the tenant documents of the tenant; locating the tenant platform according to the tenant identifier; and returning the search result to the tenant platform according to the field definition table.
In one embodiment of the above method of performing a search in a cloud search platform, the cloud search platform further includes a dictionary unit that is separate from the search instance and contains the respective tenant dictionaries of the plurality of tenants, and the method further includes sending the search sentence and the tenant identifier to the dictionary unit after the tenant identifier of the tenant is obtained according to the tenant platform.
In one embodiment of the above method of performing a search in a cloud search platform, segmenting the search sentence according to the tenant dictionary to obtain the segmented sentence corresponding to the search sentence includes: in the dictionary unit, segmenting the search sentence with the tenant dictionary to generate the segmented sentence; and receiving the segmented sentence from the dictionary unit.
In one embodiment, the above method of performing a search in a cloud search platform further includes, after the search sentence has been segmented according to the tenant dictionary to obtain the corresponding segmented sentence, sending the segmented sentence and the tenant identifier to the search instance.
Another aspect of the present specification provides an apparatus for performing a search in a cloud search platform. The cloud search platform includes a search instance for a plurality of tenants; the search instance includes a unified field definition table applicable to each of the plurality of tenants and assigns a tenant identifier to each of the plurality of tenants, the tenant identifier uniquely identifying the corresponding tenant; the field definition table includes a description of a tenant identification field, and the tenant identification field is associated with the tenant identifier; the cloud search platform further includes a unified service interface connected to the tenant platforms of the plurality of tenants. The apparatus is implemented by the cloud search platform and includes the following units: a first receiving unit configured to receive a search sentence from the tenant platform; a first obtaining unit configured to obtain the tenant identifier of the tenant from the tenant platform; a second obtaining unit configured to acquire the tenant dictionary of the tenant by means of the tenant identifier; a word segmentation unit configured to segment the search sentence according to the tenant dictionary to obtain a segmented sentence corresponding to the search sentence; a retrieval unit configured to search on the tenant identification field and the segmented sentence in the search instance, so that the segmented sentence is searched for in the tenant documents of the tenant; a positioning unit configured to locate the tenant platform according to the tenant identifier; and a returning unit configured to return the search result to the tenant platform according to the field definition table.
In the above method and apparatus for constructing an index and performing a search in a cloud search platform according to the embodiments of the present specification, a plurality of tenants are processed with unified logic in a single search instance, and the tenant dictionaries are separated out into a service external to the search instance. This simplifies the overall architecture, lowers development costs, and reduces the storage space the architecture requires. In addition, the scheme supports per-tenant custom dictionaries, and linear growth in the number of tenants can be supported by adding computing resources such as servers; that is, the system is "linearly scalable".
Brief description of the drawings
The embodiments of the present specification can be made clearer by describing them with reference to the accompanying drawings:
FIG. 1 schematically shows an application scenario of an embodiment of the present specification;
FIG. 2 illustrates a method of constructing an index in a cloud search platform in accordance with one embodiment of the present specification;
FIG. 3 illustrates an apparatus 300 for constructing an index in a cloud search platform in accordance with an embodiment of the present specification;
FIG. 4 illustrates a method of performing a search in a cloud search platform in accordance with an embodiment of the present specification; and
FIG. 5 illustrates an apparatus 500 for performing a search in a cloud search platform in accordance with an embodiment of the present specification.
Detailed description
Embodiments of the present specification will be described below with reference to the drawings.
FIG. 1 schematically shows an application scenario of an embodiment of the present specification. The scenario includes a cloud search platform 101 and the tenant platforms 102, 103, 104, etc. of a plurality of tenants. The cloud search platform 101 includes a unified service interface for connecting to the tenant platforms; the service interface is, for example, a RESTful service interface. For example, the cloud search platform 101 can receive a tenant's documents, the tenant's custom dictionary and so on from the tenant platform 103 through the service interface, and build an index from the tenant's documents and custom dictionary. When a user of the tenant platform 103 searches with the search engine on the tenant platform 103, the tenant platform 103 sends the user's search sentence to the cloud search platform 101 through the service interface. The cloud search platform 101 distinguishes the tenant by a field row such as "tenant ID=xxx" (where xxx is the tenant's tenant identifier) when searching, and returns the search result to the tenant platform 103.
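As a rough illustration of how a request arriving at the unified service interface might carry both the tenant identifier and the search sentence, the sketch below reads them from a URL query string. The host name, the path, and the parameter names `user_id` and `q` are assumptions made for this example; the specification only says the interface may be RESTful and that the request string may contain a tenant identification parameter.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit


def read_search_request(url: str) -> Tuple[str, str]:
    """Extract (tenant_id, search_sentence) from a request URL.

    Raises KeyError if either hypothetical parameter is missing.
    """
    params = parse_qs(urlsplit(url).query)
    tenant_id = params["user_id"][0]   # assumed parameter name
    sentence = params["q"][0]          # assumed parameter name
    return tenant_id, sentence


print(read_search_request("https://search.example.com/search?user_id=001&q=beijing+groupbuy"))
```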
The cloud search platform includes a search instance (cluster) that contains a plurality of tenants, and the search instance includes a unified field definition table (schema) applicable to each of the plurality of tenants. That is, the tenants share one search instance and use the same field definition table. A search instance here is an independent application in the cloud search platform that implements the search function, and search instances are logically isolated from one another.
The field definition table (schema) may, for example, take the form of a schema.xml configuration file that contains all the fields (Field) a document may contain, together with all the information about how those fields are processed when a document is indexed and queried.
For example, a document of raw data (i.e., a raw doc) is generated in the following format:
id=1
user_id=001
title=xxxxxxx yyyyy
content=...origin_title=xxxxxyyyyy
Here id, user_id, title and content are fields contained in the document, and the content after each "=" is the value of the corresponding field.
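A minimal sketch of reading such a raw doc into a field-to-value mapping is given below, assuming one `name=value` row per line. The function name `parse_raw_doc` is hypothetical.

```python
from typing import Dict


def parse_raw_doc(text: str) -> Dict[str, str]:
    """Split each non-empty row at its first '=' into a field name and a value."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a field row: {line!r}")
        fields[name.strip()] = value
    return fields


doc = parse_raw_doc("id=1\nuser_id=001\ntitle=xxxxxxx yyyyy\n")
print(doc["user_id"], doc["title"])
```

Splitting only at the first "=" lets a field value itself contain "=" characters.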
表1示意示出一个搜索实例的字段定义表。Table 1 schematically shows a field definition table for a search instance.
Figure PCTCN2019070820-appb-000001
Table 1
Table 1 lists the fields contained in the documents processed by the search instance: title (the segmented title), content (body text), cat_id (category id), user_id (tenant id), and origin_title (original title). Table 1 also records how these fields are processed when documents are indexed and queried. For example, in the title row, the entry "inverted, forward" in the engine field column indicates that both an inverted index and a forward index are built for the title, and the column indicating whether word segmentation is needed shows that the title is segmented on spaces at indexing time. As another example, in the content row, the entry "summary field" in the engine field column indicates that the body text is displayed as a summary when query results are shown. The field definition table in Table 1 is only an illustrative example; a field definition table in practice may contain more fields and different field definitions.
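To make the role of the field definition table concrete, the following is a minimal sketch of one possible in-memory form of it. The dictionary layout and the names SCHEMA, "indexes", "split_on", "summary", and fields_with_index are illustrative assumptions, not the schema.xml format itself; only the attributes stated in the text are filled in, and None marks values the text does not give.

```python
# Hypothetical in-memory form of the field definition table (schema).
# Only attributes stated in the text are filled in; None marks values the
# text does not give (Table 1 itself is not reproduced here).
SCHEMA = {
    "title":        {"indexes": ("inverted", "forward"), "split_on": " ", "summary": None},
    "content":      {"indexes": None, "split_on": None, "summary": True},
    "cat_id":       {"indexes": ("inverted", "forward"), "split_on": None, "summary": None},
    "user_id":      {"indexes": ("inverted", "forward"), "split_on": None, "summary": None},
    "origin_title": {"indexes": None, "split_on": None, "summary": None},
}


def fields_with_index(schema, kind):
    """Return the field names for which the schema requests the given index kind."""
    return [
        name
        for name, spec in schema.items()
        if spec["indexes"] and kind in spec["indexes"]
    ]
```

An indexer could consult such a structure once per document, for example calling fields_with_index(SCHEMA, "inverted") to decide which fields receive posting lists.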
When building an index for each tenant's documents, the search instance reads the field definition table and builds the index according to the descriptions in it. Likewise, when returning search results for a tenant's search request, the search instance displays each field according to the descriptions in the field definition table.
The search instance assigns a tenant identifier to each of its tenants, and the tenant identifier uniquely identifies the corresponding tenant. Specifically, the search instance distinguishes tenants by one field of the field definition table (for example, user_id): "user_id=001" denotes the first tenant and "user_id=002" denotes the second tenant. The tenant identifier can also be used to distinguish tenants' documents, tenants' dictionaries, tenant platforms, and so on. For example, by associating a tenant's dictionary with the tenant identifier, the tenant dictionary can be obtained through the tenant identifier when it is needed. When a search is performed in the search instance, including "user_id=xxx" in the search request distinguishes the requests of different tenants, so that tenants do not interfere with one another. For example, a specific search command (query) may be: query=title:天气 AND user_id:xxx. This statement searches for index entries whose title contains "天气" (weather) and that also satisfy the condition that user_id is xxx, which amounts to filtering by the tenant dimension.
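The tenant filter described above can be sketched as a small helper that always appends the user_id condition to a query. The function name build_tenant_query and the input checks are illustrative assumptions; only the resulting query string format comes from the text.

```python
def build_tenant_query(field: str, term: str, tenant_id: str) -> str:
    """Build a query string of the form 'query=<field>:<term> AND user_id:<tenant_id>'.

    Whitespace in the term or tenant ID is rejected so that a caller cannot
    smuggle extra clauses (for example ' OR user_id:002') into the query.
    """
    for value in (field, term, tenant_id):
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"invalid query component: {value!r}")
    return f"query={field}:{term} AND user_id:{tenant_id}"
```

Building the tenant condition in one place, rather than in each caller, is one way to make sure no request reaches the shared search instance without it.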
As the description above shows, the tenants in one search instance use the same field definition table, but different tenants have different dictionaries. For example, some sports articles require a sports-specific lexicon, and some medical articles require a medical dictionary. When the search instance builds a tenant's index, the tenant's original documents are segmented according to that tenant's own dictionary, and the result is used to build the index. In one embodiment of this specification, the tenant-specific word segmentation logic is implemented outside the search instance, so that the search instance performs only general functions such as storage, indexing, and retrieval, and needs no complex per-tenant dictionary customization.
Specifically, as shown in FIG. 1, the cloud search platform 101 further includes a dictionary unit 12, a service agent unit 13, and a storage unit 14. The dictionary unit 12 is separate from the search instance 11 and contains the respective tenant dictionaries of the plurality of tenants; it can provide a word segmentation service to each tenant outside the search instance. Specifically, when the service of the dictionary unit 12 is invoked for a document from the tenant platform 102, the dictionary unit 12 segments the tenant's document using that tenant's tenant dictionary.
The service agent unit 13 proxies the services of the cloud search platform 101 and exposes an HTTP service upward. It is connected to the tenant platforms (102, 103, 104, and so on), the search instance 11, the dictionary unit 12, the storage unit 14, and the like, relays data among them, and performs appropriate data preprocessing. For example, when the index is built offline, the service agent unit 13 receives a tenant's original document from the tenant platform 103, adds the tenant identifier to the original document to obtain a tenant document, and stores the tenant document in the storage unit 14. When the index is built, the service agent unit 13 obtains the tenant document from the storage unit 14, sends it to the dictionary unit 12 for word segmentation, and receives the segmented document from the dictionary unit 12. The service agent unit 13 then sends the tenant document and the segmented document to the search instance 11, and the search instance 11 indexes the tenant document according to the field definition table and the tenant's segmented document, where the index includes a field index on the tenant ID.
The configuration of the cloud search platform 101 shown in FIG. 1 is only one embodiment of this specification and does not limit the embodiments. In another embodiment, the dictionary-based word segmentation function is placed inside the search instance, so that only the tenant document is sent to the search instance, and the search instance segments the tenant document according to the tenant dictionary. In yet another embodiment, the search instance invokes the service of the external dictionary unit directly (that is, not through the service agent unit), obtaining the segmented document and the tenant document directly from the dictionary unit for use in building the index.
A method and an apparatus for building an index in a cloud search platform according to one embodiment of this specification are described below. FIG. 2 shows a method for building an index in a cloud search platform according to one embodiment of this specification.
As shown in FIG. 2, in step S21, a tenant document is obtained. The content of the tenant document includes a tenant identification field line, which shows the tenant identifier of the tenant to which the tenant document belongs. As described above, the cloud search platform includes a unified service interface connected to the tenant platforms of the plurality of tenants, and a tenant's original document is received from the tenant platform through the service interface.
For example, when the index is built in real time, the tenant's original document is received from the tenant platform through the service interface, the tenant identifier "xxx" is obtained according to the tenant platform, and the field line "tenant ID=xxx" is added to the content of the original document, yielding the tenant document. For example, when the cloud search platform receives a tenant's original document from the tenant platform of tenant 001, it also receives a parameter such as type=user_001 from that tenant platform, from which it can obtain the tenant's tenant identifier "xxx", for example "001". When the index is built offline, the tenant's original document is received from the tenant platform through the service interface, the tenant identifier "xxx" is obtained according to the tenant platform, the field line "tenant ID=xxx" is added to the content of the original document to generate the tenant document, and the tenant document is stored in the cloud search platform. Then, when the index is built offline, the tenant document is obtained from the storage unit of the cloud search platform.
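The generation of a tenant document from an original document might look like the following sketch. It assumes that the original document is a block of key=value lines, that the tenant identification field line is written with the schema field name user_id, and that the platform parameter has the form type=user_<id>; the function names are hypothetical.

```python
def tenant_id_from_param(param: str, prefix: str = "user_") -> str:
    """Extract the tenant ID from a parameter such as 'type=user_001'."""
    key, _, value = param.partition("=")
    if key != "type" or not value.startswith(prefix) or value == prefix:
        raise ValueError(f"not a tenant parameter: {param!r}")
    return value[len(prefix):]


def make_tenant_document(raw_doc: str, tenant_id: str) -> str:
    """Return the raw document with a tenant identification field line added.

    Any user_id line already present is dropped, so the identifier always
    comes from the platform the document was received from (an assumption).
    """
    lines = [
        line
        for line in raw_doc.splitlines()
        if line and not line.startswith("user_id=")
    ]
    lines.append(f"user_id={tenant_id}")
    return "\n".join(lines)
```

In the offline case the returned text would be written to storage and read back later; in the real-time case it would be passed straight on to word segmentation.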
In one embodiment, as shown in FIG. 1, the cloud search platform includes a service agent unit that proxies the business logic of the platform. The unified service interface described above is provided on the service agent unit and is connected to the tenant platforms of the plurality of tenants. The service agent unit is also connected to the storage unit of the cloud search platform. For example, when the index is built in real time, the service agent unit receives the tenant's original document from the tenant platform through the service interface, obtains the tenant identifier "xxx" according to the tenant platform, and adds the field line "tenant ID=xxx" to the content of the original document, yielding the tenant document. When the index is built offline, the service agent unit receives the tenant's original document from the tenant platform through the service interface, obtains the tenant identifier "xxx" according to the tenant platform, adds the field line "tenant ID=xxx" to the content of the original document to generate the tenant document, and stores the tenant document in the storage unit 14 of the cloud search platform. Then, when the index is built offline, the service agent unit obtains the tenant document from the storage unit 14.
In step S22, the tenant's tenant dictionary is obtained through the tenant identifier. The cloud search platform also receives the tenant dictionary from the tenant platform through the service interface, obtains the tenant identifier according to the tenant platform, and stores the tenant dictionary in the dictionary unit in association with the tenant identifier. Thus, when an index is built for a tenant document, the tenant dictionary associated with the tenant identifier can be retrieved from the platform by means of the tenant identifier.
In step S23, the tenant document is segmented according to the tenant dictionary to obtain a segmented document corresponding to the tenant document. For example, suppose the title of the original document is "北京团购网站长" and the tenant dictionary of tenant 001 includes the entries "团购" (group buying) and "网站" (website); segmenting the title according to tenant 001's dictionary then gives "北京 团购 网站 长". As another example, if the tenant dictionary of tenant 002 includes the entries "北京团购网" (Beijing group-buying net) and "站长" (webmaster), segmenting the title according to tenant 002's dictionary gives "北京团购网 站长". In these results a space separates the segments; this is only an example, and other characters may serve as the separator, or the segments may be represented in another form, for example as a structured list of segments.
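The text does not name a segmentation algorithm. As one possible illustration, the sketch below uses forward longest-match against the tenant dictionary and keeps each run of unmatched characters together as a single segment; both choices are assumptions, as is the function name segment. With these assumptions the two title examples above come out as described.

```python
def segment(text: str, dictionary: set) -> list:
    """Segment text by forward longest-match against a tenant dictionary.

    Characters that match no dictionary entry are accumulated and emitted
    as one segment (an assumption; the source does not specify this).
    """
    max_len = max((len(word) for word in dictionary), default=0)
    tokens = []
    pending = ""  # run of characters not covered by any dictionary entry
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first so longer entries win.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                match = candidate
                break
        if match is None:
            pending += text[i]
            i += 1
            continue
        if pending:
            tokens.append(pending)
            pending = ""
        tokens.append(match)
        i += len(match)
    if pending:
        tokens.append(pending)
    return tokens


title = "北京团购网站长"
print(" ".join(segment(title, {"团购", "网站"})))        # tenant 001's dictionary
print(" ".join(segment(title, {"北京团购网", "站长"})))  # tenant 002's dictionary
```

The same title therefore yields different index terms for the two tenants, which is what makes per-tenant dictionaries matter at search time.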
In one embodiment, the dictionary unit may additionally apply a default dictionary on top of the tenant dictionary to segment the tenant document further. In that case, entries in the tenant dictionary take priority.
In one embodiment, as shown in FIG. 1, the cloud search platform further includes a dictionary unit, which is an application on the platform that is separate from the search instance and provides the word segmentation service.
The service agent unit receives the tenant dictionary from the tenant platform through the service interface, obtains the tenant identifier according to the tenant platform, and stores the tenant dictionary in the dictionary unit in association with the tenant identifier. For example, a startup configuration file can be set up for the tenant dictionary, one option of which has the form key:dict_path, for example user_id_001:/home/admin/local_dict_1.txt, so that the configuration file associates the tenant dictionary with the tenant identifier.
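A configuration file of key:dict_path lines could be read into a lookup table along the following lines. This is a sketch under assumptions: one option per line, the first colon separates key from path, and blank lines and lines starting with "#" are ignored; the function name parse_dict_config is hypothetical.

```python
def parse_dict_config(text: str) -> dict:
    """Parse 'key:dict_path' lines into a mapping from tenant key to dictionary path."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so paths may contain colons.
        key, sep, path = line.partition(":")
        if not sep or not key.strip() or not path.strip():
            raise ValueError(f"malformed config line: {line!r}")
        mapping[key.strip()] = path.strip()
    return mapping


config = parse_dict_config("user_id_001:/home/admin/local_dict_1.txt")
print(config["user_id_001"])
```

The dictionary unit could then resolve a tenant identifier to its dictionary file with a single lookup before segmenting.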
Thus, when an index is built for a tenant, the word segmentation service of the dictionary unit can be invoked through the service agent unit. For example, after obtaining the tenant document, the service agent unit sends it to the dictionary unit. The dictionary unit obtains the tenant identifier from the field line "tenant ID=xxx" in the tenant document and uses it to retrieve the tenant dictionary stored in association with that identifier. The dictionary unit then segments the tenant document according to the tenant dictionary to generate the segmented document, and sends the tenant document and the segmented document to the service agent unit.
In step S24, in the search instance, the tenant document is indexed according to the field definition table and the segmented document, which includes establishing an index relationship between the tenant identification field and the tenant document according to the description of the tenant identification field in the field definition table. For example, the field definition table in the search instance is the table shown in Table 1, which specifies that an inverted index and a forward index are built for the segmented title (title), the category id (cat_id), and the tenant ID (user_id). Taking the tenant ID as an example, the indexer in the search instance generates, according to the description of the tenant ID field in the field definition table, index relationship tables between tenant IDs and tenant documents, including an inverted table and a forward table.
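The index relationship tables can be pictured with a small sketch that builds an inverted table (term to documents) and a forward table (document to terms) for a set of fields, with user_id indexed like any other field. The data layout, the function name build_index, and the sample documents are illustrative assumptions; field values are taken to be already segmented and space-separated.

```python
from collections import defaultdict


def build_index(docs, indexed_fields=("title", "user_id")):
    """Build inverted and forward tables for the given fields.

    inverted maps (field, term) to the set of document IDs containing the term;
    forward maps a document ID to {field: [terms]}.
    """
    inverted = defaultdict(set)
    forward = {}
    for doc in docs:
        doc_id = doc["id"]
        forward[doc_id] = {}
        for field in indexed_fields:
            # Fields arrive already segmented, so splitting on spaces is enough.
            terms = [t for t in doc[field].split(" ") if t]
            forward[doc_id][field] = terms
            for term in terms:
                inverted[(field, term)].add(doc_id)
    return inverted, forward


docs = [
    {"id": "1", "user_id": "001", "title": "北京 团购 网站 长"},
    {"id": "2", "user_id": "002", "title": "北京团购网 站长"},
]
inverted, forward = build_index(docs)
print(sorted(inverted[("user_id", "001")]))
```

Because the tenant ID sits in the same inverted table as the title terms, restricting a search to one tenant is just one more posting list to intersect.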
As shown in FIG. 1, in one embodiment, after the dictionary unit 12 segments the tenant document according to the tenant dictionary and sends the tenant document and the segmented document to the service agent unit 13, the service agent unit 13 sends the tenant document and the segmented document to the search instance 11. The segmented content can be delimited by a pre-agreed separator (for example, a space). If the dictionary unit does not delimit the segments with that separator, the service agent unit can further process the segmented document so that it uses the agreed separator. In this case, after receiving the segmented document, the search instance splits the already-segmented fields on the separator, for example on spaces, and needs no additional word segmentation processing.
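The separator handling described above can be sketched as two small helpers: one that the service agent unit might apply when the dictionary unit emits a different separator, and one that the search instance might apply on receipt. The "|" separator in the example and both function names are assumptions for illustration only.

```python
def normalize_separator(segmented: str, source_sep: str, agreed_sep: str = " ") -> str:
    """Rewrite a segmented string so that segments are joined by the agreed separator."""
    tokens = [t for t in segmented.split(source_sep) if t]
    if any(agreed_sep in t for t in tokens):
        # A segment containing the agreed separator would be split again downstream.
        raise ValueError("a segment contains the agreed separator")
    return agreed_sep.join(tokens)


def split_segmented_field(value: str, agreed_sep: str = " ") -> list:
    """Recover the segments on the search instance side without re-segmenting."""
    return [t for t in value.split(agreed_sep) if t]


normalized = normalize_separator("北京|团购|网站|长", "|")
print(split_segmented_field(normalized))
```

Keeping the search instance to a plain split on one agreed separator is what lets it stay free of tenant-specific segmentation logic.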
Thus, the search instance can process the documents of all tenants with the same logic, without distinguishing tenants. That is, the index of the search instance contains the documents of all tenants, and at search time the requests of different tenants are distinguished by adding, for example, user_id="xxx", which isolates the tenants from one another.
FIG. 3 shows an apparatus 300 for building an index in a cloud search platform according to an embodiment of this specification. The cloud search platform includes a search instance for a plurality of tenants; the search instance includes a unified field definition table applicable to the plurality of tenants; the search instance assigns each of the tenants a tenant identifier that uniquely identifies the corresponding tenant; and the field definition table includes a description of a tenant identification field, the tenant identification field being associated with the tenant identifier.
As shown in FIG. 3, the apparatus 300 for building an index in a cloud search platform is implemented by the cloud search platform and includes the following units: a first obtaining unit 31, configured to obtain a tenant document, where the content of the tenant document includes a tenant identification field line showing the tenant identifier of the tenant to which the tenant document belongs; a second obtaining unit 32, configured to obtain a tenant dictionary through the tenant identifier; a word segmentation unit 33, configured to segment the tenant document according to the tenant dictionary to obtain a segmented document corresponding to the tenant document; and an establishing unit 34, configured to index, in the search instance, the tenant document according to the field definition table and the segmented document, which includes establishing an index relationship between the tenant identification field and the tenant document according to the description of the tenant identification field in the field definition table.
In one embodiment, the cloud search platform further includes a dictionary unit that is separate from the search instance and contains the respective tenant dictionaries of the plurality of tenants, and the apparatus 300 further includes a first sending unit, configured to send the tenant document to the dictionary unit after the tenant document is obtained.
In one embodiment, segmenting the tenant document according to the tenant dictionary to obtain a segmented document corresponding to the tenant document includes: segmenting, in the dictionary unit, the tenant document according to the tenant dictionary to generate the segmented document corresponding to the tenant document; and receiving the segmented document from the dictionary unit.
In one embodiment, the apparatus 300 further includes a second sending unit, configured to send the tenant document and its corresponding segmented document to the search instance after the tenant document has been segmented according to the tenant dictionary and the segmented document has been obtained.
In one embodiment, in the apparatus 300, the cloud search platform further includes a unified service interface connected to the tenant platforms of the plurality of tenants, and obtaining the tenant document includes: receiving the tenant's original document from the tenant platform through the service interface, obtaining the tenant identifier according to the tenant platform, and adding the tenant identification field line to the content of the original document to obtain the tenant document.
In one embodiment, the apparatus 300 is operated offline and further includes a first storage unit, configured to, before the tenant document is obtained, receive the tenant's original document from the tenant platform through the service interface, obtain the tenant identifier according to the tenant platform, add the tenant identification field line to the content of the original document to generate the tenant document, and store the tenant document in the cloud search platform.
In one embodiment, the apparatus 300 further includes a second storage unit, configured to, before the tenant document is obtained, receive the tenant dictionary from the tenant platform through the service interface, obtain the tenant identifier according to the tenant platform, and store the tenant dictionary in the dictionary unit in association with the tenant identifier.
FIG. 4 shows a method for searching in a cloud search platform according to an embodiment of this specification. The cloud search platform includes a search instance for a plurality of tenants; the search instance includes a unified field definition table applicable to the plurality of tenants; the search instance assigns each of the tenants a tenant identifier that uniquely identifies the corresponding tenant; the field definition table includes a description of a tenant identification field, the tenant identification field being associated with the tenant identifier; and the cloud search platform includes a unified service interface connected to the tenant platforms of the plurality of tenants.
As shown in FIG. 4, in step S41, a search statement is received from a tenant platform. The cloud search platform receives the tenant's search request from the tenant platform through the unified service interface described above. For example, the search request is the search statement "北京团购" (Beijing group buying).
In one embodiment, as shown in FIG. 1, the cloud search platform 101 includes a service agent unit 13, and the service agent unit 13 includes the service interface for connecting to the tenant platforms. The service agent unit 13 receives the tenant's search request from the tenant platform through the unified service interface described above.
In step S42, the tenant's tenant identifier is obtained from the tenant platform. For example, each tenant platform may include, in the request string it sends, a tenant identification parameter corresponding to that tenant platform. For example, the tenant platform of tenant 001 sends a parameter such as type=user_001 to the cloud search platform. The tenant's tenant identifier "xxx", for example "001", can thus be obtained from the parameter that the tenant platform includes in the request string. In one embodiment, as shown in FIG. 1, the tenant identifier "xxx" is obtained from the tenant platform by the service agent unit 13.
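Reading the tenant identifier out of a request string can be sketched with the Python standard library. The sketch assumes the request string is an ordinary URL query string and that the search statement travels in a parameter named query; the text only fixes the type=user_001 parameter, so the rest, including the function name, is hypothetical.

```python
from urllib.parse import parse_qs


def tenant_id_from_request(request_string: str, prefix: str = "user_") -> str:
    """Return the tenant ID carried by the 'type' parameter of a request string."""
    params = parse_qs(request_string)
    values = params.get("type", [])
    # Exactly one well-formed tenant parameter is required.
    if len(values) != 1 or not values[0].startswith(prefix) or values[0] == prefix:
        raise ValueError("request string carries no usable tenant parameter")
    return values[0][len(prefix):]


print(tenant_id_from_request("query=北京团购&type=user_001"))
```

Rejecting requests with a missing or repeated tenant parameter keeps an ambiguous request from being matched against the wrong tenant's dictionary or documents.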
In step S43, the tenant's tenant dictionary is obtained through the tenant identifier. As described above, the cloud search platform stores each tenant dictionary in association with the tenant identifier, so the tenant dictionary associated with the tenant identifier can be retrieved from the platform by means of the tenant identifier.
In one embodiment, as shown in FIG. 1, the cloud search platform 101 further includes a dictionary unit 12. After obtaining the tenant's tenant identifier "xxx" from the tenant platform, the service agent unit 13 sends the search statement and the tenant identifier to the dictionary unit 12. The dictionary unit 12 then uses the tenant identifier to retrieve, from its own storage, the tenant dictionary associated with that identifier.
In step S44, the search statement is segmented according to the tenant dictionary to obtain a segmented statement corresponding to the search statement. For example, when the cloud search platform receives the search statement "北京团购" from the tenant platform of tenant 001, and tenant 001's tenant dictionary includes the entries "团购" and "网站", segmenting the statement according to tenant 001's dictionary gives "北京 团购". As another example, when the cloud search platform receives the search statement "北京团购" from the tenant platform of tenant 002, and tenant 002's tenant dictionary includes the entry "北京团购网", segmenting the statement according to tenant 002's dictionary still gives "北京团购".
In one embodiment, as shown in FIG. 1, the dictionary unit 12 segments the search statement using the tenant dictionary to generate the segmented statement. The dictionary unit 12 then sends the segmented statement and the tenant identifier back to the service agent unit 13, and the service agent unit 13 sends the segmented statement and the tenant identifier to the search instance 11.
In step S45, retrieval is performed in the search instance on the tenant identification field and the segmented statement, so that the segmented statement is retrieved within the tenant's tenant documents.
For example, suppose the documents of tenant 001 (that is, tenant ID=001) include a document titled "北京团购网站长", and tenant 001's tenant dictionary includes the entries "团购" and "网站". Segmenting the title according to tenant 001's dictionary gives "北京 团购 网站 长"; that is, the index terms for this document of tenant 001 are "北京", "团购", "网站", and "长". When the search statement "北京团购" is received from tenant 001's tenant platform, segmenting it with tenant 001's dictionary gives the segmented statement "北京 团购". Both "北京" and "团购" are associated in the index with the document titled "北京团购网站长", and that document is also associated in the index with "tenant ID=001" (that is, it is a document of tenant 001). In this case, therefore, retrieving the search statement "北京团购" in the search instance returns the document titled "北京团购网站长".
As another example, suppose the documents of tenant 002 (that is, tenant ID=002) include a document with the same title, "北京团购网站长", and the tenant dictionary of tenant 002 contains the entries "北京团购网" (Beijing group-buying network) and "站长" (webmaster). Segmenting the title with the tenant dictionary of tenant 002 yields "北京团购网 站长", so the index terms for this document of tenant 002 are "北京团购网" and "站长". When the search query "北京团购" is received from the tenant platform of tenant 002, segmenting it with the tenant dictionary of tenant 002 leaves it as the single term "北京团购". The term "北京团购" is not associated in the index with the document titled "北京团购网站长", so in this case searching the search instance for the query "北京团购" does not return that document.
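The two examples above can be reproduced with a small sketch. It is only an illustration under stated assumptions: the specification does not name a segmentation algorithm, so forward maximum matching with unmatched character runs kept as a single term is assumed here, and the query is assumed to require every segmented term (AND semantics). The names `segment`, `SearchInstance`, `DICTS` and `run_query` are hypothetical.

```python
from collections import defaultdict


def segment(text, dictionary):
    """Forward maximum matching (assumed algorithm, not from the source).

    Consecutive characters that match no dictionary entry are kept
    together as one term.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens, unmatched, i = [], "", 0
    while i < len(text):
        match = None
        # Try the longest candidate first.
        for size in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + size] in dictionary:
                match = text[i:i + size]
                break
        if match:
            if unmatched:
                tokens.append(unmatched)
                unmatched = ""
            tokens.append(match)
            i += len(match)
        else:
            unmatched += text[i]
            i += 1
    if unmatched:
        tokens.append(unmatched)
    return tokens


class SearchInstance:
    """One shared index for all tenants; tenant ID is just another field."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> document IDs

    def index(self, doc_id, tenant_id, tokens):
        self.postings[("tenant_id", tenant_id)].add(doc_id)
        for term in tokens:
            self.postings[("title", term)].add(doc_id)

    def search(self, tenant_id, tokens):
        # Restrict to the tenant's documents, then require every term.
        hits = set(self.postings.get(("tenant_id", tenant_id), set()))
        for term in tokens:
            hits &= self.postings.get(("title", term), set())
        return sorted(hits)


# Hypothetical per-tenant dictionaries, taken from the two examples.
DICTS = {"001": {"团购", "网站"}, "002": {"北京团购网", "站长"}}

instance = SearchInstance()
title = "北京团购网站长"
instance.index("doc-001", "001", segment(title, DICTS["001"]))
instance.index("doc-002", "002", segment(title, DICTS["002"]))


def run_query(tenant_id, query):
    return instance.search(tenant_id, segment(query, DICTS[tenant_id]))


print(run_query("001", "北京团购"))  # tenant 001 finds its document
print(run_query("002", "北京团购"))  # tenant 002 finds nothing
```

The same title produces different index terms per tenant, and the tenant ID term keeps one tenant's query from ever matching another tenant's documents even though all postings live in one index.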
In one embodiment, as shown in FIG. 1, after the search instance 11 performs the search using the field row "tenant ID=xxx" and the segmented query, it sends the search results, presented according to the field definition table, together with the tenant identifier to the service agent unit 13.
In step S46, the tenant platform is located according to the tenant identifier. In the embodiments of this specification, the cloud search platform is connected to the tenant platforms of multiple tenants through a unified service interface, so when returning results the platform can use the tenant identifier to locate the tenant platform that sent the search request.
In one embodiment, as shown in FIG. 1, the service agent unit 13 uses the tenant identifier received from the search instance 11 to locate the tenant platform that sent the search request.
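A minimal sketch of this lookup is given below, assuming the service agent keeps a registry that maps each tenant identifier to a delivery callback for that tenant's platform. The specification does not describe how the mapping is stored, so the class `ServiceAgent` and its methods are hypothetical.

```python
from typing import Callable, Dict, List


class ServiceAgent:
    """Hypothetical service agent that routes results by tenant ID."""

    def __init__(self) -> None:
        # tenant ID -> callback that delivers results to that tenant platform
        self._platforms: Dict[str, Callable[[List[dict]], None]] = {}

    def register_platform(self, tenant_id: str,
                          deliver: Callable[[List[dict]], None]) -> None:
        self._platforms[tenant_id] = deliver

    def return_results(self, tenant_id: str, results: List[dict]) -> None:
        # Locate the tenant platform from the tenant ID that the search
        # instance sent back along with the results.
        deliver = self._platforms.get(tenant_id)
        if deliver is None:
            raise LookupError(f"no tenant platform registered for {tenant_id!r}")
        deliver(results)
```

Because the tenant identifier travels with the query and the results through every unit, the agent itself needs no per-request state to route a response.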
Finally, in step S47, the search results are returned to the tenant platform according to the field definition table. For example, in the field definition table shown in Table 1 above, the "engine field" column marks the content (body text) row as a "summary field", meaning that content is shown in the summary when search results are returned. The table likewise indicates that cat_id (category ID) and origin_title (original title) are shown in the summary.
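As a rough illustration of how a field definition table could drive what is returned, the sketch below projects a matched document onto its summary fields. Only the fact that content, cat_id and origin_title are summary fields comes from the text; the table layout, the other rows, and the names `FIELD_DEFINITIONS` and `summary_view` are assumptions made for the example.

```python
# Hypothetical in-memory form of a field definition table. The "engine_fields"
# values for tenant_id and title are assumed, not taken from Table 1.
FIELD_DEFINITIONS = [
    {"name": "tenant_id", "engine_fields": {"index"}},
    {"name": "title", "engine_fields": {"index"}},
    {"name": "origin_title", "engine_fields": {"summary"}},
    {"name": "cat_id", "engine_fields": {"summary"}},
    {"name": "content", "engine_fields": {"summary"}},
]


def summary_view(document, field_definitions):
    """Keep only the fields that the table marks as summary fields."""
    summary_fields = [
        row["name"] for row in field_definitions
        if "summary" in row["engine_fields"]
    ]
    return {name: document[name] for name in summary_fields if name in document}


doc = {
    "tenant_id": "001",
    "title": "北京 团购 网站 长",
    "origin_title": "北京团购网站长",
    "cat_id": 7,
    "content": "example body text",
}
print(summary_view(doc, FIELD_DEFINITIONS))
```

Since every tenant shares one table, the same projection applies to all tenants' results.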
In one embodiment, as shown in FIG. 1, after locating the tenant platform, the service agent unit 13 returns to it the search results received from the search instance 11, presented according to the field definition table.
FIG. 5 shows an apparatus 500 for searching in a cloud search platform according to an embodiment of this specification. The cloud search platform includes a search instance serving multiple tenants. The search instance includes a unified field definition table applicable to each of the multiple tenants, and in the search instance each tenant is assigned a tenant identifier that uniquely identifies the corresponding tenant. The field definition table includes a description of a tenant identifier field, which is associated with the tenant identifier. The cloud search platform further includes a unified service interface connected to the tenant platforms of the multiple tenants.
As shown in FIG. 5, the apparatus 500 for searching in a cloud search platform is implemented by the cloud search platform and includes the following units: a first receiving unit 51, configured to receive a search query from the tenant platform; a first obtaining unit 52, configured to obtain the tenant identifier of the tenant from the tenant platform; a second obtaining unit 53, configured to obtain the tenant dictionary of the tenant by means of the tenant identifier; a segmentation unit 54, configured to segment the search query according to the tenant dictionary, thereby obtaining a segmented query corresponding to the search query; a search unit 55, configured to search the search instance on the tenant identifier field and the segmented query, so that the segmented query is searched for in the tenant documents of the tenant; a locating unit 56, configured to locate the tenant platform according to the tenant identifier; and a returning unit 57, configured to return search results to the tenant platform according to the field definition table.
In one embodiment, the cloud search platform further includes a dictionary unit that is separate from the search instance and holds the respective tenant dictionaries of the multiple tenants. The apparatus 500 for searching in a cloud search platform further includes a first sending unit, configured to send the search query and the tenant identifier to the dictionary unit after the tenant identifier of the tenant is obtained according to the tenant platform.
In one embodiment, in the apparatus 500 for searching in a cloud search platform, segmenting the search query according to the tenant dictionary to obtain the segmented query corresponding to the search query includes: segmenting the search query in the dictionary unit using the tenant dictionary to generate the segmented query; and receiving the segmented query from the dictionary unit.
In one embodiment, the apparatus 500 for searching in a cloud search platform further includes a second sending unit, configured to send the segmented query and the tenant identifier to the search instance after the tenant identifier is received from the dictionary unit.
The embodiments of this specification further include a computer-readable storage medium storing instruction code that, when executed in a computer, causes the computer to perform the methods of building an index and searching in a cloud search platform according to the embodiments of this specification.
In the methods and apparatuses for building an index and searching in a cloud search platform according to the embodiments of this specification, multiple tenants are handled with unified logic in a single search instance, and the tenant dictionaries are split out as a service external to the search instance. This reduces the complexity of the overall architecture, lowers development cost, and can reduce the storage space the architecture requires. In addition, the solution meets the need for per-tenant custom dictionaries, and the number of supported tenants can grow linearly as computing resources such as servers are added; that is, the system is "linearly scalable".
Those of ordinary skill in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Those of ordinary skill in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The specific embodiments described above explain the objectives, technical solutions and beneficial effects of the present invention in further detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (22)

  1. A method for building an index in a cloud search platform, wherein the cloud search platform comprises a search instance serving a plurality of tenants, the search instance comprises a unified field definition table applicable to the plurality of tenants, each of the plurality of tenants is assigned in the search instance a tenant identifier that uniquely identifies the corresponding tenant, the field definition table comprises a description of a tenant identifier field, and the tenant identifier field is associated with the tenant identifier, the method being performed by the cloud search platform and comprising the following steps:
    obtaining a tenant document, wherein the content of the tenant document comprises a tenant identifier field row indicating the tenant identifier of the tenant to which the tenant document belongs;
    obtaining the tenant dictionary of the tenant by means of the tenant identifier;
    segmenting the tenant document according to the tenant dictionary, thereby obtaining a segmented document corresponding to the tenant document; and
    indexing, in the search instance, the tenant document according to the field definition table and the segmented document, including establishing an index relationship between the tenant identifier field and the tenant document according to the description of the tenant identifier field in the field definition table.
  2. The method for building an index in a cloud search platform according to claim 1, wherein the cloud search platform further comprises a dictionary unit that is separate from the search instance and holds the respective tenant dictionaries of the plurality of tenants, the method further comprising:
    after obtaining the tenant document, sending the tenant document to the dictionary unit.
  3. The method for building an index in a cloud search platform according to claim 2, wherein segmenting the tenant document according to the tenant dictionary, thereby obtaining the segmented document corresponding to the tenant document, comprises:
    segmenting, in the dictionary unit, the tenant document according to the tenant dictionary to generate the segmented document corresponding to the tenant document; and
    receiving the segmented document from the dictionary unit.
  4. The method for building an index in a cloud search platform according to claim 2, further comprising: after segmenting the tenant document according to the tenant dictionary and thereby obtaining the segmented document corresponding to the tenant document, sending the tenant document and its corresponding segmented document to the search instance.
  5. The method for building an index in a cloud search platform according to any one of claims 1-4, wherein the cloud search platform further comprises a unified service interface connected to the tenant platforms of the plurality of tenants, and wherein obtaining the tenant document comprises: receiving a tenant original document from the tenant platform through the service interface, obtaining the tenant identifier according to the tenant platform, and adding the tenant identifier field row to the content of the tenant original document, thereby obtaining the tenant document.
  6. The method for building an index in a cloud search platform according to any one of claims 1-4, wherein the cloud search platform further comprises a unified service interface connected to the tenant platforms of the plurality of tenants, the method is performed offline, and the method further comprises: before obtaining the tenant document, receiving a tenant original document from the tenant platform through the service interface, obtaining the tenant identifier according to the tenant platform, adding the tenant identifier field row to the content of the tenant original document to generate the tenant document, and storing the tenant document in the cloud search platform.
  7. The method for building an index in a cloud search platform according to any one of claims 2-4, wherein the cloud search platform further comprises a unified service interface connected to the tenant platforms of the plurality of tenants, and the method further comprises: before obtaining the tenant document, receiving a tenant dictionary from the tenant platform through the service interface, obtaining the tenant identifier according to the tenant platform, and storing the tenant dictionary in the dictionary unit in association with the tenant identifier.
  8. An apparatus for building an index in a cloud search platform, wherein the cloud search platform comprises a search instance serving a plurality of tenants, the search instance comprises a unified field definition table applicable to the plurality of tenants, each of the plurality of tenants is assigned in the search instance a tenant identifier that uniquely identifies the corresponding tenant, the field definition table comprises a description of a tenant identifier field, and the tenant identifier field is associated with the tenant identifier, the apparatus being implemented by the cloud search platform and comprising the following units:
    a first obtaining unit, configured to obtain a tenant document, wherein the content of the tenant document comprises a tenant identifier field row indicating the tenant identifier of the tenant to which the tenant document belongs;
    a second obtaining unit, configured to obtain the tenant dictionary of the tenant by means of the tenant identifier;
    a segmentation unit, configured to segment the tenant document according to the tenant dictionary, thereby obtaining a segmented document corresponding to the tenant document; and
    an establishing unit, configured to index, in the search instance, the tenant document according to the field definition table and the segmented document, including establishing an index relationship between the tenant identifier field and the tenant document according to the description of the tenant identifier field in the field definition table.
  9. The apparatus for building an index in a cloud search platform according to claim 8, wherein the cloud search platform further comprises a dictionary unit that is separate from the search instance and holds the respective tenant dictionaries of the plurality of tenants, the apparatus further comprising:
    a first sending unit, configured to send the tenant document to the dictionary unit after the tenant document is obtained.
  10. The apparatus for building an index in a cloud search platform according to claim 9, wherein segmenting the tenant document according to the tenant dictionary, thereby obtaining the segmented document corresponding to the tenant document, comprises:
    segmenting, in the dictionary unit, the tenant document according to the tenant dictionary to generate the segmented document corresponding to the tenant document; and
    receiving the segmented document from the dictionary unit.
  11. The apparatus for building an index in a cloud search platform according to claim 9, further comprising a second sending unit, configured to send the tenant document and its corresponding segmented document to the search instance after the tenant document is segmented according to the tenant dictionary and the segmented document corresponding to the tenant document is thereby obtained.
  12. The apparatus for building an index in a cloud search platform according to any one of claims 8-11, wherein the cloud search platform further comprises a unified service interface connected to the tenant platforms of the plurality of tenants, and wherein obtaining the tenant document comprises: receiving a tenant original document from the tenant platform through the service interface, obtaining the tenant identifier according to the tenant platform, and adding the tenant identifier field row to the content of the tenant original document, thereby obtaining the tenant document.
  13. The apparatus for building an index in a cloud search platform according to any one of claims 8-11, wherein the cloud search platform further comprises a unified service interface connected to the tenant platforms of the plurality of tenants, the apparatus is implemented offline, and the apparatus further comprises a first storage unit, configured to, before the tenant document is obtained, receive a tenant original document from the tenant platform through the service interface, obtain the tenant identifier according to the tenant platform, add the tenant identifier field row to the content of the tenant original document to generate the tenant document, and store the tenant document in the cloud search platform.
  14. The apparatus for building an index in a cloud search platform according to any one of claims 9-11, wherein the cloud search platform further comprises a unified service interface connected to the tenant platforms of the plurality of tenants, and the apparatus further comprises a second storage unit, configured to, before the tenant document is obtained, receive a tenant dictionary from the tenant platform through the service interface, obtain the tenant identifier according to the tenant platform, and store the tenant dictionary in the dictionary unit in association with the tenant identifier.
  15. A method for searching in a cloud search platform, wherein the cloud search platform comprises a search instance serving a plurality of tenants, the search instance comprises a unified field definition table applicable to the plurality of tenants, each of the plurality of tenants is assigned in the search instance a tenant identifier that uniquely identifies the corresponding tenant, the field definition table comprises a description of a tenant identifier field, the tenant identifier field is associated with the tenant identifier, and the cloud search platform further comprises a unified service interface connected to the tenant platforms of the plurality of tenants, the method being performed by the cloud search platform and comprising the following steps:
    receiving a search query from a tenant platform;
    obtaining the tenant identifier of the tenant from the tenant platform;
    obtaining the tenant dictionary of the tenant by means of the tenant identifier;
    segmenting the search query according to the tenant dictionary, thereby obtaining a segmented query corresponding to the search query;
    searching the search instance on the tenant identifier field and the segmented query, so that the segmented query is searched for in the tenant documents of the tenant;
    locating the tenant platform according to the tenant identifier; and
    returning search results to the tenant platform according to the field definition table.
  16. The method for searching in a cloud search platform according to claim 15, wherein the cloud search platform further comprises a dictionary unit that is separate from the search instance and holds the respective tenant dictionaries of the plurality of tenants, the method further comprising:
    after obtaining the tenant identifier of the tenant according to the tenant platform, sending the search query and the tenant identifier to the dictionary unit.
  17. The method for searching in a cloud search platform according to claim 16, wherein segmenting the search query according to the tenant dictionary, thereby obtaining the segmented query corresponding to the search query, comprises:
    segmenting, in the dictionary unit, the search query using the tenant dictionary to generate the segmented query; and
    receiving the segmented query from the dictionary unit.
  18. The method for searching in a cloud search platform according to claim 16, further comprising: after segmenting the search query according to the tenant dictionary and thereby obtaining the segmented query corresponding to the search query, sending the segmented query and the tenant identifier to the search instance.
  19. An apparatus for searching in a cloud search platform, wherein the cloud search platform comprises a search instance serving a plurality of tenants, the search instance comprises a unified field definition table applicable to the plurality of tenants, each of the plurality of tenants is assigned in the search instance a tenant identifier that uniquely identifies the corresponding tenant, the field definition table comprises a description of a tenant identifier field, the tenant identifier field is associated with the tenant identifier, and the cloud search platform further comprises a unified service interface connected to the tenant platforms of the plurality of tenants, the apparatus being implemented by the cloud search platform and comprising the following units:
    a first receiving unit, configured to receive a search query from the tenant platform;
    a first obtaining unit, configured to obtain the tenant identifier of the tenant from the tenant platform;
    a second obtaining unit, configured to obtain the tenant dictionary of the tenant by means of the tenant identifier;
    a segmentation unit, configured to segment the search query according to the tenant dictionary, thereby obtaining a segmented query corresponding to the search query;
    a search unit, configured to search the search instance on the tenant identifier field and the segmented query, so that the segmented query is searched for in the tenant documents of the tenant;
    a locating unit, configured to locate the tenant platform according to the tenant identifier; and
    a returning unit, configured to return search results to the tenant platform according to the field definition table.
  20. The apparatus for searching in a cloud search platform according to claim 19, wherein the cloud search platform further comprises a dictionary unit that is separate from the search instance and holds the respective tenant dictionaries of the plurality of tenants, the apparatus further comprising:
    a first sending unit, configured to send the search query and the tenant identifier to the dictionary unit after the tenant identifier of the tenant is obtained according to the tenant platform.
  21. The apparatus for searching in a cloud search platform according to claim 20, wherein segmenting the search query according to the tenant dictionary, thereby obtaining the segmented query corresponding to the search query, comprises:
    segmenting, in the dictionary unit, the search query using the tenant dictionary to generate the segmented query; and
    receiving the segmented query from the dictionary unit.
  22. The apparatus for searching in a cloud search platform according to claim 20, further comprising a second sending unit, configured to send the segmented query and the tenant identifier to the search instance after the search query is segmented according to the tenant dictionary and the segmented query corresponding to the search query is thereby obtained.
PCT/CN2019/070820 2018-01-12 2019-01-08 Method and device for creating index and performing search in cloud search platform WO2019137365A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810031528.3 2018-01-12
CN201810031528.3A CN108280156A (en) 2018-01-12 2018-01-12 Method and apparatus for building an index and searching in a cloud search platform

Publications (1)

Publication Number Publication Date
WO2019137365A1 true WO2019137365A1 (en) 2019-07-18

Family

ID=62803630

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/070820 WO2019137365A1 (en) 2018-01-12 2019-01-08 Method and device for creating index and performing search in cloud search platform

Country Status (3)

Country Link
CN (1) CN108280156A (en)
TW (1) TWI676112B (en)
WO (1) WO2019137365A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280156A (en) * 2018-01-12 2018-07-13 阿里巴巴集团控股有限公司 Method and apparatus for building an index and searching in a cloud search platform

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9378200B1 (en) * 2014-09-30 2016-06-28 Emc Corporation Automated content inference system for unstructured text data
CN107168966A (en) * 2016-03-07 2017-09-15 阿里巴巴集团控股有限公司 A kind of search engine index construction method and device
CN107203532A (en) * 2016-03-16 2017-09-26 阿里巴巴集团控股有限公司 Construction method, the implementation method of search and the device of directory system
CN108280156A (en) * 2018-01-12 2018-07-13 阿里巴巴集团控股有限公司 Method and apparatus for building an index and searching in a cloud search platform

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101499061A (en) * 2008-01-30 2009-08-05 国际商业机器公司 Multi-tenant oriented database engine and its data access method
CN102411590A (en) * 2010-09-21 2012-04-11 英业达股份有限公司 System and method for opening corresponding object file through custom label
CN102930027A (en) * 2012-11-06 2013-02-13 苏州两江科技有限公司 Data processing system and processing method in cloud computing multi-tenant architecture
US9280678B2 (en) * 2013-12-02 2016-03-08 Fortinet, Inc. Secure cloud storage distribution and aggregation
US9760635B2 (en) * 2014-11-07 2017-09-12 Rockwell Automation Technologies, Inc. Dynamic search engine for an industrial environment
TWI546680B (en) * 2014-12-30 2016-08-21 中華電信股份有限公司 Cloud files indexing system and method thereof
CN107038207B (en) * 2017-02-20 2021-03-19 创新先进技术有限公司 Data query method, data processing method and device

Also Published As

Publication number Publication date
TWI676112B (en) 2019-11-01
CN108280156A (en) 2018-07-13
TW201931171A (en) 2019-08-01

Similar Documents

Publication Publication Date Title
CN111259006B (en) Universal distributed heterogeneous data integrated physical aggregation, organization, release and service method and system
US9501578B2 (en) Dynamic semantic models having multiple indices
US8224772B2 (en) Data management apparatus, method and program
CN105900117B (en) Method and system for collecting, normalizing, matching and enriching data
US9965641B2 (en) Policy-based data-centric access control in a sorted, distributed key-value data store
US11798208B2 (en) Computerized systems and methods for graph data modeling
US9633015B2 (en) Apparatus and methods for user generated content indexing
US11775767B1 (en) Systems and methods for automated iterative population of responses using artificial intelligence
US11030242B1 (en) Indexing and querying semi-structured documents using a key-value store
JP2010541079A5 (en)
WO2013097231A1 (en) File access method and system
CN107103011B (en) Method and device for realizing terminal data search
US11100152B2 (en) Data portal
US20110029538A1 (en) System for creation of content with correlated geospatial and virtual locations by mobile device users
Paulus et al. Gathering and Combining Semantic Concepts from Multiple Knowledge Bases.
CN113377876B (en) Data database processing method, device and platform based on Domino platform
JP2003271584A (en) Document management device, client device, document management system, program and storage medium
US20110153678A1 (en) Configuration information management system, configuration information management method, and distributed information management device
WO2019137365A1 (en) Method and device for creating index and performing search in cloud search platform
CN117171108A (en) Virtual model mapping method and system
Ergüzen et al. An efficient middle layer platform for medical imaging archives
JP3786233B2 (en) Information search method and information search system
TW200928799A (en) Collaborative tagging systems and methods for resources
CN103646034A (en) Web search engine system and search method based content credibility
KR20080049428A (en) Method and apparatus for providing similarity searching services by semantic web

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19739001

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19739001

Country of ref document: EP

Kind code of ref document: A1
