CN108280156A - Method and apparatus for building an index and searching in a cloud search platform - Google Patents

Method and apparatus for building an index and searching in a cloud search platform

Info

Publication number
CN108280156A
Authority
CN
China
Prior art keywords
tenant
document
platform
dictionary
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810031528.3A
Other languages
Chinese (zh)
Inventor
葛俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201810031528.3A (patent CN108280156A)
Publication of CN108280156A
Priority to TW107144111A (patent TWI676112B)
Priority to PCT/CN2019/070820 (patent WO2019137365A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of this specification disclose a method and apparatus for building an index and searching in a cloud search platform. The cloud search platform includes a search instance shared by multiple tenants, and the search instance assigns a tenant identifier to each of the tenants. The index-building method includes the following steps: obtaining a tenant document whose content includes a tenant identifier field line showing the tenant identifier of the tenant to which the document belongs; obtaining the tenant's tenant dictionary by means of the tenant identifier; segmenting the tenant document according to the tenant dictionary to obtain a segmented document corresponding to the tenant document; and, in the search instance, building an index for the tenant document according to the field definition table and the segmented document, including establishing an index relationship between the tenant identifier field and the tenant document according to the description of the tenant identifier field in the field definition table.

Description

Method and apparatus for building an index and searching in a cloud search platform
Technical field
The present invention relates to the field of cloud search platforms, and more particularly to a method and apparatus for building an index and searching in a cloud search platform.
Background
In a cloud search platform, the search function is sold externally as a service. Each user who purchases the service is called a tenant, and the number of tenants is on the order of tens of thousands. Different tenants have different dictionary customization needs and may require indexes to be built according to different dictionaries. A traditional multi-tenant cloud search service manages the search of each tenant as a separate instance; for example, 100 tenants means 100 instances. The instances have the same structure, but the schema (field definition table) of each instance has its own corresponding word segmentation dictionary and configuration. A more effective scheme for building an index and searching in a cloud search platform is therefore needed.
Summary of the invention
Embodiments of this specification aim to provide a more effective scheme for building an index and searching in a cloud search platform, so as to remedy deficiencies in the prior art.
To achieve the above object, one aspect of this specification provides a method for building an index in a cloud search platform. The cloud search platform includes a search instance for multiple tenants. The search instance includes a unified field definition table applicable to each of the tenants and assigns each tenant a tenant identifier that uniquely identifies the corresponding tenant. The field definition table includes a description of a tenant identifier field, and the tenant identifier field is associated with the tenant identifier. The method is executed by the cloud search platform and includes the following steps: obtaining a tenant document, where the content of the tenant document includes a tenant identifier field line that shows the tenant identifier of the tenant to which the document belongs; obtaining the tenant's tenant dictionary by means of the tenant identifier; segmenting the tenant document according to the tenant dictionary to obtain a segmented document corresponding to the tenant document; and, in the search instance, building an index for the tenant document according to the field definition table and the segmented document, including establishing an index relationship between the tenant identifier field and the tenant document according to the description of the tenant identifier field in the field definition table.
In one embodiment of the above method for building an index in a cloud search platform, the cloud search platform further includes a dictionary unit that is separate from the search instance and contains the respective tenant dictionaries of the multiple tenants. The method further includes, after obtaining the tenant document, sending the tenant document to the dictionary unit.
In one embodiment of the above method for building an index in a cloud search platform, segmenting the tenant document according to the tenant dictionary to obtain the segmented document corresponding to the tenant document includes: segmenting, in the dictionary unit, the tenant document according to the tenant dictionary to generate the segmented document corresponding to the tenant document; and receiving the segmented document from the dictionary unit.
In one embodiment, the above method for building an index in a cloud search platform further includes, after segmenting the tenant document according to the tenant dictionary to obtain the corresponding segmented document, sending the tenant document and its corresponding segmented document to the search instance.
In one embodiment of the above method for building an index in a cloud search platform, the cloud search platform further includes a unified service interface connected to the tenant platforms of the multiple tenants, and obtaining the tenant document includes: receiving an original tenant document from the tenant platform through the service interface, obtaining the tenant identifier according to the tenant platform, and adding the tenant identifier field line to the content of the original tenant document, thereby obtaining the tenant document.
In one embodiment of the above method for building an index in a cloud search platform, the cloud search platform further includes a unified service interface connected to the tenant platforms of the multiple tenants, the method is performed offline, and the method further includes, before obtaining the tenant document: receiving an original tenant document from the tenant platform through the service interface, obtaining the tenant identifier according to the tenant platform, adding the tenant identifier field line to the content of the original tenant document to generate the tenant document, and storing the tenant document in the cloud search platform.
In one embodiment of the above method for building an index in a cloud search platform, the cloud search platform further includes a unified service interface connected to the tenant platforms of the multiple tenants, and the method further includes, before obtaining the tenant document: receiving the tenant dictionary from the tenant platform through the service interface, obtaining the tenant identifier according to the tenant platform, and storing the tenant dictionary in the dictionary unit in association with the tenant identifier.
In another aspect, this specification provides an apparatus for building an index in a cloud search platform. The cloud search platform includes a search instance for multiple tenants. The search instance includes a unified field definition table applicable to each of the tenants and assigns each tenant a tenant identifier that uniquely identifies the corresponding tenant. The field definition table includes a description of a tenant identifier field, and the tenant identifier field is associated with the tenant identifier. The apparatus is implemented by the cloud search platform and includes the following units: a first obtaining unit, configured to obtain a tenant document, where the content of the tenant document includes a tenant identifier field line that shows the tenant identifier of the tenant to which the document belongs; a second obtaining unit, configured to obtain the tenant's tenant dictionary by means of the tenant identifier; a segmentation unit, configured to segment the tenant document according to the tenant dictionary to obtain a segmented document corresponding to the tenant document; and an establishing unit, configured to build, in the search instance, an index for the tenant document according to the field definition table and the segmented document, including establishing an index relationship between the tenant identifier field and the tenant document according to the description of the tenant identifier field in the field definition table.
In another aspect, this specification provides a method for searching in a cloud search platform. The cloud search platform includes a search instance for multiple tenants. The search instance includes a unified field definition table applicable to the multiple tenants and assigns each tenant a tenant identifier that uniquely identifies the corresponding tenant. The field definition table includes a description of a tenant identifier field, and the tenant identifier field is associated with the tenant identifier. The cloud search platform further includes a unified service interface connected to the tenant platforms of the multiple tenants. The method is executed by the cloud search platform and includes the following steps: receiving a search statement from a tenant platform; obtaining the tenant identifier of the tenant from the tenant platform; obtaining the tenant's tenant dictionary by means of the tenant identifier; segmenting the search statement according to the tenant dictionary to obtain a segmented statement corresponding to the search statement; searching, in the search instance, on the tenant identifier field and the segmented statement, so that the segmented statement is searched within the tenant documents of the tenant; locating the tenant platform according to the tenant identifier; and returning search results to the tenant platform according to the field definition table.
In one embodiment of the above method for searching in a cloud search platform, the cloud search platform further includes a dictionary unit that is separate from the search instance and contains the respective tenant dictionaries of the multiple tenants. The method further includes, after obtaining the tenant identifier of the tenant according to the tenant platform, sending the search statement and the tenant identifier to the dictionary unit.
In one embodiment of the above method for searching in a cloud search platform, segmenting the search statement according to the tenant dictionary to obtain the segmented statement corresponding to the search statement includes: segmenting, in the dictionary unit, the search statement by means of the tenant dictionary to generate the segmented statement; and receiving the segmented statement from the dictionary unit.
In one embodiment, the above method for searching in a cloud search platform further includes, after segmenting the search statement according to the tenant dictionary to obtain the corresponding segmented statement, sending the segmented statement and the tenant identifier to the search instance.
In another aspect, this specification provides an apparatus for searching in a cloud search platform. The cloud search platform includes a search instance for multiple tenants. The search instance includes a unified field definition table applicable to each of the tenants and assigns each tenant a tenant identifier that uniquely identifies the corresponding tenant. The field definition table includes a description of a tenant identifier field, and the tenant identifier field is associated with the tenant identifier. The cloud search platform further includes a unified service interface connected to the tenant platforms of the multiple tenants. The apparatus is implemented by the cloud search platform and includes the following units: a first receiving unit, configured to receive a search statement from a tenant platform; a first obtaining unit, configured to obtain the tenant identifier of the tenant from the tenant platform; a second obtaining unit, configured to obtain the tenant's tenant dictionary by means of the tenant identifier; a segmentation unit, configured to segment the search statement according to the tenant dictionary to obtain a segmented statement corresponding to the search statement; a retrieval unit, configured to search, in the search instance, on the tenant identifier field and the segmented statement, so that the segmented statement is searched within the tenant documents of the tenant; a locating unit, configured to locate the tenant platform according to the tenant identifier; and a returning unit, configured to return search results to the tenant platform according to the field definition table.
In the above methods and apparatuses for building an index and searching in a cloud search platform according to embodiments of this specification, multiple tenants are handled with unified logic in a single search instance, and the tenant dictionaries are split out into a service external to the search instance. This simplifies the overall architecture, reduces development cost, and can reduce the storage space the overall architecture requires. In addition, this scheme can meet the need for per-tenant custom dictionaries and can support a linear increase in the number of tenants by adding computing resources such as servers; that is, the system is "linearly scalable".
Description of the drawings
The embodiments of this specification will become clearer from the following description taken in conjunction with the accompanying drawings:
Fig. 1 schematically illustrates an application scenario of an embodiment of this specification;
Fig. 2 shows a method for building an index in a cloud search platform according to one embodiment of this specification;
Fig. 3 shows an apparatus 300 for building an index in a cloud search platform according to an embodiment of this specification;
Fig. 4 shows a method for searching in a cloud search platform according to an embodiment of this specification; and
Fig. 5 shows an apparatus 500 for searching in a cloud search platform according to an embodiment of this specification.
Detailed description
The embodiments of this specification are described below with reference to the accompanying drawings.
Fig. 1 schematically illustrates the application scenarios of this specification embodiment.The application scenarios of this specification embodiment include cloud Search platform 101 and tenant's platform of multiple tenants 102,103,104 etc..Cloud search platform 101 includes unified service interface For being connect with each tenant's platform, which is, for example, restful service interfaces.For example, cloud search platform 101 can lead to Cross service interface from tenant's platform 103 receive the document of tenant, tenant Custom Dictionaries etc., to according to the document of tenant and Custom Dictionaries structure index.When the user of tenant's platform 103 is scanned on tenant's platform 103 using search engine, The search statement of its user is sent to cloud search platform 101 by tenant's platform 103 by service interface.Cloud search platform 101 is logical The field row for crossing such as " tenant ID=xxx " tenant identification of tenant (wherein xxx for) is distinguished tenant and is retrieved, and to rent Family platform 103 returns to retrieval result.
The cloud search platform includes such search example (cluster), includes multiple tenants in the search example, Described search example includes the unified field definition table (schema) of each tenant suitable for the multiple tenant.That is, more The shared search example of a tenant, uses identical field definition table.Here search example is the use in cloud search platform In the independent utility for realizing function of search, searches between example and search example and be logically mutually isolated.
Field definition table (schema) for example can be the form of schema.xml configuration files, and it includes all documents can The field (Field) that can include and all information that how these fields will be handled when establishing document index and inquiry.
For example, the document (that is, raw doc) of initial data generates in the following format:
id=1
user_id=001
title=xxxxxxx yyyyy
content=...
origin_title=xxxxxyyyyy
Here id, user_id, title, and content are fields included in the document, and the content after each "=" is the value of the corresponding field.
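The "field=value" layout described above can be read into a simple mapping. The following is a minimal sketch only; the function name parse_raw_doc and the choice to split on the first "=" are assumptions for illustration and are not taken from the specification.

```python
def parse_raw_doc(text):
    """Parse 'field=value' lines into a field -> value mapping (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first "=" only, so a value may itself contain "=".
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a field line: {line!r}")
        fields[name.strip()] = value
    return fields


raw_doc = "id=1\nuser_id=001\ntitle=xxxxxxx yyyyy\ncontent=..."
doc = parse_raw_doc(raw_doc)
print(doc["user_id"])
```

Keeping the values as plain strings leaves interpretation (for example, whether a field is segmented) to the field definition table.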
Table 1 schematically illustrates the field definition table of a search instance.
Table 1
Table 1 lists the fields included in the documents handled in the search instance: title (the segmented title), content (body text), cat_id (category id), user_id (tenant id), and origin_title (original title). Table 1 also records how these fields are to be handled when building the document index and when querying. For example, in the title row, "inverted, forward" in the engine field column indicates that an inverted index and a forward index will be built for title, and the "segmentation needed" column shows that title is segmented by spaces at indexing time. As another example, in the content row, "summary field" in the engine field column indicates that the body text is displayed as a summary when query results are shown. Of course, the field definition table in Table 1 is only an example intended to illustrate field definition tables; a field definition table in a practical application can include more fields and different field definitions.
When building an index for each tenant's documents, the search instance reads the above field definition table and builds the index according to the descriptions in the table. In addition, when returning search results in response to a tenant's search request, the search instance also displays each field according to the descriptions in the field definition table.
The search instance assigns a tenant identifier to each of its tenants, and the tenant identifier uniquely identifies the corresponding tenant. Specifically, the search instance distinguishes tenants by a field in the field definition table (for example, user_id): "user_id=001" denotes the first tenant, and "user_id=002" denotes the second tenant. The tenant identifier can also be used to distinguish a tenant's documents, dictionary, tenant platform, and so on. For example, by associating a tenant's dictionary with the tenant identifier, the tenant dictionary can be obtained by means of the tenant identifier when it is needed. When a search is performed in the search instance, the requests of different tenants are distinguished by including "user_id=xxx" in the search request, so that tenants do not interfere with one another. For example, a specific search command (query) can be: query=title:weather AND user_id:xxx. This search statement means: search for index entries whose title contains "weather", with the additional condition that user_id is xxx, which is equivalent to filtering along the tenant dimension.
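The tenant-dimension filter described above can be sketched in a few lines. This is an illustration under stated assumptions: the helper names build_tenant_query and search, the in-memory list of documents, and the space-separated title tokens are all hypothetical, and a real search instance would evaluate the query against its index rather than scan documents.

```python
def build_tenant_query(field, term, tenant_id, tenant_field="user_id"):
    """Compose a query string that restricts a field match to one tenant."""
    return f"{field}:{term} AND {tenant_field}:{tenant_id}"


def search(docs, field, term, tenant_id, tenant_field="user_id"):
    """Return documents of tenant_id whose space-separated field contains term."""
    return [
        doc for doc in docs
        if doc.get(tenant_field) == tenant_id
        and term in doc.get(field, "").split()
    ]


docs = [
    {"id": "1", "user_id": "001", "title": "weather report"},
    {"id": "2", "user_id": "002", "title": "weather alert"},
    {"id": "3", "user_id": "001", "title": "sports news"},
]
print(build_tenant_query("title", "weather", "001"))
print([d["id"] for d in search(docs, "title", "weather", "001")])
```

Document 2 also mentions "weather" but belongs to tenant 002, so it is excluded from tenant 001's results; this is the isolation the user_id condition provides.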
As can be seen from the description above, the multiple tenants in one search instance use the same field definition table, but different tenants have different dictionaries. For example, articles in the sports category require a dedicated sports dictionary, while medical articles require a medical dictionary. When the search instance builds a tenant's index, the tenant's original documents are segmented according to the tenant's own dictionary for use in building the index. In one embodiment of this specification, the tenant-related segmentation logic is placed outside the search instance, so that the search instance performs only general functions such as storage, indexing, and retrieval and does not need complicated per-tenant dictionary customization.
Specifically, as shown in Fig. 1, the cloud search platform 101 further includes a dictionary unit 12, a service proxy unit 13, and a storage unit 14. The dictionary unit 12 is separate from the search instance 11 and contains the respective tenant dictionaries of the multiple tenants; it can provide a word segmentation service to each tenant outside the search instance. Specifically, when the service of dictionary unit 12 is called for a document from tenant platform 102, the dictionary unit segments the tenant's document using that tenant's tenant dictionary.
The service proxy unit 13 proxies the business of the cloud search platform 101 and provides HTTP services to the upper layer. It is connected to the tenant platforms (102, 103, 104, etc.), the search instance 11, the dictionary unit 12, the storage unit 14, and so on, relays data among them, and performs appropriate data preprocessing. For example, when the index is built offline, the service proxy unit 13 receives a tenant's original document from tenant platform 103, adds the tenant identifier to the original document to obtain a tenant document, and stores the tenant document in storage unit 14. When building the index, the service proxy unit 13 obtains the tenant document from storage unit 14, sends it to dictionary unit 12 for segmentation, and receives the segmented document from dictionary unit 12. The service proxy unit 13 then sends the tenant document and the segmented document to search instance 11, and search instance 11 builds an index for the tenant document according to the field definition table and the tenant's segmented document, where the index includes a field index on the tenant ID.
The composition of the cloud search platform 101 shown in Fig. 1 is one embodiment of this specification and does not limit the embodiments of this specification. In another embodiment, the dictionary-based segmentation function is placed inside the search instance, so that only the tenant document is sent to the search instance and the search instance segments the tenant document according to the tenant dictionary. In yet another embodiment, the search instance calls the service of the external dictionary unit directly (that is, not through the service proxy unit), obtaining the segmented document and the tenant document directly from the dictionary unit for use in building the index.
The method and apparatus for building an index in a cloud search platform according to one embodiment of this specification are described below. Fig. 2 shows the method for building an index in a cloud search platform according to one embodiment of this specification.
As shown in Fig. 2, in step S21, a tenant document is obtained. The content of the tenant document includes a tenant identifier field line that shows the tenant identifier of the tenant to which the document belongs. As described above, the cloud search platform includes a unified service interface connected to the tenant platforms of the multiple tenants, and the original tenant document is received from the tenant platform through this service interface.
For example, when the index is built in real time, the original tenant document is received from the tenant platform through the service interface, the tenant identifier "xxx" is obtained according to the tenant platform, and the field line "tenant ID=xxx" is added to the content of the original tenant document to obtain the tenant document. For instance, when the cloud search platform receives an original tenant document from the tenant platform of tenant 001, it can at the same time receive a parameter such as type=user_001 from that platform, from which the tenant identifier "xxx" (here "001") can be obtained. When the index is built offline, the original tenant document is received from the tenant platform through the service interface, the tenant identifier "xxx" is obtained according to the tenant platform, the field line "tenant ID=xxx" is added to the content of the original tenant document to generate the tenant document, and the tenant document is stored in the cloud search platform. When the index is then built offline, the tenant document is obtained from the storage unit of the cloud search platform.
In one embodiment, as shown in Fig. 1, the cloud search platform includes a service proxy unit that proxies the execution of business logic in the platform. The above unified service interface is provided on the service proxy unit and is connected to the tenant platforms of the multiple tenants. The service proxy unit is also connected to the storage unit of the cloud search platform. For example, when the index is built in real time, the service proxy unit receives the original tenant document from the tenant platform through the service interface, obtains the tenant identifier "xxx" according to the tenant platform, and adds the field line "tenant ID=xxx" to the content of the original tenant document to obtain the tenant document. When the index is built offline, the service proxy unit receives the original tenant document from the tenant platform through the service interface, obtains the tenant identifier "xxx" according to the tenant platform, adds the field line "tenant ID=xxx" to the content of the original tenant document to generate the tenant document, and stores the tenant document in the storage unit 14 of the cloud search platform. When the index is then built offline, the service proxy unit obtains the tenant document from storage unit 14.
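Step S21 can be sketched as two small helpers: one derives the tenant identifier from a request parameter such as type=user_001, and one adds the tenant identifier field line to the original document. This is a hedged illustration; the function names, the "user_" prefix convention, and the use of user_id as the field name (the text writes the line as "tenant ID=xxx") are assumptions.

```python
def tenant_id_from_type(type_param, prefix="user_"):
    """Extract the tenant identifier from a parameter such as 'user_001'."""
    if not type_param.startswith(prefix) or len(type_param) == len(prefix):
        raise ValueError(f"cannot derive a tenant identifier from {type_param!r}")
    return type_param[len(prefix):]


def to_tenant_doc(original_doc, tenant_id, tenant_field="user_id"):
    """Prepend the tenant identifier field line to the original document text."""
    return f"{tenant_field}={tenant_id}\n{original_doc}"


original = "id=1\ntitle=xxxxxxx yyyyy"
tenant_doc = to_tenant_doc(original, tenant_id_from_type("user_001"))
print(tenant_doc)
```

In the real-time case the resulting tenant document would be passed straight on for segmentation; in the offline case it would first be written to the storage unit and read back when the index is built.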
In step S22, the tenant's tenant dictionary is obtained by means of the tenant identifier. The cloud search platform also receives the tenant dictionary from the tenant platform through the service interface, obtains the tenant identifier according to the tenant platform, and stores the tenant dictionary in the dictionary unit in association with the tenant identifier. Thus, when building an index for a tenant document, the tenant dictionary associated with the tenant identifier can be retrieved from the platform by means of the tenant identifier.
In step S23, the tenant document is segmented according to the tenant dictionary to obtain a segmented document corresponding to the tenant document. For example, suppose the title of the original document is "北京团购网站长". The tenant dictionary of tenant 001 includes the entries "团购" (group buying) and "网站" (website), so segmenting the title according to tenant 001's dictionary gives "北京 团购 网站 长". The tenant dictionary of tenant 002 instead includes the entries "北京团购网" (Beijing group-buying site) and "站长" (webmaster), so segmenting the title according to tenant 002's dictionary gives "北京团购网 站长". In these results a space serves as the separator between tokens; this is only an example, and other characters can serve as the separator, or the tokens can be represented in another form, such as a structured representation of each token.
In one embodiment, a default dictionary may additionally be applied in the dictionary unit to segment the tenant document further. In this case, entries in the tenant dictionary take priority.
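The effect of per-tenant dictionaries on segmentation can be sketched as follows. The text does not name a segmentation algorithm, so this sketch assumes forward maximum matching, with tenant entries tried before default entries and a single-character fallback. The Latin strings are stand-ins for the Chinese example ("netstation" for "website", "stationmaster" for "webmaster"), and all names are hypothetical.

```python
from typing import List, Optional, Set


def longest_match(text: str, pos: int, dictionary: Set[str]) -> Optional[str]:
    """Return the longest dictionary entry that starts at text[pos], if any."""
    best = None
    for entry in dictionary:
        if text.startswith(entry, pos) and (best is None or len(entry) > len(best)):
            best = entry
    return best


def segment(text: str, tenant_dict: Set[str], default_dict: Set[str]) -> List[str]:
    """Forward maximum matching; tenant entries win over default entries."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the tenant dictionary first, then the default dictionary,
        # and fall back to a single character if neither matches.
        token = (longest_match(text, pos, tenant_dict)
                 or longest_match(text, pos, default_dict)
                 or text[pos])
        tokens.append(token)
        pos += len(token)
    return tokens


TITLE = "beijinggroupbuynetstationmaster"          # stand-in for the title
DEFAULT_DICT = {"beijing", "groupbuy", "master"}   # assumed default entries
TENANT_001 = {"groupbuy", "netstation"}
TENANT_002 = {"beijinggroupbuynet", "stationmaster"}

tokens_001 = segment(TITLE, TENANT_001, DEFAULT_DICT)
tokens_002 = segment(TITLE, TENANT_002, DEFAULT_DICT)
print(" ".join(tokens_001))
print(" ".join(tokens_002))
```

The same title is split into four tokens for the first tenant and two for the second, which is why the two tenants end up with different index entries for identical content.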
In one embodiment, as shown in Figure 1, the cloud search platform further includes a dictionary unit, which is an application on the platform separate from the search instance and provides a word segmentation service.
The service agent unit receives the tenant dictionary from the tenant platform through the service interface, obtains the tenant identifier according to the tenant platform, and stores the tenant dictionary in association with the tenant identifier in the dictionary unit. For example, a configuration file read at startup may be provided for the tenant dictionaries, in which each option takes the form key:Dict_path, for example user_id_001:/home/admin/local_dict_1.txt, so that the configuration file associates each tenant dictionary with its tenant identifier.
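A configuration file of this shape could be read as sketched below. The one-option-per-line layout, the function name parse_dict_config and the second sample path are assumptions made for illustration; only the first sample line comes from the text.

```python
from typing import Dict


def parse_dict_config(config_text: str) -> Dict[str, str]:
    """Map each tenant key to the path of its dictionary file.

    Assumes one "key:path" option per line; blank lines are skipped.
    """
    mapping = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first colon only, so the path itself is left untouched.
        key, sep, path = line.partition(":")
        if not sep or not key or not path:
            raise ValueError(f"malformed option: {line!r}")
        mapping[key.strip()] = path.strip()
    return mapping


SAMPLE_CONFIG = """
user_id_001:/home/admin/local_dict_1.txt
user_id_002:/home/admin/local_dict_2.txt
"""

dict_paths = parse_dict_config(SAMPLE_CONFIG)
print(dict_paths["user_id_001"])
```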
Thus, when an index is built for a tenant, the word segmentation service of the dictionary unit can be called through the service agent unit. For example, after obtaining the tenant document, the service agent unit sends it to the dictionary unit. The dictionary unit obtains the tenant identifier from the field line "tenant ID=xxx" in the tenant document and uses it to retrieve, from its own storage, the tenant dictionary stored in association with that identifier. The dictionary unit then segments the tenant document according to the tenant dictionary to generate the segmented document, and sends the tenant document and the segmented document to the service agent unit.
In step S24, in the search instance, an index is built for the tenant document according to the field definition table and the segmented document, which includes establishing the index relationship between the tenant identifier field and the tenant document according to the description of the tenant identifier field in the field definition table. For example, the field definition table in the search instance is the table shown in Table 1, which specifies that an inverted index and a forward index are built on the segmented title (title), the category ID (cat_id) and the tenant ID (user_id). Taking the tenant ID as an example, according to the description of the tenant ID field in the field definition table, the indexer in the search instance generates the index relationship tables between the tenant ID and the tenant documents, including an inverted table and a forward table.
As shown in Figure 1, in one embodiment, after the dictionary unit 12 has segmented the tenant document according to the tenant dictionary and sent the tenant document and the segmented document to the service agent unit 13, the service agent unit 13 sends the tenant document and the segmented document to the search instance 11. The segmented content may be delimited by a separator agreed in advance (for example, a space). If the dictionary unit does not delimit the segmented document with that separator, the service agent unit may process the segmented document further and rewrite it to use the agreed separator. In this case, after receiving the segmented document, the search instance tokenizes the already segmented fields simply by splitting on the separator, for example on spaces, without performing any additional word segmentation.
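The separator convention can be sketched as follows, assuming the dictionary unit happens to emit tokens joined by "|" and the agreed separator is a space. Both function names and the "|" format are hypothetical, and the sketch only works for tokens that do not themselves contain the agreed separator.

```python
from typing import List

AGREED_SEPARATOR = " "  # assumed to be agreed in advance between the units


def to_agreed_separator(segmented: str, source_separator: str) -> str:
    """Service agent side: rewrite a segmented field to use the agreed separator."""
    tokens = [t for t in segmented.split(source_separator) if t]
    return AGREED_SEPARATOR.join(tokens)


def tokenize_presegmented(field_value: str) -> List[str]:
    """Search instance side: split on the separator, no further segmentation."""
    return field_value.split(AGREED_SEPARATOR)


from_dictionary_unit = "beijing|groupbuy|netstation|master"
for_search_instance = to_agreed_separator(from_dictionary_unit, "|")
index_terms = tokenize_presegmented(for_search_instance)
print(index_terms)
```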
Thus, the documents of all tenants can be handled with the same logic in the search instance, without distinguishing between tenants. That is, the index of the search instance contains the documents of all tenants, and at search time the requests of different tenants are distinguished by adding a condition such as user_id="xxx", so that tenants are isolated from one another.
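A shared index with tenant isolation can be sketched as follows. This is a toy in-memory structure, not the search instance's actual indexer: the class SharedIndex, its methods, and the rule that every query term must match (AND semantics) are assumptions. The tenant ID is indexed like any other field, and every search intersects with it.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class SharedIndex:
    """One index for all tenants; the tenant ID is just another indexed field."""

    def __init__(self) -> None:
        # Inverted table: (field, term) -> ids of documents containing the term.
        self.inverted: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
        # Forward table: document id -> its field values.
        self.forward: Dict[int, dict] = {}

    def add(self, doc_id: int, user_id: str, segmented_title: str) -> None:
        terms = segmented_title.split(" ")  # title arrives already segmented
        self.forward[doc_id] = {"user_id": user_id, "title_terms": terms}
        self.inverted[("user_id", user_id)].add(doc_id)
        for term in terms:
            self.inverted[("title", term)].add(doc_id)

    def search(self, user_id: str, terms: List[str]) -> Set[int]:
        # Start from the tenant's own documents, then require every term.
        result = set(self.inverted.get(("user_id", user_id), set()))
        for term in terms:
            result &= self.inverted.get(("title", term), set())
        return result


index = SharedIndex()
index.add(1, "001", "beijing groupbuy netstation master")
index.add(2, "002", "beijinggroupbuynet stationmaster")

hits_001 = index.search("001", ["beijing", "groupbuy"])
hits_002 = index.search("002", ["beijing", "groupbuy"])
print(hits_001, hits_002)
```

Both documents live in the same tables, yet a query carrying one tenant's identifier can never return the other tenant's document, because the tenant ID posting list is always part of the intersection.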
Fig. 3 shows a device 300 for building an index in a cloud search platform according to an embodiment of this specification. The cloud search platform includes a search instance for multiple tenants. The search instance includes a unified field definition table applicable to the multiple tenants and assigns each of the multiple tenants a tenant identifier that uniquely identifies the corresponding tenant. The field definition table includes a description of a tenant identifier field, and the tenant identifier field is associated with the tenant identifier.
As shown in Fig. 3, the device 300 for building an index in a cloud search platform is implemented by the cloud search platform and includes the following units: a first acquisition unit 31, configured to obtain a tenant document, where the content of the tenant document includes a tenant identifier field line that shows the tenant identifier of the tenant to which the tenant document belongs; a second acquisition unit 32, configured to obtain the tenant dictionary by means of the tenant identifier; a segmentation unit 33, configured to segment the tenant document according to the tenant dictionary, thereby obtaining a segmented document corresponding to the tenant document; and an establishing unit 34, configured to build, in the search instance, an index for the tenant document according to the field definition table and the segmented document, which includes establishing the index relationship between the tenant identifier field and the tenant document according to the description of the tenant identifier field in the field definition table.
In one embodiment, the cloud search platform further includes a dictionary unit that is separate from the search instance and contains the respective tenant dictionaries of the multiple tenants, and the device 300 for building an index in a cloud search platform further includes a first sending unit, configured to send the tenant document to the dictionary unit after the tenant document is obtained.
In one embodiment, segmenting the tenant document according to the tenant dictionary, thereby obtaining a segmented document corresponding to the tenant document, includes: segmenting, in the dictionary unit, the tenant document according to the tenant dictionary to generate the segmented document corresponding to the tenant document; and receiving the segmented document from the dictionary unit.
In one embodiment, the device 300 for building an index in a cloud search platform further includes a second sending unit, configured to send the tenant document and its corresponding segmented document to the search instance after the tenant document has been segmented according to the tenant dictionary and the segmented document corresponding to the tenant document has been obtained.
In one embodiment, in the device 300 for building an index in a cloud search platform, the cloud search platform further includes a unified service interface connected to the tenant platforms of the multiple tenants, and obtaining the tenant document includes: receiving the tenant's original document from the tenant platform through the service interface, obtaining the tenant identifier according to the tenant platform, and adding the tenant identifier field line to the content of the original document, thereby obtaining the tenant document.
In one embodiment, the device 300 for building an index in a cloud search platform is implemented offline, and the device 300 further includes a first storage unit, configured to, before the tenant document is obtained, receive the tenant's original document from the tenant platform through the service interface, obtain the tenant identifier according to the tenant platform, add the tenant identifier field line to the content of the original document to generate the tenant document, and store the tenant document in the cloud search platform.
In one embodiment, the device 300 for building an index in a cloud search platform further includes a second storage unit, configured to, before the tenant document is obtained, receive the tenant dictionary from the tenant platform through the service interface, obtain the tenant identifier according to the tenant platform, and store the tenant dictionary in association with the tenant identifier in the dictionary unit.
Fig. 4 shows a method for searching in a cloud search platform according to an embodiment of this specification. The cloud search platform includes a search instance for multiple tenants. The search instance includes a unified field definition table applicable to the multiple tenants and assigns each of the multiple tenants a tenant identifier that uniquely identifies the corresponding tenant. The field definition table includes a description of a tenant identifier field, and the tenant identifier field is associated with the tenant identifier. The cloud search platform also includes a unified service interface connected to the tenant platforms of the multiple tenants.
As shown in Fig. 4, in step S41, a search statement is received from a tenant platform. The cloud search platform receives the tenant's search request from the tenant platform through the unified service interface described above. For example, the search request is the search statement "Beijing group-buying".
In one embodiment, as shown in Figure 1, the cloud search platform 101 includes a service agent unit 13, which includes the service interface and is thereby connected to the tenant platforms. The service agent unit 13 receives the tenant's search request from the tenant platform through the unified service interface.
In step S42, the tenant identifier of the tenant is obtained from the tenant platform. For example, each tenant platform may include, in the request string it sends, a tenant identifier parameter corresponding to that tenant platform. The tenant platform of tenant 001, for instance, sends the cloud search platform a parameter such as type=user_001. The tenant identifier "xxx" of the tenant, for example "001", can thus be obtained from the parameter contained in the request string from the tenant platform. In one embodiment, as shown in Figure 1, the service agent unit 13 obtains the tenant identifier "xxx" of the tenant from the tenant platform.
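Extracting the tenant identifier from a request string can be sketched as follows. The sketch assumes the request string is URL-encoded key=value pairs, that the parameter is named type as in the example above, and that the identifier is what follows the user_ prefix; the q parameter and the function name are hypothetical.

```python
from urllib.parse import parse_qs


def tenant_id_from_request(request_string: str, param: str = "type",
                           prefix: str = "user_") -> str:
    """Return the tenant identifier carried in the request string."""
    values = parse_qs(request_string).get(param)
    if not values:
        raise ValueError("request string carries no tenant parameter")
    value = values[0]
    if not value.startswith(prefix):
        raise ValueError(f"unexpected tenant parameter: {value!r}")
    # Strip the prefix so that "user_001" yields the identifier "001".
    return value[len(prefix):]


tenant_id = tenant_id_from_request("q=beijing&type=user_001")
print(tenant_id)
```

Rejecting requests without a recognizable tenant parameter matters here: every later step (dictionary lookup, index filtering, returning the result) depends on the identifier.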
In step S43, the tenant dictionary of the tenant is obtained by means of the tenant identifier. As described above, the cloud search platform stores each tenant dictionary in association with its tenant identifier, so the tenant dictionary associated with the tenant identifier can be looked up in the platform by that identifier.
In one embodiment, as shown in Figure 1, the cloud search platform 101 further includes a dictionary unit 12. After obtaining the tenant identifier "xxx" of the tenant from the tenant platform, the service agent unit 13 sends the search statement and the tenant identifier to the dictionary unit 12. The dictionary unit 12 then uses the tenant identifier to retrieve, from its own storage, the tenant dictionary associated with that identifier.
In step S44, the search statement is segmented according to the tenant dictionary, thereby obtaining a segmented statement corresponding to the search statement. For example, when the cloud search platform receives the search statement "Beijing group-buying" from the tenant platform of tenant 001, whose tenant dictionary includes the entries "group-buying" and "website", segmenting the statement with that dictionary yields the two tokens "Beijing" and "group-buying". As another example, when the cloud search platform receives the search statement "Beijing group-buying" from the tenant platform of tenant 002, whose tenant dictionary includes the entry "Beijing group-buying net", the result of segmenting the statement with that dictionary is still "Beijing group-buying".
In one embodiment, as shown in Figure 1, the search statement is segmented in the dictionary unit 12 using the tenant dictionary to generate the segmented statement. The dictionary unit 12 then sends the segmented statement and the tenant identifier back to the service agent unit 13, and the service agent unit 13 sends the segmented statement and the tenant identifier to the search instance 11.
In step S45, the tenant identifier field and the segmented statement are retrieved in the search instance, so that the segmented statement is retrieved within the tenant documents of the tenant.
For example, the documents of tenant 001 (that is, tenant ID=001) include a document whose title reads, word for word, "Beijing group-buying website long". The tenant dictionary of tenant 001 includes the entries "group-buying" and "website", so segmenting the title with that dictionary yields "Beijing", "group-buying", "website" and "long", and these four terms are the index entries of that document. When the search statement "Beijing group-buying" is received from the tenant platform of tenant 001, segmenting it with the tenant dictionary of tenant 001 yields the tokens "Beijing" and "group-buying". Both "Beijing" and "group-buying" are associated in the index with that document, and the document is also associated in the index with "tenant ID=001" (that is, it is a document of tenant 001). In this case, retrieving the search statement "Beijing group-buying" in the search instance therefore returns that document.
As another example, the documents of tenant 002 (that is, tenant ID=002) include a document with the same title. The tenant dictionary of tenant 002 includes the entries "Beijing group-buying net" and "webmaster", so segmenting the title with that dictionary yields "Beijing group-buying net" and "webmaster", and these two terms are the index entries of that document. When the search statement "Beijing group-buying" is received from the tenant platform of tenant 002, the segmented statement obtained with the tenant dictionary of tenant 002 is still "Beijing group-buying". "Beijing group-buying" is not associated in the index with that document, so in this case retrieving the search statement "Beijing group-buying" in the search instance does not return that document.
In one embodiment, as shown in Figure 1, after retrieval has been performed in the search instance 11 according to the field line "tenant ID=xxx" and the segmented statement, the search instance 11 sends the retrieval result, presented according to the field definition table, together with the tenant identifier to the service agent unit 13.
In step S46, the tenant platform is located according to the tenant identifier. In the embodiments of this specification, the cloud search platform is connected to the tenant platforms of multiple tenants through a unified service interface, so when returning a result the platform can use the tenant identifier to locate the tenant platform that sent the search request.
In one embodiment, as shown in Figure 1, the service agent unit 13 uses the tenant identifier received from the search instance 11 to locate the tenant platform that sent the search request.
Finally, in step S47, the retrieval result is returned to the tenant platform according to the field definition table. For example, referring to the field definition table shown in Table 1 above, the "engine field" column of the content (text) row is set to "summary field", which means that the content is shown in the summary when the search result is returned. According to the table, cat_id (category ID) and origin_title (original title) are also shown in the summary.
In one embodiment, as shown in Figure 1, after locating the tenant platform, the service agent unit 13 returns to the tenant platform the retrieval result, presented according to the field definition table, that it received from the search instance 11.
Fig. 5 shows a device 500 for searching in a cloud search platform according to an embodiment of this specification. The cloud search platform includes a search instance for multiple tenants. The search instance includes a unified field definition table applicable to each of the multiple tenants and assigns each of the multiple tenants a tenant identifier that uniquely identifies the corresponding tenant. The field definition table includes a description of a tenant identifier field, and the tenant identifier field is associated with the tenant identifier. The cloud search platform further includes a unified service interface connected to the tenant platforms of the multiple tenants.
As shown in Fig. 5, the device 500 for searching in a cloud search platform is implemented by the cloud search platform and includes the following units: a first receiving unit 51, configured to receive a search statement from a tenant platform; a first acquisition unit 52, configured to obtain the tenant identifier of the tenant from the tenant platform; a second acquisition unit 53, configured to obtain the tenant dictionary of the tenant by means of the tenant identifier; a segmentation unit 54, configured to segment the search statement according to the tenant dictionary, thereby obtaining a segmented statement corresponding to the search statement; a retrieval unit 55, configured to retrieve the tenant identifier field and the segmented statement in the search instance, so that the segmented statement is retrieved within the tenant documents of the tenant; a locating unit 56, configured to locate the tenant platform according to the tenant identifier; and a returning unit 57, configured to return the retrieval result to the tenant platform according to the field definition table.
In one embodiment, the cloud search platform further includes a dictionary unit that is separate from the search instance and contains the respective tenant dictionaries of the multiple tenants, and the device 500 for searching in a cloud search platform further includes a first sending unit, configured to send the search statement and the tenant identifier to the dictionary unit after the tenant identifier of the tenant has been obtained according to the tenant platform.
In one embodiment, in the device 500 for searching in a cloud search platform, segmenting the search statement according to the tenant dictionary, thereby obtaining a segmented statement corresponding to the search statement, includes: segmenting, in the dictionary unit, the search statement using the tenant dictionary to generate the segmented statement; and receiving the segmented statement from the dictionary unit.
In one embodiment, the device 500 for searching in a cloud search platform further includes a second sending unit, configured to send the segmented statement and the tenant identifier to the search instance after the tenant identifier has been received from the dictionary unit.
The embodiments of this specification further include a computer-readable storage medium storing instruction code which, when executed in a computer, causes the computer to perform the methods for building an index and for searching in a cloud search platform according to the embodiments of this specification.
In the methods and devices for building an index and searching in a cloud search platform according to the embodiments of this specification, multiple tenants are handled with the same logic in a single search instance, and the tenant dictionaries are split out into a service external to the search instance. This simplifies the overall architecture, lowers development cost, and can reduce the storage space the architecture requires. In addition, the solution meets the need for tenant-specific custom dictionaries and can support a linearly growing number of tenants by adding computing resources such as servers; that is, the system is linearly scalable.
Those of ordinary skill in the art should further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of their function. Whether these functions are executed in hardware or in software depends on the specific application and the design constraints of the technical solution. Those of ordinary skill in the art may implement the described functions in different ways for each specific application, but such implementations should not be considered to go beyond the scope of the present application.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein can be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The specific embodiments described above explain the purpose, technical solutions and beneficial effects of the present invention in further detail. It should be understood that the foregoing describes only specific embodiments of the present invention and is not intended to limit its scope of protection; any modification, equivalent substitution, improvement or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (22)

1. A method for building an index in a cloud search platform, wherein the cloud search platform includes a search instance for multiple tenants, the search instance includes a unified field definition table applicable to the multiple tenants, the search instance assigns each of the multiple tenants a tenant identifier that uniquely identifies the corresponding tenant, the field definition table includes a description of a tenant identifier field, and the tenant identifier field is associated with the tenant identifier, the method being executed by the cloud search platform and comprising the following steps:
obtaining a tenant document, wherein the content of the tenant document includes a tenant identifier field line that shows the tenant identifier of the tenant to which the tenant document belongs;
obtaining the tenant dictionary of the tenant by means of the tenant identifier;
segmenting the tenant document according to the tenant dictionary, thereby obtaining a segmented document corresponding to the tenant document; and
building, in the search instance, an index for the tenant document according to the field definition table and the segmented document, which includes establishing the index relationship between the tenant identifier field and the tenant document according to the description of the tenant identifier field in the field definition table.
2. The method for building an index in a cloud search platform according to claim 1, wherein the cloud search platform further includes a dictionary unit that is separate from the search instance and contains the respective tenant dictionaries of the multiple tenants, the method further comprising:
after obtaining the tenant document, sending the tenant document to the dictionary unit.
3. The method for building an index in a cloud search platform according to claim 2, wherein segmenting the tenant document according to the tenant dictionary, thereby obtaining a segmented document corresponding to the tenant document, comprises:
segmenting, in the dictionary unit, the tenant document according to the tenant dictionary to generate the segmented document corresponding to the tenant document; and
receiving the segmented document from the dictionary unit.
4. The method for building an index in a cloud search platform according to claim 2, further comprising: after segmenting the tenant document according to the tenant dictionary and thereby obtaining the segmented document corresponding to the tenant document, sending the tenant document and its corresponding segmented document to the search instance.
5. The method for building an index in a cloud search platform according to any one of claims 1-4, wherein the cloud search platform further includes a unified service interface connected to the tenant platforms of the multiple tenants, and wherein obtaining the tenant document comprises: receiving the tenant's original document from the tenant platform through the service interface, obtaining the tenant identifier according to the tenant platform, and adding the tenant identifier field line to the content of the original document, thereby obtaining the tenant document.
6. The method for building an index in a cloud search platform according to any one of claims 1-4, wherein the cloud search platform further includes a unified service interface connected to the tenant platforms of the multiple tenants, the method is performed offline, and the method further comprises: before obtaining the tenant document, receiving the tenant's original document from the tenant platform through the service interface, obtaining the tenant identifier according to the tenant platform, adding the tenant identifier field line to the content of the original document to generate the tenant document, and storing the tenant document in the cloud search platform.
7. The method for building an index in a cloud search platform according to any one of claims 2-4, wherein the cloud search platform further includes a unified service interface connected to the tenant platforms of the multiple tenants, and the method further comprises: before obtaining the tenant document, receiving the tenant dictionary from the tenant platform through the service interface, obtaining the tenant identifier according to the tenant platform, and storing the tenant dictionary in association with the tenant identifier in the dictionary unit.
8. A device for building an index in a cloud search platform, wherein the cloud search platform includes a search instance for multiple tenants, the search instance includes a unified field definition table applicable to the multiple tenants, the search instance assigns each of the multiple tenants a tenant identifier that uniquely identifies the corresponding tenant, the field definition table includes a description of a tenant identifier field, and the tenant identifier field is associated with the tenant identifier, the device being implemented by the cloud search platform and comprising the following units:
a first acquisition unit, configured to obtain a tenant document, wherein the content of the tenant document includes a tenant identifier field line that shows the tenant identifier of the tenant to which the tenant document belongs;
a second acquisition unit, configured to obtain the tenant dictionary of the tenant by means of the tenant identifier;
a segmentation unit, configured to segment the tenant document according to the tenant dictionary, thereby obtaining a segmented document corresponding to the tenant document; and
an establishing unit, configured to build, in the search instance, an index for the tenant document according to the field definition table and the segmented document, which includes establishing the index relationship between the tenant identifier field and the tenant document according to the description of the tenant identifier field in the field definition table.
9. The device for building an index in a cloud search platform according to claim 8, wherein the cloud search platform further includes a dictionary unit that is separate from the search instance and contains the respective tenant dictionaries of the multiple tenants, the device further comprising:
a first sending unit, configured to send the tenant document to the dictionary unit after the tenant document is obtained.
10. The device for building an index in a cloud search platform according to claim 9, wherein segmenting the tenant document according to the tenant dictionary, thereby obtaining a segmented document corresponding to the tenant document, comprises:
segmenting, in the dictionary unit, the tenant document according to the tenant dictionary to generate the segmented document corresponding to the tenant document; and
receiving the segmented document from the dictionary unit.
11. The device for building an index in a cloud search platform according to claim 9, further comprising a second sending unit, configured to send the tenant document and its corresponding segmented document to the search instance after the tenant document has been segmented according to the tenant dictionary and the segmented document corresponding to the tenant document has been obtained.
12. the device for building index in cloud search platform according to any one of claim 8-11, wherein the cloud Search platform further includes unified service interface, and the service interface is connect with tenant's platform of the multiple tenant, and, Middle acquisition tenant's document includes tenant's original document being received from tenant's platform by the service interface, according to the rent Family platform obtains the tenant identification, and increases the tenant identification field row in the content of tenant's original document, To obtain tenant's document.
13. the device for building index in cloud search platform according to any one of claim 8-11, wherein the cloud Search platform further includes unified service interface, and the service interface is connect with tenant's platform of the multiple tenant, and, In implement described device offline, and described device further includes the first storage unit, is configured to, before obtaining tenant's document, Tenant's original document is received from tenant's platform by the service interface, and the tenant is obtained according to tenant's platform Mark, increases the tenant identification field row in the content of tenant's original document, to generate tenant's document, and Tenant's document is stored in the cloud search platform.
14. The device for building an index in a cloud search platform according to any one of claims 9-11, wherein the cloud search platform further comprises a unified service interface connected to the tenant platforms of the multiple tenants, and wherein the device further comprises a second storage unit configured to, before the tenant document is obtained, receive the tenant dictionary from the tenant platform through the service interface, obtain the tenant identification according to the tenant platform, and store the tenant dictionary in the dictionary unit in association with the tenant identification.
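The intake of tenant documents and tenant dictionaries through the unified service interface (claims 12 to 14) might be sketched as follows. This is only an illustration under stated assumptions, not the claimed implementation: the class names `ServiceInterface` and `DictionaryUnit`, the field name `tenant_id`, the platform name `shop.example`, and the lookup from tenant platform to tenant identification are all hypothetical, and documents are modeled as plain dicts.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryUnit:
    """Hypothetical dictionary unit: one word set per tenant identification."""
    dictionaries: dict = field(default_factory=dict)

    def store(self, tenant_id: str, words: set) -> None:
        self.dictionaries[tenant_id] = set(words)


@dataclass
class ServiceInterface:
    """Hypothetical unified service interface shared by all tenant platforms."""
    platform_to_tenant: dict      # assumed lookup: tenant platform -> tenant identification
    dictionary_unit: DictionaryUnit
    stored_documents: list = field(default_factory=list)

    def receive_document(self, platform: str, original: dict) -> dict:
        tenant_id = self.platform_to_tenant[platform]
        # Add the tenant identification field to the original content (claim 12).
        tenant_doc = {**original, "tenant_id": tenant_id}
        # Keep the tagged document on the platform for later offline indexing (claim 13).
        self.stored_documents.append(tenant_doc)
        return tenant_doc

    def receive_dictionary(self, platform: str, words: set) -> None:
        tenant_id = self.platform_to_tenant[platform]
        # Store the dictionary in association with the tenant identification (claim 14).
        self.dictionary_unit.store(tenant_id, words)


unit = DictionaryUnit()
api = ServiceInterface({"shop.example": "t1"}, unit)
doc = api.receive_document("shop.example", {"title": "red phone case"})
api.receive_dictionary("shop.example", {"phone case"})
```

Because the tenant identification is derived from the calling platform rather than supplied in the document body, a tenant in this sketch cannot tag a document with another tenant's identification.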
15. A method for searching in a cloud search platform, wherein the cloud search platform comprises a search instance for multiple tenants; the search instance comprises a unified field definition table applicable to the multiple tenants; each of the multiple tenants is assigned a tenant identification in the search instance, the tenant identification uniquely identifying the corresponding tenant; the field definition table comprises a description of a tenant identification field, the tenant identification field being associated with the tenant identification; and the cloud search platform further comprises a unified service interface connected to the tenant platforms of the multiple tenants, the method being executed by the cloud search platform and comprising the following steps:
Receiving a search statement from a tenant platform;
Obtaining the tenant identification of the tenant from the tenant platform;
Obtaining the tenant dictionary of the tenant by means of the tenant identification;
Segmenting the search statement according to the tenant dictionary, to obtain a segmented statement corresponding to the search statement;
Retrieving on the tenant identification field and the segmented statement in the search instance, so that the segmented statement is retrieved within the tenant documents of the tenant;
Locating the tenant platform according to the tenant identification; and
Returning a retrieval result to the tenant platform according to the field definition table.
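The steps of claim 15 can be illustrated with a small sketch. Everything below is an assumption made for illustration: the claims do not specify a segmentation algorithm (forward maximum matching is swapped in here), nor a matching rule (here every query term must occur in the document), and the names `CloudSearchPlatform`, `FIELD_TABLE`, `segment`, `a.example`, and `b.example` are hypothetical.

```python
FIELD_TABLE = ("tenant_id", "title")  # assumed unified field definition table


def segment(text, dictionary):
    """Forward maximum matching; uncovered characters become one-character tokens."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


class CloudSearchPlatform:
    def __init__(self, tenant_of_platform, dictionaries):
        self.tenant_of_platform = tenant_of_platform   # platform -> tenant identification
        self.platform_of_tenant = {t: p for p, t in tenant_of_platform.items()}
        self.dictionaries = dictionaries               # tenant identification -> dictionary
        self.index = []                                # one search instance for all tenants

    def add(self, tenant_id, title):
        terms = segment(title, self.dictionaries[tenant_id])
        self.index.append({"tenant_id": tenant_id, "title": title, "terms": terms})

    def search(self, platform, statement):
        tenant_id = self.tenant_of_platform[platform]              # obtain tenant identification
        terms = segment(statement, self.dictionaries[tenant_id])   # segment with tenant dictionary
        hits = [d for d in self.index                              # filter on the tenant field
                if d["tenant_id"] == tenant_id and all(t in d["terms"] for t in terms)]
        target = self.platform_of_tenant[tenant_id]                # locate the tenant platform
        rows = [{f: d[f] for f in FIELD_TABLE} for d in hits]      # shape rows by the field table
        return target, rows


platform = CloudSearchPlatform(
    {"a.example": "t1", "b.example": "t2"},
    {"t1": {"bluetooth", "headset"}, "t2": {"blue", "tooth", "headset"}},
)
platform.add("t1", "bluetoothheadset")
platform.add("t2", "bluetoothheadset")
target, rows = platform.search("a.example", "headset")
```

Both tenants share one index, yet the query from `a.example` only sees the document tagged `t1`, because the tenant identification field is part of every retrieval.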
16. The method for searching in a cloud search platform according to claim 15, wherein the cloud search platform further comprises a dictionary unit, the dictionary unit is separate from the search instance, and the dictionary unit contains the respective tenant dictionaries of the multiple tenants, the method further comprising:
After obtaining the tenant identification of the tenant according to the tenant platform, sending the search statement and the tenant identification to the dictionary unit.
17. The method for searching in a cloud search platform according to claim 16, wherein segmenting the search statement according to the tenant dictionary, to obtain the segmented statement corresponding to the search statement, comprises:
Segmenting, in the dictionary unit, the search statement by means of the tenant dictionary, to generate the segmented statement; and
Receiving the segmented statement from the dictionary unit.
18. The method for searching in a cloud search platform according to claim 16, further comprising: after the search statement has been segmented according to the tenant dictionary to obtain the segmented statement corresponding to the search statement, sending the segmented statement and the tenant identification to the search instance.
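The hand-off between the separate dictionary unit and the search instance described in claims 16 to 18 could look roughly like the sketch below. It is a hedged illustration only: the class and function names are hypothetical, the regex-based segmentation (longest dictionary word first, otherwise a single non-space character) is an assumed stand-in for whatever segmenter the dictionary unit actually uses, and the search instance here merely records what it receives.

```python
import re


class DictionaryUnit:
    """Hypothetical dictionary unit, kept separate from the search instance."""

    def __init__(self, dictionaries):
        self._dictionaries = dictionaries  # tenant identification -> set of words

    def segment(self, statement, tenant_id):
        # Try longer dictionary words first; fall back to one non-space character.
        words = sorted(self._dictionaries[tenant_id], key=len, reverse=True)
        pattern = "|".join([re.escape(w) for w in words] + [r"\S"])
        return re.findall(pattern, statement)


class SearchInstance:
    """Hypothetical search instance that only records the requests it receives."""

    def __init__(self):
        self.received = []

    def retrieve(self, segmented, tenant_id):
        self.received.append((tenant_id, segmented))
        return []  # actual retrieval is outside the scope of this sketch


def handle_search(statement, tenant_id, dictionary_unit, search_instance):
    # Send the statement and tenant identification to the dictionary unit (claim 16)
    # and receive the segmented statement back (claim 17).
    segmented = dictionary_unit.segment(statement, tenant_id)
    # Forward the segmented statement with the tenant identification (claim 18).
    return search_instance.retrieve(segmented, tenant_id)


unit = DictionaryUnit({"t1": {"cloud search", "platform"}})
instance = SearchInstance()
result = handle_search("cloud search platform", "t1", unit, instance)
```

Keeping segmentation outside the search instance means that, in this sketch, the instance never needs to load any tenant dictionary; it only ever sees pre-segmented terms plus a tenant identification.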
19. A device for searching in a cloud search platform, wherein the cloud search platform comprises a search instance for multiple tenants; the search instance comprises a unified field definition table applicable to the multiple tenants; each of the multiple tenants is assigned a tenant identification in the search instance, the tenant identification uniquely identifying the corresponding tenant; the field definition table comprises a description of a tenant identification field, the tenant identification field being associated with the tenant identification; and the cloud search platform further comprises a unified service interface connected to the tenant platforms of the multiple tenants, the device being implemented by the cloud search platform and comprising the following units:
A first receiving unit, configured to receive a search statement from a tenant platform;
A first obtaining unit, configured to obtain the tenant identification of the tenant from the tenant platform;
A second obtaining unit, configured to obtain the tenant dictionary of the tenant by means of the tenant identification;
A segmentation unit, configured to segment the search statement according to the tenant dictionary, to obtain a segmented statement corresponding to the search statement;
A retrieval unit, configured to retrieve on the tenant identification field and the segmented statement in the search instance, so that the segmented statement is retrieved within the tenant documents of the tenant;
A locating unit, configured to locate the tenant platform according to the tenant identification; and
A returning unit, configured to return a retrieval result to the tenant platform according to the field definition table.
20. The device for searching in a cloud search platform according to claim 19, wherein the cloud search platform further comprises a dictionary unit, the dictionary unit is separate from the search instance, and the dictionary unit contains the respective tenant dictionaries of the multiple tenants, the device further comprising:
A first sending unit, configured to, after the tenant identification of the tenant has been obtained according to the tenant platform, send the search statement and the tenant identification to the dictionary unit.
21. The device for searching in a cloud search platform according to claim 20, wherein segmenting the search statement according to the tenant dictionary, to obtain the segmented statement corresponding to the search statement, comprises:
Segmenting, in the dictionary unit, the search statement by means of the tenant dictionary, to generate the segmented statement; and
Receiving the segmented statement from the dictionary unit.
22. The device for searching in a cloud search platform according to claim 20, further comprising a second sending unit configured to, after the search statement has been segmented according to the tenant dictionary to obtain the segmented statement corresponding to the search statement, send the segmented statement and the tenant identification to the search instance.
CN201810031528.3A 2018-01-12 2018-01-12 Method and apparatus for building an index and searching in a cloud search platform Pending CN108280156A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201810031528.3A CN108280156A (en) 2018-01-12 2018-01-12 A kind of method and apparatus structure index in cloud search platform and scanned for
TW107144111A TWI676112B (en) 2018-01-12 2018-12-07 Method and apparatus for building an index and searching in a cloud search platform
PCT/CN2019/070820 WO2019137365A1 (en) 2018-01-12 2019-01-08 Method and device for creating index and performing search in cloud search platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810031528.3A CN108280156A (en) 2018-01-12 2018-01-12 A kind of method and apparatus structure index in cloud search platform and scanned for

Publications (1)

Publication Number Publication Date
CN108280156A (en) 2018-07-13

Family

ID=62803630

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810031528.3A Pending CN108280156A (en) 2018-01-12 2018-01-12 A kind of method and apparatus structure index in cloud search platform and scanned for

Country Status (3)

Country Link
CN (1) CN108280156A (en)
TW (1) TWI676112B (en)
WO (1) WO2019137365A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019137365A1 (en) * 2018-01-12 2019-07-18 阿里巴巴集团控股有限公司 Method and device for creating index and performing search in cloud search platform

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101499061A (en) * 2008-01-30 2009-08-05 国际商业机器公司 Multi-tenant oriented database engine and its data access method
CN102930027A (en) * 2012-11-06 2013-02-13 苏州两江科技有限公司 Data processing system and processing method in cloud computing multi-tenant architecture
CN107038207A (en) * 2017-02-20 2017-08-11 阿里巴巴集团控股有限公司 Data query method, data processing method and device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102411590A (en) * 2010-09-21 2012-04-11 英业达股份有限公司 System and method for opening corresponding object file through custom label
US9280678B2 (en) * 2013-12-02 2016-03-08 Fortinet, Inc. Secure cloud storage distribution and aggregation
US9378200B1 (en) * 2014-09-30 2016-06-28 Emc Corporation Automated content inference system for unstructured text data
US9760635B2 (en) * 2014-11-07 2017-09-12 Rockwell Automation Technologies, Inc. Dynamic search engine for an industrial environment
TWI546680B (en) * 2014-12-30 2016-08-21 中華電信股份有限公司 Cloud files indexing system and method thereof
CN107168966B (en) * 2016-03-07 2020-10-20 创新先进技术有限公司 Search engine index construction method and device
CN107203532B (en) * 2016-03-16 2021-03-16 阿里巴巴集团控股有限公司 Index system construction method, search realization method and device
CN108280156A (en) * 2018-01-12 2018-07-13 阿里巴巴集团控股有限公司 Method and apparatus for building an index and searching in a cloud search platform

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
OUTSANDING: "多租户过程记录一", 《CSDN》 *
毛无语666: "jieba分词工具的使用", 《博客园》 *

Also Published As

Publication number Publication date
WO2019137365A1 (en) 2019-07-18
TWI676112B (en) 2019-11-01
TW201931171A (en) 2019-08-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1256517

Country of ref document: HK

TA01 Transfer of patent application right

Effective date of registration: 20201020

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced New Technologies Co., Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advantageous New Technologies Co., Ltd.

Effective date of registration: 20201020

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advantageous New Technologies Co., Ltd.

Address before: P.O. Box 847, Fourth Floor, One Capital Place, Grand Cayman, Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20180713
