CN110019645A - Index base construction method, searching method and device - Google Patents


Info

Publication number
CN110019645A
Authority
CN
China
Prior art keywords
poi
index
information
character
piecemeal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710901601.3A
Other languages
Chinese (zh)
Other versions
CN110019645B (en)
Inventor
谭鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Network Technology Co.,Ltd.
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201710901601.3A
Publication of CN110019645A
Application granted
Publication of CN110019645B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/319 Inverted lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Remote Sensing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide an index database construction method, a search method, and corresponding devices. The index database construction method comprises: scanning a data source of points of interest (Poi) to determine attribute information and city information of each Poi; creating an inverted index of the Pois according to the attribute information of each Poi; splitting the inverted index of the Pois according to the city information to obtain index blocks based on city information; and constructing a Poi index database from the index blocks. Because the Poi index database is built from city-based index blocks, Pois are stored according to the city to which they belong. A subsequent search can therefore be restricted to the index block in the Poi index database that corresponds to the city of the Poi being sought, which narrows the search range and improves search efficiency.

Description

Index base construction method, searching method and device
Technical field
The present invention relates to the field of search technology, and in particular to an index database construction method, a search method based on an index database, an index database construction device, a search device based on an index database, an apparatus, and a readable storage medium.
Background
With the rapid development of the mobile Internet, more and more users obtain services over the network, such as information services based on points of interest (Point of Interest, Poi). A Poi is a piece of data that people are interested in, for example a building such as a restaurant, a scenic spot, or a school, or road information such as the Beijing-Tibet Expressway.
At present, more and more users run Poi searches on a map from a mobile terminal. Existing Poi search schemes typically determine a Poi name from the query words entered by the user and then look up Pois matching that name at different places on the map, so that the Poi results found can be offered to the user for selection. These Poi results may lie in the city where the user is currently located, or in cities surrounding it. Existing Poi search schemes thus search mainly by Poi name; the search range is too large, which hurts search efficiency.
Summary of the invention
The technical problem to be solved by the embodiments of the invention is to provide an index database construction method and a search method based on the index database, so as to improve search efficiency.
Correspondingly, the embodiments of the invention also provide an index database construction device, a search device based on the index database, an apparatus, and a readable storage medium, to ensure the implementation and application of the above methods.
To solve the above problem, an embodiment of the invention discloses an index database construction method, comprising: scanning a data source of points of interest (Poi) to determine attribute information and city information of each Poi; creating an inverted index of the Pois according to the attribute information of each Poi; splitting the inverted index of the Pois according to the city information to obtain index blocks based on city information; and constructing a Poi index database from the index blocks.
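The four steps of the construction method can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the record keys `id`, `name`, and `city`, the function name `build_poi_index`, and the use of in-memory dictionaries and sets are all hypothetical, and only the name attribute is indexed.

```python
from collections import defaultdict


def build_poi_index(records):
    """Build a city-partitioned, single-character inverted index.

    `records` is an iterable of dicts with the hypothetical keys
    'id', 'name', and 'city'.
    """
    # Step 1: scan the data source for attribute and city information.
    scanned = [(r["id"], r["name"], r["city"]) for r in records]

    # Step 2: create a single-character inverted index over the names.
    inverted = defaultdict(set)
    city_of = {}
    for poi_id, name, city in scanned:
        city_of[poi_id] = city
        for ch in name:
            inverted[ch].add(poi_id)

    # Step 3: split the inverted index by city to obtain index blocks.
    blocks = defaultdict(lambda: defaultdict(set))
    for ch, poi_ids in inverted.items():
        for poi_id in poi_ids:
            blocks[city_of[poi_id]][ch].add(poi_id)

    # Step 4: the index database maps each city to its index block.
    return {city: dict(block) for city, block in blocks.items()}
```

A search for a character then touches only the block of one city, which is the source of the narrowed search range described above.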
Optionally, the attribute information is determined based on the query fields of a Poi, and the inverted index of the Pois comprises an inverted index based on single characters. Creating the inverted index of the Pois according to the attribute information of each Poi comprises: extracting address field information and/or name field information of each Poi from its attribute information; and counting the name characters contained in the name field information of each Poi and the address characters contained in the address field information, to determine the inverted index based on single characters.
Optionally, constructing the Poi index database from the index blocks comprises: constructing a correspondence between the inverted index of the Pois and the index blocks, the inverted index comprising at least one of a Poi name index and a Poi address index; and constructing the Poi index database based on that correspondence.
Optionally, counting the name characters contained in the name field information of each Poi to determine the inverted index based on single characters comprises: counting the name characters contained in the name field information to determine the frequency of each name character; determining an inverted list for each name character according to its frequency, the inverted list comprising a name character number, a character position, and a Poi popularity; and, for the name field information of each Poi, constructing the Poi name index from the inverted lists of the name characters.
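As an illustration of an inverted list that carries a character number, a character position, and a Poi popularity, consider the sketch below. The text does not say how the frequency determines the list, so numbering characters by descending frequency is an assumption made here; the `Posting` class, its `poi_id` field, and `build_name_index` are hypothetical names.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    char_no: int       # number assigned to the character
    poi_id: int        # Poi containing the character (hypothetical field)
    position: int      # offset of the character within the Poi name
    popularity: float  # popularity of that Poi


def build_name_index(pois):
    """`pois` is a list of (poi_id, name, popularity) tuples."""
    # Count how often each name character occurs across all names.
    freq = Counter(ch for _, name, _ in pois for ch in name)

    # Assumption: characters are numbered by descending frequency,
    # with ties broken by character order.
    ordered = sorted(freq, key=lambda ch: (-freq[ch], ch))
    char_no = {ch: i for i, ch in enumerate(ordered)}

    # One inverted list per character, one posting per occurrence.
    index = defaultdict(list)
    for poi_id, name, popularity in pois:
        for position, ch in enumerate(name):
            index[ch].append(Posting(char_no[ch], poi_id, position, popularity))
    return freq, dict(index)
```

The same structure would serve for the address field by passing address strings instead of names.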
Optionally, counting the address characters contained in the address field information of each Poi to determine the inverted index based on single characters comprises: counting the address characters contained in the address field information to determine the frequency of each address character; determining an inverted list for each address character according to its frequency, the inverted list comprising an address character number, a character position, and a Poi popularity; and, for the address field information of each Poi, constructing the Poi address index from the inverted lists of the address characters.
Optionally, splitting the inverted index of the Pois according to the city information to obtain index blocks based on city information comprises: splitting the inverted index of each Poi according to the city information to obtain the inverted index of the Pois corresponding to each piece of city information; and generating a corresponding index block from the inverted index of the Pois that share the same city information.
Optionally, before the inverted index based on single characters is determined, the method further comprises: obtaining historical search data, the historical search data comprising the user's input method records, web page click history, and map click history; and comprehensively analyzing the input method records, web page click history, and map click history in the historical search data to obtain the Poi popularity corresponding to each character.
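The text does not specify how the three kinds of historical data are combined. One simple possibility, shown purely as an assumption, is a weighted sum of per-character counts; the function name `character_popularity`, its parameters, and the default weights are hypothetical.

```python
def character_popularity(ime_counts, web_clicks, map_clicks,
                         weights=(1.0, 1.0, 1.0)):
    """Combine three per-character count tables into one popularity score.

    Each table maps a character to a count. The weighted sum is an
    assumed stand-in for the unspecified "comprehensive analysis".
    """
    w_ime, w_web, w_map = weights
    chars = set(ime_counts) | set(web_clicks) | set(map_clicks)
    return {
        ch: (w_ime * ime_counts.get(ch, 0)
             + w_web * web_clicks.get(ch, 0)
             + w_map * map_clicks.get(ch, 0))
        for ch in chars
    }
```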
An embodiment of the invention also discloses a search method based on an index database, comprising: receiving query information entered by a user, the query information containing Poi character information entered by the user during the search; identifying the Poi character information in the query information, and determining the city information to which the Poi belongs based on the user's current geographic location; querying the index database according to the city information to which the Poi belongs to obtain a target inverted index corresponding to the Poi character information, the index database comprising index blocks based on city information, each index block comprising an inverted index of Pois; and generating at least one Poi result according to the target inverted index and displaying the Poi result.
Optionally, identifying the Poi character information in the query information comprises: segmenting the query information into words to obtain at least one segmentation result; performing part-of-speech tagging on the segmentation result to obtain corresponding attribute information; and analyzing the segmentation result and the attribute information to determine the Poi character information, the Poi character information comprising name character information and/or address character information.
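A toy version of the segment, tag, and analyze sequence might look like the sketch below. It assumes a greedy longest-match segmenter over a small hand-written lexicon whose tags stand in for part-of-speech labels; a real system would use a proper segmenter and tagger. The function name, the lexicon format, and the romanized sample query are all hypothetical.

```python
def identify_poi_chars(query, lexicon):
    """Segment `query` by greedy longest match and tag each token.

    `lexicon` maps a word to a tag, here 'name' or 'address'.
    Characters not covered by the lexicon are tagged 'other'.
    """
    tokens, i = [], 0
    max_len = max(map(len, lexicon), default=1)
    while i < len(query):
        # Try the longest candidate first; fall back to one character.
        for size in range(min(max_len, len(query) - i), 0, -1):
            word = query[i:i + size]
            if word in lexicon or size == 1:
                tokens.append((word, lexicon.get(word, "other")))
                i += size
                break
    # Analysis step: collect name and address character information.
    name = "".join(w for w, tag in tokens if tag == "name")
    address = "".join(w for w, tag in tokens if tag == "address")
    return tokens, name, address
```

For example, with the lexicon `{"wudaokou": "address", "cafe": "name"}`, the query `"wudaokou cafe"` yields the address characters `wudaokou` and the name characters `cafe`.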
Optionally, querying the index database according to the city information to which the Poi belongs to obtain the target inverted index corresponding to the Poi character information comprises: looking up, in the index database, the target index block corresponding to the city information to which the Poi belongs; querying, in the target index block, the inverted lists of the characters contained in the Poi character information; and determining the target inverted index from the inverted lists found, the target inverted index comprising a name field index and/or an address field index.
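The block lookup followed by per-character list lookup can be sketched as below. The index database is modelled as a nested dictionary (city, then character, then a set of Poi identifiers), and intersecting the lists is an assumption about how the target result is determined; `query_block` is a hypothetical name.

```python
def query_block(index_db, city, query_chars):
    """Return the Poi ids in `city` whose entry contains every query character.

    `index_db` maps city -> character -> set of Poi ids.
    """
    # Look up the target index block for the city; other cities are skipped.
    block = index_db.get(city, {})

    # Fetch the inverted list of each query character and intersect them.
    # Intersection is an assumption; the text only says the target index
    # is determined from the lists found.
    result = None
    for ch in query_chars:
        ids = block.get(ch, set())
        result = set(ids) if result is None else result & ids
    return result or set()
```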
Optionally, generating at least one Poi result according to the target inverted index comprises: fusing the name field index and the address field index to obtain a fused Poi index; and generating at least one Poi result based on the inverted lists of the characters corresponding to the Poi index, each inverted list comprising a character number, a character position, and a Poi popularity.
Optionally, displaying the Poi result comprises: for each Poi result, obtaining the corresponding Poi popularity and character position from the inverted list of the character; sorting the Poi results according to the obtained Poi popularity, the character position, and the query information to determine the order of the Poi results; and displaying the Poi results in that order.
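Fusing the name field and address field results and then ordering them can be illustrated as follows. The fusion rule (union, keeping the earlier match position) and the sort key (higher popularity first, then earlier position) are assumptions, since the text names the ranking inputs but not the formula; `rank_results` and the hit format are hypothetical.

```python
def rank_results(name_hits, address_hits):
    """Fuse two hit tables and return Poi ids in display order.

    Each table maps poi_id -> (character position, Poi popularity).
    """
    # Fusion: take the union, preferring the earlier match position
    # when a Poi was found through both fields.
    fused = dict(address_hits)
    for poi_id, (position, popularity) in name_hits.items():
        if poi_id not in fused or position < fused[poi_id][0]:
            fused[poi_id] = (position, popularity)

    # Assumed ordering: popularity descending, then position ascending,
    # then Poi id to keep the order deterministic.
    return sorted(fused, key=lambda p: (-fused[p][1], fused[p][0], p))
```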
An embodiment of the invention also discloses an index database construction device, comprising: a data source scanning module, configured to scan a data source of points of interest (Poi) and determine attribute information and city information of each Poi; an index creation module, configured to create an inverted index of the Pois according to the attribute information of each Poi; an index splitting module, configured to split the inverted index of the Pois according to the city information to obtain index blocks based on city information; and an index database construction module, configured to construct a Poi index database from the index blocks.
Optionally, the attribute information is determined based on the query fields of a Poi, and the inverted index of the Pois comprises an inverted index based on single characters. The index creation module comprises: an attribute information extraction submodule, configured to extract address field information and/or name field information of each Poi from its attribute information; and an inverted index determination submodule, configured to count the name characters contained in the name field information of each Poi and the address characters contained in the address field information, to determine the inverted index based on single characters.
Optionally, the index database construction module comprises: a correspondence construction submodule, configured to construct a correspondence between the inverted index of the Pois and the index blocks, the inverted index comprising at least one of a Poi name index and a Poi address index; and an index database construction submodule, configured to construct the Poi index database based on that correspondence.
Optionally, the inverted index determination submodule comprises: a name character counting unit, configured to count the name characters contained in the name field information and determine the frequency of each name character; a first determination unit, configured to determine an inverted list for each name character according to its frequency, the inverted list comprising a name character number, a character position, and a Poi popularity; and a name index construction unit, configured to construct, for the name field information of each Poi, the Poi name index from the inverted lists of the name characters.
Optionally, the inverted index determination submodule comprises: an address character counting unit, configured to count the address characters contained in the address field information and determine the frequency of each address character; a second determination unit, configured to determine an inverted list for each address character according to its frequency, the inverted list comprising an address character number, a character position, and a Poi popularity; and an address index construction unit, configured to construct, for the address field information of each Poi, the Poi address index from the inverted lists of the address characters.
Optionally, the index splitting module comprises: an inverted index splitting submodule, configured to split the inverted index of each Poi according to the city information to obtain the inverted index of the Pois corresponding to each piece of city information; and an index block generation submodule, configured to generate a corresponding index block from the inverted index of the Pois that share the same city information.
Optionally, the device further comprises: a historical data acquisition module, configured to obtain historical search data, the historical search data comprising the user's input method records, web page click history, and map click history; and a comprehensive analysis module, configured to comprehensively analyze the input method records, web page click history, and map click history in the historical search data to obtain the Poi popularity corresponding to each character.
An embodiment of the invention also discloses a search device based on an index database, comprising: an information receiving module, configured to receive query information entered by a user, the query information containing Poi character information entered by the user during the search; a Poi identification module, configured to identify the Poi character information in the query information and to determine the city information to which the Poi belongs based on the user's current geographic location; an index query module, configured to query the index database according to the city information to which the Poi belongs to obtain a target inverted index corresponding to the Poi character information, the index database comprising index blocks based on city information, each index block comprising an inverted index of Pois; and a result display module, configured to generate at least one Poi result according to the target inverted index and to display the Poi result.
Optionally, the Poi identification module comprises: a word segmentation submodule, configured to segment the query information into words to obtain at least one segmentation result; a tagging submodule, configured to perform part-of-speech tagging on the segmentation result to obtain corresponding attribute information; and an analysis submodule, configured to analyze the segmentation result and the attribute information to determine the Poi character information, the Poi character information comprising name character information and/or address character information.
Optionally, the index query module comprises: a block lookup submodule, configured to look up, in the index database, the target index block corresponding to the city information to which the Poi belongs; a list lookup submodule, configured to query, in the target index block, the inverted lists of the characters contained in the Poi character information; and an index determination submodule, configured to determine the target inverted index from the inverted lists found, the target inverted index comprising a name field index and/or an address field index.
Optionally, the result display module comprises: an index fusion submodule, configured to fuse the name field index and the address field index to obtain a fused Poi index; and a result generation submodule, configured to generate at least one Poi result based on the inverted lists of the characters corresponding to the Poi index, each inverted list comprising a character number, a character position, and a Poi popularity.
Optionally, the result display module comprises: an acquisition submodule, configured to obtain, for each Poi result, the corresponding Poi popularity and character position from the inverted list of the character; a sorting submodule, configured to sort the Poi results according to the obtained Poi popularity, the character position, and the query information to determine the order of the Poi results; and a display submodule, configured to display the Poi results in that order.
An embodiment of the invention also discloses an apparatus comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs containing instructions for: scanning a data source of points of interest (Poi) to determine attribute information and city information of each Poi; creating an inverted index of the Pois according to the attribute information of each Poi; splitting the inverted index of the Pois according to the city information to obtain index blocks based on city information; and constructing a Poi index database from the index blocks.
Optionally, the attribute information is determined based on the query fields of a Poi, and the inverted index of the Pois comprises an inverted index based on single characters. Creating the inverted index of the Pois according to the attribute information of each Poi comprises: extracting address field information and/or name field information of each Poi from its attribute information; and counting the name characters contained in the name field information of each Poi and the address characters contained in the address field information, to determine the inverted index based on single characters.
Optionally, constructing the Poi index database from the index blocks comprises: constructing a correspondence between the inverted index of the Pois and the index blocks, the inverted index comprising at least one of a Poi name index and a Poi address index; and constructing the Poi index database based on that correspondence.
Optionally, counting the name characters contained in the name field information of each Poi to determine the inverted index based on single characters comprises: counting the name characters contained in the name field information to determine the frequency of each name character; determining an inverted list for each name character according to its frequency, the inverted list comprising a name character number, a character position, and a Poi popularity; and, for the name field information of each Poi, constructing the Poi name index from the inverted lists of the name characters.
Optionally, counting the address characters contained in the address field information of each Poi to determine the inverted index based on single characters comprises: counting the address characters contained in the address field information to determine the frequency of each address character; determining an inverted list for each address character according to its frequency, the inverted list comprising an address character number, a character position, and a Poi popularity; and, for the address field information of each Poi, constructing the Poi address index from the inverted lists of the address characters.
Optionally, splitting the inverted index of the Pois according to the city information to obtain index blocks based on city information comprises: splitting the inverted index of each Poi according to the city information to obtain the inverted index of the Pois corresponding to each piece of city information; and generating a corresponding index block from the inverted index of the Pois that share the same city information.
Optionally, the one or more programs executed by the one or more processors further contain instructions for performing the following operations before the inverted index based on single characters is determined: obtaining historical search data, the historical search data comprising the user's input method records, web page click history, and map click history; and comprehensively analyzing the input method records, web page click history, and map click history in the historical search data to obtain the Poi popularity corresponding to each character.
An embodiment of the invention also discloses an apparatus comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs containing instructions for: receiving query information entered by a user, the query information containing Poi character information entered by the user during the search; identifying the Poi character information in the query information, and determining the city information to which the Poi belongs based on the user's current geographic location; querying the index database according to the city information to which the Poi belongs to obtain a target inverted index corresponding to the Poi character information, the index database comprising index blocks based on city information, each index block comprising an inverted index of Pois; and generating at least one Poi result according to the target inverted index and displaying the Poi result.
Optionally, identifying the Poi character information in the query information comprises: segmenting the query information into words to obtain at least one segmentation result; performing part-of-speech tagging on the segmentation result to obtain corresponding attribute information; and analyzing the segmentation result and the attribute information to determine the Poi character information, the Poi character information comprising name character information and/or address character information.
Optionally, querying the index database according to the city information to which the Poi belongs to obtain the target inverted index corresponding to the Poi character information comprises: looking up, in the index database, the target index block corresponding to the city information to which the Poi belongs; querying, in the target index block, the inverted lists of the characters contained in the Poi character information; and determining the target inverted index from the inverted lists found, the target inverted index comprising a name field index and/or an address field index.
Optionally, generating at least one Poi result according to the target inverted index comprises: fusing the name field index and the address field index to obtain a fused Poi index; and generating at least one Poi result based on the inverted lists of the characters corresponding to the Poi index, each inverted list comprising a character number, a character position, and a Poi popularity.
Optionally, displaying the Poi result comprises: for each Poi result, obtaining the corresponding Poi popularity and character position from the inverted list of the character; sorting the Poi results according to the obtained Poi popularity, the character position, and the query information to determine the order of the Poi results; and displaying the Poi results in that order.
An embodiment of the invention also discloses a readable storage medium; when the instructions in the storage medium are executed by a processor of an apparatus, the apparatus is enabled to perform the index database construction method described in one or more of the embodiments of the invention.
An embodiment of the invention also discloses a readable storage medium; when the instructions in the storage medium are executed by a processor of an apparatus, the apparatus is enabled to perform the search method based on an index database described in one or more of the embodiments of the invention.
The embodiments of the invention include the following advantages:
After creating the inverted index of the Pois, the embodiments of the invention split it according to the city information of the Pois to obtain index blocks based on city information. In other words, the created inverted index is split into city-based index blocks according to the city to which each Poi belongs, and the Poi index database is built from those blocks, so that Pois are stored by the city they belong to. A subsequent search can then be run only in the index block of the Poi index database that corresponds to the city of the Poi, rather than over the entire Poi index database, which narrows the search range and improves search efficiency.
In addition, when creating the Poi inverted index, the embodiments of the invention can create a Poi inverted index for each query field, for example a Poi name index for the name field information and a Poi address index for the address field information. A Poi search can then be run against the different query fields to find the Poi inverted index corresponding to each, and the inverted indexes found can be fused and merged, for example by merging the Poi address index and the Poi name index. This improves the recall rate of the search in addition to the search range, further improving search efficiency.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of an embodiment of an index database construction method of the invention;
Fig. 2 is a flow chart of the steps of an alternative embodiment of an index database construction method of the invention;
Fig. 3 is a schematic diagram of an inverted list based on a single name character in an example of the invention;
Fig. 4 is a schematic diagram of creating a Poi address index and a Poi name index separately in an example of the invention;
Fig. 5 is a schematic diagram of creating a pinyin index in an example of the invention;
Fig. 6 is a schematic diagram of splitting the inverted index in an example of the invention;
Fig. 7 is a flow chart of the steps of an embodiment of a search method based on an index database of the invention;
Fig. 8 is a flow chart of the steps of an alternative embodiment of a search method based on an index database of the invention;
Fig. 9 is a schematic diagram of the connections between a user query intent understanding system, a query display system, and an index service system in an application example of the invention;
Fig. 10 is a structural block diagram of an embodiment of an index database construction device of the invention;
Fig. 11 is a structural block diagram of an embodiment of a search device based on an index database of the invention;
Fig. 12 is a structural block diagram of an apparatus according to an exemplary embodiment;
Fig. 13 is a schematic structural diagram of a server in an embodiment of the invention.
Detailed description of the embodiments
To make the above objects, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
When a user searches for a Poi, the user usually has a fairly clear regional requirement. One of the core concepts of the embodiment of the present invention is: after the inverted index of the Poi is created, the inverted index is partitioned according to the city to which each Poi belongs, yielding city-based index blocks; during a subsequent search, the search is performed in the index block of the city to which the desired Poi belongs, which narrows the search range and thereby improves search efficiency.
Referring to Fig. 1, a flow chart of the steps of an embodiment of an index database construction method of the present invention is shown, which may specifically include the following steps:
Step 102, scan the data sources of points of interest (Poi), and determine the attribute information and city information of each Poi.
In a specific implementation, the data sources of the Poi may include Poi name data, Poi address data, Poi popularity data, Poi click data, Poi affiliation relationships, an error-correction dictionary, a synonym dictionary, user click history data, and so on. The embodiment of the present invention can scan the various data sources of the Poi to obtain the global information of the Poi. The global information of a Poi may include its attribute information and city information. The city information indicates the city to which the Poi belongs, and cities may be distinguished according to administrative divisions, which is not limited by the embodiment of the present invention. The attribute information of a Poi indicates the attributes of the point of interest and may include, for example, name field information, address field information, type field information, and synonym field information.
Name field information may include information related to the Poi name, such as the Poi name itself, the number of characters in the Poi name, and the position of each character. Address field information may include information related to the Poi address, such as the longitude and latitude of the Poi and the detailed address of the Poi; for example, when the Poi is "Floor 15, Sou X Mansion, Courtyard 1, Zhongguancun East Road", the address field information of the Poi may be "Courtyard 1, Zhongguancun East Road". Type field information may include information related to the category of the Poi, such as the service information of that category, for example a service industry code or a service industry name. Synonym field information may include aliases, abbreviations and former names of the Poi name, as well as aliases of words. For example, the point of interest "Tsinghua University" is abbreviated as "Tsinghua", so "Tsinghua" can serve as synonym field information of the point of interest "Tsinghua University". As another example, "National Development and Reform Commission" and "Development and Reform Commission" express the same meaning, so either Poi name can serve as the synonym field information of the other: "National Development and Reform Commission" can be synonym field information of "Development and Reform Commission", or "Development and Reform Commission" can be synonym field information of "National Development and Reform Commission". As a further example, when the alias of the point of interest "Chinese-style restaurant" is "Chinese restaurant", "Chinese restaurant" can serve as synonym field information of the point of interest "Chinese-style restaurant". Likewise, "Xiangfan City", the former name of the point of interest "Xiangyang City", can serve as synonym field information of "Xiangyang City", and so on.
Of course, the attribute information of a Poi may also include other information, such as surrounding landmark information and auxiliary information, which is not limited by the embodiment of the present invention. Surrounding landmark information characterizes the surroundings of the geographic location of the Poi, for example the cities neighboring the city to which the point of interest belongs. Auxiliary information can be used to determine the Poi popularity, which may be a comprehensive popularity weight obtained by jointly analyzing the user's input method records, web page click history, and map click history.
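The attribute fields described above can be pictured as a single record per Poi. The following is a minimal sketch only; the class name `PoiRecord`, its field names, and the ID "P1" are hypothetical and are not taken from the embodiment, and the Chinese strings are assumed renderings of "Tsinghua University" and its abbreviation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PoiRecord:
    """One scanned Poi with its city information and attribute fields."""
    poi_id: str                      # hypothetical identifier
    city_code: str                   # city information, e.g. "01"
    name: str                        # name field information
    address: str = ""                # address field information
    category: str = ""               # type field information
    synonyms: List[str] = field(default_factory=list)  # synonym field information
    popularity: int = 0              # Poi popularity

    def searchable_names(self) -> List[str]:
        """The name plus every alias, abbreviation, or former name."""
        return [self.name, *self.synonyms]


tsinghua = PoiRecord(poi_id="P1", city_code="01", name="清华大学", synonyms=["清华"])
```

Keeping synonyms next to the name means an index builder can treat each of them as an additional name to index for the same Poi.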
Step 104, create the inverted index of the Poi according to the attribute information of each Poi.
In the embodiment of the present invention, the inverted index of the Poi can be created from the attribute information of the Poi, so that subsequent search processing can use the inverted index to find the Poi corresponding to the information input by the user. For example, a Poi address index can be created for the address field information of the Poi, so that subsequent search processing can use the Poi address index to find the Poi corresponding to the address characters input by the user. As another example, a Poi name index can be created for the name field information of the Poi, so that subsequent search processing can use the Poi name index to find the Poi corresponding to the name characters input by the user.
Step 106, split the inverted index of the Poi according to the city information to obtain index blocks based on city information.
In the embodiment of the present invention, the city to which a Poi belongs can be determined from the city information of the Poi. While the inverted index is being created, the created inverted index of the Poi can be split, according to the cities to which the Pois belong, into an index block for each city, which yields the index blocks based on city information.
Step 108, build the Poi index database from the index blocks.
In the embodiment of the present invention, the index blocks corresponding to different cities can be stored in a database, thereby constituting the Poi index database. Specifically, the inverted indexes of the Pois can be stored according to the index blocks of their respective cities. The Poi index database may include one or more index blocks and the correspondence between index blocks and city information.
In summary, after creating the inverted index of the Poi, the embodiment of the present invention can split the inverted index according to the city information of each Poi to obtain index blocks based on city information. That is, according to the cities to which the Pois belong, the created inverted index is split into city-based index blocks, and a Poi index database can be built from these blocks, so that Pois are stored by city. During a subsequent search, the search can be performed in the index block of the Poi index database that corresponds to the city of the Poi, rather than in the entire Poi index database, which narrows the search range and thereby improves search efficiency.
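Steps 102 to 108 can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the embodiment's implementation: the function names are hypothetical, each posting is reduced to a bare Poi ID, and the Chinese names are assumed renderings of two illustrative Pois.

```python
from collections import defaultdict

# Step 102: scanned Poi records as (Poi ID, Poi name, city code).
POIS = [
    ("D10001481", "搜狐网络大厦", "01"),      # assumed Chinese name, city 01
    ("D10001493", "天津滨海国际机场", "02"),  # assumed Chinese name, city 02
]


def build_inverted_index(pois):
    """Step 104: map each name character to the Poi IDs containing it."""
    index = defaultdict(list)
    for poi_id, name, _city in pois:
        for ch in dict.fromkeys(name):  # unique characters, original order
            index[ch].append(poi_id)
    return index


def split_by_city(index, pois):
    """Step 106: split the inverted index into one index block per city."""
    city_of = {poi_id: city for poi_id, _name, city in pois}
    blocks = defaultdict(lambda: defaultdict(list))
    for ch, poi_ids in index.items():
        for poi_id in poi_ids:
            blocks[city_of[poi_id]][ch].append(poi_id)
    return blocks


def build_index_database(pois):
    """Step 108: the database keeps the blocks keyed by city code."""
    blocks = split_by_city(build_inverted_index(pois), pois)
    return {city: dict(block) for city, block in blocks.items()}


db = build_index_database(POIS)
```

A search restricted to city "02" then only touches `db["02"]`, which is the reduction in search range that the summary describes.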
Referring to Fig. 2, a flow chart of the steps of an alternative embodiment of an index database construction method of the present invention is shown, which may specifically include the following steps:
Step 202, scan the data sources of points of interest (Poi), and determine the attribute information and city information of each Poi.
As an example of the present invention, a document can be constructed by scanning the data sources of the Poi. The document may include the global information of the one or more Pois obtained by scanning. The global information of a point of interest may include the city to which the Poi belongs, its category, the Poi popularity, the number of characters in the Poi, the positions of those characters, and so on.
For example, after the five Pois "Sohu Network Mansion", "Sohu Media Mansion", "Sogou Technology Company", "Tianjin Binhai International Airport" and "Tianjin University" are obtained by scanning, a document can be constructed from the global information of these five Pois, as shown in Table 1 below:
Poi number    Poi name                               City of Poi    Poi popularity
D10001481     Sohu Network Mansion                   01             10000
D10001482     Sohu Media Mansion                     01             9000
D10001483     Sogou Technology Company               01             9600
D10001493     Tianjin Binhai International Airport   02             600
D10001494     Tianjin University                     02             900
Table 1
The Poi number may be the unique identifier (ID) of the Poi and is used to identify the Poi; for example, in Table 1 "Sohu Media Mansion" is identified by the Poi number D10001482. The city to which a Poi belongs may be represented by a city number. For example, in Table 1 the city number 01, corresponding to Beijing, indicates that the three Pois "Sohu Network Mansion", "Sohu Media Mansion" and "Sogou Technology Company" belong to Beijing, and the city number 02, corresponding to Tianjin, indicates that the two Pois "Tianjin Binhai International Airport" and "Tianjin University" belong to Tianjin.
Step 204, create the inverted index of the Poi according to the attribute information of each Poi.
In a specific implementation, the inverted index of the Poi can be created from the characters contained in the Poi. The inverted index of the Poi may include an inverted index based on single characters, an inverted index based on words, an inverted index based on phrases, and so on, which is not limited by the embodiment of the present invention. The inverted index based on single characters is an index created for each individual character of the Poi; the inverted index based on words is an index created for the words contained in the Poi; the inverted index based on phrases is an index created for the phrases contained in the Poi.
In the embodiment of the present invention, corresponding indexes can be created for the various kinds of attribute information of the Poi. The attribute information may be determined according to the query fields of the Poi; for example, the Poi address field information is determined from the address field, and the Poi name field information is determined from the name field. In a specific implementation, specific attribute information can be used to designate the attribute information for which an index is to be created, so that the Poi inverted index is created for that specific attribute information. The specific attribute information may include at least one of the following: address field information and name field information.
In an alternative embodiment of the present invention, the attribute information is determined based on the query fields of the Poi. The query fields of the Poi may include a name field and/or an address field. Creating the inverted index of the Poi according to the attribute information of each Poi may specifically include: extracting the address field information and/or name field information of each Poi from its attribute information; and counting the name characters contained in the name field information and the address characters contained in the address field information of each Poi to determine the inverted index based on single characters.
For example, continuing the above example, the inverted index of each Poi can be constructed by looping over the Pois contained in the constructed document. Specifically, a character dictionary may contain all the characters for which an index needs to be created. During construction of the inverted index, the characters contained in the specific attribute information of each Poi in the document can be split according to the characters in the character dictionary to generate a character set; the characters in the character set can then be counted to determine the frequency of each character, and inverted lists based on single characters can be created from those frequencies. An inverted list may include information such as the character number, the frequency, the character position, and the Poi popularity, and can be represented as <character ID, frequency, character position, Poi popularity>. The character ID is the number of a single character. An offset relationship may exist between the character ID and the Poi ID, so based on this offset relationship the inverted list can also be represented as <Poi ID, frequency, character position, Poi popularity>, which is not limited by the embodiment of the present invention. The frequency indicates how often, or how many times, the single character appears in Pois; the character position indicates the position at which the single character appears in the Poi; the Poi popularity indicates the popularity of the Poi in terms of user demand.
In one embodiment of the present invention, the offset relationship between the Poi ID and the character IDs of the characters contained in the Poi can be expressed by the following formula:
δ=(A1+A2+A3+A4+ ...+An)+X
Specifically, δ denotes the Poi ID, and (A1+A2+A3+A4+ ...+An) denotes the sum of the character IDs of all the characters contained in the Poi, where A1 is the character ID of the first character in the Poi, A2 is the character ID of the second character, A3 is the character ID of the third character, and so on, An is the character ID of the n-th character, and n is an integer. X denotes the offset value, which can be set according to the characters contained in the Poi; the embodiment of the present invention places no restriction on this.
For example, if the character IDs of the individual characters in "Sohu Network Mansion" are β1, β2, β5, β6, β7 and β21 and the offset value X is X1, then computing (β1+β2+β5+β6+β7+β21)+X1 gives δ = D10001481, so the Poi ID of "Sohu Network Mansion" is determined to be D10001481. As another example, if the character IDs of the individual characters in "Sohu Media Mansion" are β1, β2, β3, β4, β7 and β21 and the offset value X is X2, then computing (β1+β2+β3+β4+β7+β21)+X2 gives δ = D10001482, so the Poi ID of "Sohu Media Mansion" is determined to be D10001482, and so on.
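The offset relationship can be checked with a small numeric sketch. The source does not give concrete values for the character IDs or offsets, so the numbers below are purely hypothetical: each βk is given the value k, and the Poi ID is taken as the numeric part of D10001481.

```python
def poi_id_from_chars(char_ids, offset):
    """delta = (A1 + A2 + ... + An) + X."""
    return sum(char_ids) + offset


def offset_for(poi_id, char_ids):
    """Solve the same relation for the offset X given a known Poi ID."""
    return poi_id - sum(char_ids)


# Hypothetical numeric stand-ins for β1, β2, β5, β6, β7, β21.
char_ids = [1, 2, 5, 6, 7, 21]
numeric_poi_id = 10001481  # numeric part of D10001481

x1 = offset_for(numeric_poi_id, char_ids)
recovered = poi_id_from_chars(char_ids, x1)
```

Because the offset is chosen per Poi, two Pois whose character IDs happen to sum to the same value can still receive different Poi IDs.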
When creating the inverted index, the embodiment of the present invention can also record the Poi popularity and user click history data corresponding to each Poi and incorporate them into the inverted index of the Poi. When Pois are recommended to a user based on the query word input by the user, the recommended Pois can then be ranked by Poi popularity and user click history data, which yields accurate Poi ranking results and improves the recall rate.
Optionally, the Poi popularity can be aggregated from factors such as the user click frequency, the search frequency, and the Poi input frequency. The user click frequency indicates the number of times users click a Poi, for example the number of times users click a certain Poi on a map, and can be obtained from the user click history logs. The search frequency indicates the number of times users search for a Poi, for example the number of times users search for a certain Poi on a map, and can be obtained from the user search history logs. The Poi input frequency indicates the number of times users input a Poi, for example the number of times users input a certain Poi in an input method. The user click history data may be data obtained from the click history logs of the Poi intelligent prompt.
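The source says the popularity is aggregated from these three frequencies but does not state how. One possible reading, shown here only as an assumption, is a weighted sum; the function name and the weights (5, 3, 2) are hypothetical.

```python
def poi_popularity(click_count, search_count, input_count, weights=(5, 3, 2)):
    """Combine the three frequencies into one popularity score.

    The weighted sum and the default weights are illustrative assumptions,
    not values given by the embodiment.
    """
    w_click, w_search, w_input = weights
    return w_click * click_count + w_search * search_count + w_input * input_count


score = poi_popularity(click_count=100, search_count=50, input_count=10)
```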
It should be noted that the Poi intelligent prompt refers to the intelligent prompting of search keywords. Specifically, when the user has input as few search terms as possible, the system provides the Poi result, navigation, or route query that the user most likely wants, thereby improving the user's search experience.
After the inverted lists of single characters are created, the inverted index of the Poi can be constructed from them. Optionally, the inverted lists based on single characters can be sorted according to the Poi popularity they contain to obtain an ordering, and the inverted lists can then be stored in that order, which makes it convenient to look up the Poi inverted index by popularity later and improves search efficiency. Of course, the inverted lists based on single characters can also be sorted in other ways, for example by the Poi ID and character position in the inverted lists, which is not limited by the embodiment of the present invention.
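Both orderings mentioned above can be sketched with plain tuples of the form (Poi ID, frequency, character position, Poi popularity). Sorting popularity in descending order is an assumption; the source only says the lists are sorted according to popularity.

```python
# Entries of one character's inverted list:
# (Poi ID, frequency, character position, Poi popularity)
postings = [
    ("D10001481", 3, 1, 10000),
    ("D10001482", 3, 1, 9000),
    ("D10001483", 3, 1, 9600),
]

# Most popular Poi first (descending order is an assumption).
by_popularity = sorted(postings, key=lambda p: p[3], reverse=True)

# Alternative ordering by Poi ID, then by character position.
by_id_and_position = sorted(postings, key=lambda p: (p[0], p[2]))
```

Storing the list in popularity order lets a search stop after the first few entries when only the top results are needed.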
In an alternative embodiment of the present invention, before the inverted index based on single characters is determined, the method may further include: obtaining historical search data, which includes the user's input method record data, web page click history data, and map click history data; and jointly analyzing the input method record data, web page click history data, and map click history data in the historical search data to obtain the Poi popularity corresponding to each character. The input method record data can be used to determine the number of times users input each Poi; the web page click history data can be used to determine the number of times users search for and/or click each Poi on web pages; the map click history data can be used to determine the number of times users search for and/or click each Poi in a map application.
As an example of the present invention, inverted lists based on single name characters can be created from the name characters contained in the Poi names in Table 1 above, as shown in Fig. 3. Take the name character "Sou" (the first character of "Sohu") as an example. Its frequency is 3, which means that "Sou" appears in 3 Poi names, and its inverted list includes <D10001481,3,1,10000>, <D10001482,3,1,9000> and <D10001483,3,1,9600>. Here <D10001481,3,1,10000> indicates that "Sou" appears in the Poi name "Sohu Network Mansion", that "Sou" is the first name character of that Poi name, and that the corresponding Poi popularity is 10000.
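The example above can be reproduced with a short sketch. Two assumptions are made here: the Chinese names are back-translations of the Poi names in Table 1, and the frequency is read as the number of Poi names that contain the character, which is consistent with the value 3 for "Sou". The function name is hypothetical.

```python
from collections import defaultdict

# (Poi ID, Poi name, Poi popularity); IDs and popularity follow Table 1,
# the Chinese names are assumed renderings of the translated names.
DOCUMENT = [
    ("D10001481", "搜狐网络大厦", 10000),
    ("D10001482", "搜狐媒体大厦", 9000),
    ("D10001483", "搜狗科技公司", 9600),
    ("D10001493", "天津滨海国际机场", 600),
    ("D10001494", "天津大学", 900),
]


def build_inverted_lists(document):
    """Return {character: [(Poi ID, frequency, position, popularity), ...]}."""
    occurrences = defaultdict(list)  # character -> (Poi ID, position, popularity)
    containing = defaultdict(set)    # character -> Poi IDs that contain it
    for poi_id, name, popularity in document:
        for position, ch in enumerate(name, start=1):  # positions are 1-based
            occurrences[ch].append((poi_id, position, popularity))
            containing[ch].add(poi_id)
    return {
        ch: [(pid, len(containing[ch]), pos, pop) for pid, pos, pop in entries]
        for ch, entries in occurrences.items()
    }


inverted = build_inverted_lists(DOCUMENT)
```

Looking up `inverted["搜"]` then yields the three entries listed for "Sou" in the example.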
In an alternative embodiment of the present invention, counting the name characters contained in the name field information of each Poi to determine the inverted index based on single characters includes: counting the name characters contained in the name field information to determine the frequency of each name character; determining the inverted list of each name character according to its frequency, the inverted list including the name character number, the character position, and the Poi popularity; and, for the name field information of each Poi, constructing the Poi name index from the inverted lists of the name characters.
Take building the Poi name index of "Sohu" as an example. From the inverted lists based on single characters shown in Fig. 3, it can be determined that the inverted lists corresponding to "Sohu" include <D10001481,3,1,10000>, <D10001482,3,1,9000>, <D10001481,2,2,10000> and <D10001482,2,2,9000>. Then, based on the Poi ID and character position in these inverted lists, <D10001481,3,1,10000> and <D10001481,2,2,10000> can form one Poi name index of "Sohu", and <D10001482,3,1,9000> and <D10001482,2,2,9000> can form another Poi name index of "Sohu".
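The grouping by Poi ID and character position can be sketched as follows. The function name is hypothetical, the Chinese characters are assumed renderings of "Sou" and "Hu", and the sketch assumes that each character occurs at most once per Poi name.

```python
# Inverted lists for the two characters of "Sohu":
# (Poi ID, frequency, character position, Poi popularity)
INVERTED = {
    "搜": [("D10001481", 3, 1, 10000), ("D10001482", 3, 1, 9000),
           ("D10001483", 3, 1, 9600)],
    "狐": [("D10001481", 2, 2, 10000), ("D10001482", 2, 2, 9000)],
}


def name_index_for(term, inverted):
    """Group the entries of each character of `term` by Poi ID and keep only
    Pois in which the characters occur at consecutive positions."""
    by_poi = {}
    for ch in term:
        for entry in inverted.get(ch, []):
            by_poi.setdefault(entry[0], []).append(entry)
    result = {}
    for poi_id, entries in by_poi.items():
        positions = [e[2] for e in entries]
        consecutive = all(b - a == 1 for a, b in zip(positions, positions[1:]))
        if len(entries) == len(term) and consecutive:
            result[poi_id] = entries
    return result


sohu_index = name_index_for("搜狐", INVERTED)
```

D10001483 is dropped because its name contains "Sou" but not "Hu", which leaves the two name indexes described above.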
In a specific implementation, a Poi can have name field information and also address field information. When the Poi inverted index is created, a Poi name index can be created for the name field information, and a Poi address index can also be created for the address field information, as shown in Fig. 4. Optionally, counting the address characters contained in the address field information of each Poi to determine the inverted index based on single characters may include: counting the address characters contained in the address field information to determine the frequency of each address character; determining the inverted list of each address character according to its frequency, the inverted list including the address character number, the character position, and the Poi popularity; and, for the address field information of each Poi, constructing the Poi address index from the inverted lists of the address characters.
Since the Poi name and the Poi address belong to different query fields, the Poi name index and the Poi address index can be created separately when the inverted index is created. Take the point of interest "Sohu Network Mansion" as an example. If the address of "Sohu Network Mansion" is "No. 9, Courtyard 1, Zhongguancun East Road", this Poi address can be used as the address field information, and the Poi address index corresponding to "Sohu Network Mansion" can be constructed from the characters contained in "No. 9, Courtyard 1, Zhongguancun East Road". Likewise, the Poi name "Sohu Network Mansion" can be used as the name field information, and the Poi name index can be constructed from the characters contained in "Sohu Network Mansion".
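Creating the two indexes separately can be sketched with one builder that is run once per query field. The function name and the dictionary keys are hypothetical, and the Chinese name and address strings are assumed renderings of the translated example; each posting is reduced to (Poi ID, character position).

```python
from collections import defaultdict


def build_field_index(pois, field_name):
    """Build a per-character inverted index over a single query field."""
    index = defaultdict(list)
    for poi in pois:
        for position, ch in enumerate(poi[field_name], start=1):
            index[ch].append((poi["id"], position))
    return dict(index)


pois = [{
    "id": "D10001481",
    "name": "搜狐网络大厦",          # assumed rendering of the Poi name
    "address": "中关村东路1号院9号",  # assumed rendering of the Poi address
}]

name_index = build_field_index(pois, "name")
address_index = build_field_index(pois, "address")
```

Keeping the fields apart means a name character is never matched against an address, and the two result sets can be merged later when a query mixes both fields.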
During ordinary input, users often enter complete or incomplete pinyin strings, such as "beijing". To handle this, while creating the inverted index the embodiment of the present invention can convert the Chinese characters contained in a Poi into the corresponding pinyin string and create a pinyin index for that string. For example, as shown in Fig. 5, the Chinese characters in the Poi name "Sohu Network Mansion" are converted into the pinyin string "sou-hu-wang-luo-da-sha"; the converted pinyin string can then be split into single pinyin units, that is, into the pinyin of each individual Chinese character: "sou", "hu", "wang", "luo", "da" and "sha". Then, following the principle of a prefix tree, a pinyin prefix index tree can be created for the single pinyin units, and the pinyin index corresponding to each individual Chinese character can be generated from the prefix index tree. During a search, the pinyin index can be used to recommend the corresponding Poi results to the user for the pinyin string that was input, which improves search efficiency and the recall rate.
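A prefix tree over the per-character pinyin can be sketched as below. The class names are hypothetical, and the small hand-written `PINYIN` table stands in for a full pronunciation lexicon, which the source does not describe; the Chinese name is an assumed rendering of "Sohu Network Mansion".

```python
# Hypothetical hand-written table; a real system would use a full lexicon.
PINYIN = {"搜": "sou", "狐": "hu", "网": "wang", "络": "luo", "大": "da", "厦": "sha"}


class TrieNode:
    def __init__(self):
        self.children = {}    # next letter -> TrieNode
        self.poi_ids = set()  # Pois reachable through this prefix


class PinyinIndex:
    def __init__(self):
        self.root = TrieNode()

    def add(self, poi_id, name):
        """Insert the pinyin of every character of `name` into the tree."""
        for ch in name:
            node = self.root
            for letter in PINYIN[ch]:
                node = node.children.setdefault(letter, TrieNode())
                node.poi_ids.add(poi_id)  # every prefix points at the Poi

    def lookup(self, prefix):
        """Return the Pois that have a character whose pinyin starts with `prefix`."""
        node = self.root
        for letter in prefix:
            node = node.children.get(letter)
            if node is None:
                return set()
        return node.poi_ids


pinyin_index = PinyinIndex()
pinyin_index.add("D10001481", "搜狐网络大厦")
```

Because every prefix node records the Poi, an incomplete syllable such as "so" still finds the Poi, which matches the case of incomplete pinyin input.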
Step 206, split the inverted index of each Poi according to the city information to obtain the Poi inverted indexes corresponding to each piece of city information.
Step 208, generate the corresponding index block from the inverted indexes of the Pois that correspond to the same city information.
As an example of the present invention, after the inverted indexes of all the Pois obtained by scanning are created, they can be stored in one inverted file. According to the cities to which the Pois belong, the inverted index of each Poi in the inverted file can then be split into the index blocks of different cities, where a city may be a prefecture-level city as defined by administrative divisions. For example, as shown in Fig. 6, the inverted index in the inverted file is divided according to the city of each Poi, so that the inverted lists corresponding to Pois in Beijing are placed in the Beijing index block, the inverted lists corresponding to Pois in Shenzhen are placed in the Shenzhen index block, and the inverted lists corresponding to Pois in other cities are placed in the index blocks of those cities.
Step 210, construct the correspondence between the inverted indexes of the Poi and the index blocks, the inverted indexes including at least one of the following: a Poi name index and a Poi address index.
Specifically, after the inverted indexes of the Pois are divided into index blocks, the correspondence between the inverted index of each Poi in an index block and that index block can be constructed. For example, since the city to which the point of interest "Sohu Network Mansion" belongs is Beijing, after the Poi name index and Poi address index of "Sohu Network Mansion" are placed in the Beijing index block, the correspondence between the Poi address index of "Sohu Network Mansion" and the Beijing index block can be constructed, and the correspondence between the Poi name index of "Sohu Network Mansion" and the Beijing index block can be constructed.
Step 212, build the Poi index database based on the correspondence between the inverted indexes of the Poi and the index blocks.
After the correspondence between the inverted indexes of the Poi and the index blocks is constructed, the inverted indexes of the Poi and the city information of the Poi can be saved on the basis of that correspondence to constitute the Poi index database. For example, based on the correspondence between the inverted indexes of the Poi and the index blocks, the inverted lists corresponding to the characters contained in the Poi and the city information of the city to which the Poi belongs can be stored to generate the Poi index database. During a Poi search, the index block of the city to which the Poi belongs can then be located, which narrows the search range and improves search efficiency.
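The correspondence between indexes and index blocks can be held in a small container keyed by city code. This is a sketch under assumptions: the class `PoiIndexDatabase`, its method names, and the shape of the stored indexes are all hypothetical.

```python
class PoiIndexDatabase:
    """Stores, per city code, the name index and address index of that city."""

    def __init__(self):
        self.blocks = {}  # city code -> {"name": {...}, "address": {...}}

    def add_index(self, city_code, field_name, inverted_index):
        """Record that `inverted_index` belongs to the block of `city_code`."""
        self.blocks.setdefault(city_code, {})[field_name] = inverted_index

    def block_for(self, city_code):
        """Return the index block of a city, or an empty block if unknown."""
        return self.blocks.get(city_code, {})


database = PoiIndexDatabase()
database.add_index("01", "name", {"搜": ["D10001481"]})
database.add_index("01", "address", {"中": ["D10001481"]})
beijing_block = database.block_for("01")
```

A search for a Poi in Beijing then starts from `block_for("01")` and never reads the blocks of other cities.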
In summary, while creating the Poi inverted index, the embodiment of the present invention can create an inverted index based on single characters for each character contained in a Poi and store the created inverted index in the index block corresponding to the city to which the Poi belongs, thereby building the Poi index database. In a subsequent Poi search, the corresponding index block can be found in the Poi index database according to the city of the Poi, and the target inverted index can then be looked up in that index block based on the characters contained in the Poi character information input by the user, so that one or more Poi results can be generated from the target inverted index and displayed, recommending the Poi results to the user and meeting the user's needs.
Referring to Fig. 7, a flow chart of the steps of an embodiment of a search method based on the index database of the present invention is shown, which may specifically include the following steps:
Step 702, receive the query information input by the user, the query information including the point of interest (Poi) character information input by the user during the search.
Specifically, after the user inputs query information on a terminal, the terminal can send the query information to the server, so that the server receives the query information input by the user. When the user wants to obtain a certain Poi, the user can enter input information for that Poi on the terminal, which triggers the terminal to treat the input information as query information and send it to the server; the server then receives the query information and executes step 704.
Step 704, identify the Poi character information from the query information, and determine the city information of the Poi based on the geographic location where the user is currently located.
In the embodiment of the present invention, the server can perform Poi identification and Poi attribute labeling on the query information input by the user to identify the Poi character information contained in the query information. The Poi character information may include the character information corresponding to different query fields, for example the address character information corresponding to the address field and the name character information corresponding to the name field. The character form of the Poi character information depends on the user's input mode: for example, when the user uses a Chinese input method, the characters of the Poi character information may be Chinese characters; when the user uses an English input method, the characters of the Poi character information may be pinyin characters, and so on, which is not limited by the embodiment of the present invention.
Meanwhile, the server can obtain the user's current geographic location through the terminal and determine the city information of the Poi from that location. The city information of the Poi characterizes the city to which the Poi the user wants belongs. For example, when the user performs a Poi search in Haidian District, Beijing, the city code 01 corresponding to Beijing can be used as the city information of the Poi, so that the Poi search is performed in the index block corresponding to Beijing. As another example, when the user performs a Poi search at Tianjin University, the city code 02 corresponding to Tianjin can be used as the city information of the Poi searched for by the user, so that the Poi search is performed in the index block corresponding to Tianjin.
Step 706, query the index database according to the city information of the Poi to obtain the target inverted index corresponding to the Poi character information, the index database including index blocks based on city information, and the index blocks including the inverted indexes of Pois.
After determining the city information of the Poi, the server can query the pre-built index database according to that city information to find the index block corresponding to the city information, and can take the index block found as the target index block, in which the inverted index corresponding to the Poi character information is searched for.
After the inverted index corresponding to the Poi character information is found, it can be taken as the target inverted index corresponding to the Poi character information. In other words, the target inverted index is obtained from the target index block alone, which narrows the search range and improves search efficiency.
For example, with reference to the single-character posting lists (inverted lists) shown in Fig. 3, the posting lists corresponding to "Sohu" can be found in the Beijing index block, including <D10001481,3,1,10000>, <D1000182,3,1,9000>, <D10001481,2,2,10000> and <D10001482,2,2,9000>, and these four entries can then be determined as the target inverted index.
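The city-partitioned lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Posting` field names, the nested-dictionary layout of `INDEX_LIBRARY`, and the function `lookup` are assumptions introduced here, and the sample Poi numbers are illustrative values modeled on the example.

```python
from typing import NamedTuple

class Posting(NamedTuple):
    poi_id: str      # Poi number, e.g. "D10001481"
    char_no: int     # assumed meaning of the second field of each entry
    position: int    # assumed: position of the character within the Poi text
    popularity: int  # assumed: Poi popularity

# Hypothetical index library: city code -> index block (character -> posting list).
INDEX_LIBRARY = {
    "01": {  # Beijing
        "搜": [Posting("D10001481", 3, 1, 10000), Posting("D10001482", 3, 1, 9000)],
        "狐": [Posting("D10001481", 2, 2, 10000), Posting("D10001482", 2, 2, 9000)],
    },
    "02": {},  # Tianjin (left empty in this sketch)
}

def lookup(city_code, poi_chars):
    """Return the posting list of each query character from one city's block only."""
    block = INDEX_LIBRARY.get(city_code, {})
    return {ch: block.get(ch, []) for ch in poi_chars}

target = lookup("01", "搜狐")
```

Only the block selected by the city code is touched, which is the source of the reduced search scope.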
Step 708, generate at least one Poi result according to the target inverted index, and display the Poi result.
After the target inverted index is determined, the server may construct one or more Poi results from the characters corresponding to the target inverted index, and may return the Poi results to the terminal used by the user, so that the Poi results are displayed by the terminal.
For example, continuing the example above, based on <D10001481,3,1,10000> and <D10001481,2,2,10000>, the Poi "Sohu Network Building" corresponding to Poi number D10001481 can be taken as one Poi result; meanwhile, based on <D1000182,3,1,9000> and <D10001482,2,2,9000>, the Poi "Sohu Media Building" corresponding to Poi number D10001482 can be taken as another Poi result. The Pois "Sohu Media Building" and "Sohu Network Building" can then be displayed on the terminal screen and recommended to the user, meeting the user's need.
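One way to turn per-character posting lists into Poi results is to group the entries by Poi number and keep the Pois matched by every query character. The sketch below assumes that grouping rule; the function name `candidate_pois` and the tuple layout are hypothetical, and the sample data is illustrative.

```python
from collections import defaultdict

# Illustrative posting lists keyed by query character:
# (poi_id, char_no, position, popularity); field meanings are assumptions.
SAMPLE = {
    "搜": [("D10001481", 3, 1, 10000), ("D10001482", 3, 1, 9000)],
    "狐": [("D10001481", 2, 2, 10000), ("D10001482", 2, 2, 9000)],
}

def candidate_pois(postings_by_char):
    """Return the Poi numbers that occur in the posting list of every query character."""
    seen = defaultdict(set)
    for ch, postings in postings_by_char.items():
        for poi_id, _char_no, _position, _popularity in postings:
            seen[poi_id].add(ch)
    needed = set(postings_by_char)
    return sorted(poi_id for poi_id, chars in seen.items() if chars == needed)

results = candidate_pois(SAMPLE)
```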
In practice, the user may enter Poi character information belonging to different query fields while entering the query information, which triggers the server to perform the Poi search across those query fields, thereby broadening the search scope and improving search efficiency. The query fields may include, but are not limited to, the address field and the name field. For example, when the query information entered by the user is "No. 1 Chengfu Road Sohu Building", "No. 1 Chengfu Road" can be identified as address character information and "Sohu Building" as name character information; that is, a mixed query demand covering different query fields arises. For a mixed query demand, the embodiments of the present invention may perform a merged query on the Poi index according to the Poi character information corresponding to the different query fields, so that the Poi inverted indexes corresponding to the different query fields can be retrieved.
Referring to Fig. 8, a flow chart of the steps of an optional embodiment of an index-library-based search method of the present invention is shown, which may specifically include the following steps:
Step 802, receive the query information entered by the user, where the query information includes the point-of-interest (Poi) character information entered by the user during the search.
Step 804, identify the Poi character information from the query information, and determine the city information to which the Poi belongs based on the geographical location where the user is currently located.
In embodiments of the present invention, optionally, identifying the Poi character information from the query information may include: segmenting the query information to obtain at least one word segmentation result; performing part-of-speech tagging on the word segmentation result to obtain corresponding attribute information; and analyzing the word segmentation result together with the attribute information to determine the Poi character information, where the Poi character information includes name character information and/or address character information. In a specific implementation, the query information entered by the user may be segmented according to preset parts of speech to obtain at least one word segmentation result; each word segmentation result may be tagged with a part of speech, and the attribute information corresponding to each word segmentation result may be determined from the tagged part of speech; the word segmentation results may then be analyzed based on the attribute information to determine the Poi character information belonging to each query field. For example, based on name-field information, the segmentation results may be analyzed for the characters contained in the Poi name, yielding the name character information; as another example, based on address-field information, the segmentation results may be analyzed for the characters contained in the Poi address, yielding the address character information.
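The segment, tag and analyze sequence can be illustrated with a toy dictionary-based segmenter. Everything named here is an assumption made for illustration: the `LEXICON` contents, the tag names, and the greedy longest-match strategy stand in for whatever segmenter and tag set an actual system would use.

```python
# Hypothetical lexicon mapping known words to part-of-speech style tags.
LEXICON = {"成府路": "road", "1号": "house_number", "搜狐大厦": "poi_name"}
ADDRESS_TAGS = {"road", "house_number"}

def segment(query):
    """Greedy longest-match segmentation; unknown characters become single tokens."""
    tokens, i = [], 0
    max_len = max(map(len, LEXICON))
    while i < len(query):
        for size in range(min(max_len, len(query) - i), 0, -1):
            piece = query[i:i + size]
            if piece in LEXICON or size == 1:
                tokens.append((piece, LEXICON.get(piece, "unknown")))
                i += size
                break
    return tokens

def split_fields(tokens):
    """Route tagged tokens to the address field or the name field."""
    address = "".join(word for word, tag in tokens if tag in ADDRESS_TAGS)
    name = "".join(word for word, tag in tokens if tag == "poi_name")
    return {"address": address, "name": name}

tokens = segment("成府路1号搜狐大厦")
fields = split_fields(tokens)
```

With this lexicon, the query is split into an address part and a name part, which is the mixed-field case discussed in the text.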
Step 806, in the index library, look up the target index block corresponding to the city information to which the Poi belongs.
In embodiments of the present invention, the target index block corresponding to the city information of the Poi to be searched may be looked up in the index library according to the correspondence between city information and index blocks, so that the target inverted index corresponding to the Poi character information can be retrieved from the target index block.
For example, when a user in Beijing uses the terminal and enters the query information "Sohu" in the operation interface of a map application, the server may identify the query information "Sohu" as Poi character information, so as to search for the Poi the user needs according to "Sohu", and may determine the city number 01 corresponding to Beijing as the city information of that Poi, so that the inverted index corresponding to "Sohu" is looked up in the Beijing index block corresponding to city number 01.
Step 808, in the target index block, query the posting lists of the characters that match the Poi character information.
Step 810, determine the target inverted index according to the posting lists of the matched characters, where the target inverted index includes a name-field index and/or an address-field index.
In embodiments of the present invention, the posting list of each character contained in the Poi character information may be looked up in the target index block, so that the posting lists of the characters corresponding to the different query fields are obtained, and the target inverted index corresponding to each query field may then be determined from those posting lists. For example, the name characters contained in the name character information of the Poi may be looked up in the target index block, and the name-field index of the Poi may be determined from the posting lists of the name characters found; similarly, the address characters contained in the address character information of the Poi may be looked up in the target index block, and the address-field index of the Poi may be determined from the posting lists of the address characters found.
Step 812, generate at least one Poi result according to the target inverted index, and display the Poi result.
When a combined-field query occurs, that is, when the query information entered by the user contains Poi character information corresponding to different query fields, the embodiment of the present invention, after determining the target inverted indexes corresponding to the different query fields, may fuse and merge the retrieved target inverted indexes according to Poi ID, so that the Poi results to be recommended are generated from the merged target inverted index and the number of recommended Poi results is reduced. Optionally, where the retrieved target inverted index contains the address-field index corresponding to the address field and the name-field index corresponding to the name field, generating at least one Poi result according to the target inverted index includes: fusing the name-field index and the address-field index to obtain a fused Poi index; and generating at least one Poi result based on the posting lists of the characters corresponding to the Poi index, where a posting list includes: character number, character position and Poi popularity.
For example, after the user enters "No. 1 Chengfu Road Sohu Building", the server determines the address-field index corresponding to "No. 1 Chengfu Road" and the name-field index corresponding to "Sohu Building", and fuses and merges the address-field index and the name-field index to obtain the Poi index corresponding to "No. 1 Chengfu Road Sohu Building". The Poi index may include the posting list corresponding to each Chinese character contained in "No. 1 Chengfu Road Sohu Building". The Poi result corresponding to "No. 1 Chengfu Road Sohu Building" can therefore be constructed from those posting lists, for example by showing the geographical location of this point of interest on the interface of the map application, or by showing the navigation route from the user's current geographical location to the geographical location of this point of interest.
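The merge by Poi ID can be sketched as an intersection of the two field indexes. Treating the merge as an intersection is an assumption made here (it is consistent with the stated goal of reducing the number of recommended results); the function `merge_by_poi_id`, the per-Poi dictionary layout, and the sample entries are hypothetical.

```python
def merge_by_poi_id(name_index, address_index):
    """Keep only Pois present in both field indexes and combine their postings."""
    merged = {}
    for poi_id in name_index.keys() & address_index.keys():
        merged[poi_id] = {
            "name": name_index[poi_id],
            "address": address_index[poi_id],
        }
    return merged

# Illustrative per-Poi postings: (character, position, popularity).
name_hits = {
    "D10001481": [("搜", 1, 10000), ("狐", 2, 10000)],
    "D10001482": [("搜", 1, 9000), ("狐", 2, 9000)],
}
address_hits = {
    "D10001481": [("成", 1, 10000), ("府", 2, 10000), ("路", 3, 10000)],
}

fused = merge_by_poi_id(name_hits, address_hits)
```

In this sample only one Poi matches both the name part and the address part, so the fused index contains a single candidate.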
Where more than one Poi result is generated, that is, after two or more Poi results have been generated, the embodiment of the present invention may rank the generated Poi results according to Poi popularity to obtain a corresponding order, and display the Poi results in that order, so that Poi results with higher Poi popularity are recommended to the user first. Optionally, displaying the Poi results may include: for each Poi result, obtaining the corresponding Poi popularity and character positions from the posting lists of the characters; ranking the Poi results according to the obtained Poi popularity, the character positions and the query information, and determining the order of each Poi result; and displaying each Poi result in that order. Specifically, the embodiment of the present invention may determine the position of each character within the Poi from the character position and Poi ID in the posting list, determine from those positions the degree of association between the characters contained in a Poi result and the query information, and then rank the generated Poi results by degree of association and Poi popularity to obtain the corresponding order. The Poi results are displayed in this order, so that Poi results with higher Poi popularity and a higher degree of association are recommended to the user, thereby improving the recall rate for the query information.
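A ranking step of this kind might look like the sketch below. The text does not define how the degree of association is computed from character positions, so the measure used here (matches that start earlier in the Poi text score higher) is purely an assumption, as are the function name `rank` and the dictionary keys.

```python
def rank(results):
    """Sort Poi results by assumed association (earlier match is better), then popularity."""
    def sort_key(result):
        # Assumed association measure: reciprocal of the first matched position.
        association = 1.0 / min(result["positions"])
        return (association, result["popularity"])
    return sorted(results, key=sort_key, reverse=True)

ranked = rank([
    {"poi_id": "D10001482", "positions": [1, 2], "popularity": 9000},
    {"poi_id": "D10001481", "positions": [1, 2], "popularity": 10000},
    {"poi_id": "D10009999", "positions": [3, 4], "popularity": 50000},  # hypothetical Poi
])
```

Here the third entry is very popular but matches later in its text, so under this assumed measure it is ranked after the two entries whose match starts at position 1.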
As one application of the present invention, the index-library-based search method provided by the embodiments of the present invention may be applied to a user query intent understanding system and a query display system, and the index library construction method may be applied to an index service system. The user query intent understanding system may serve as the system that judges the user's preliminary demand: it may receive the query information entered by the user, may identify from that query information Poi character information such as partially entered Poi information, fully entered Poi information and mistakenly entered Poi information, and may determine user demands such as a Poi address query demand, a navigation demand and a route query demand, so that the query display system looks up the Poi results corresponding to the identified Poi character information according to the user demand and provides the user with the best Poi results.
For example, as shown in Fig. 9, after the user query intent understanding system 910 receives the query information entered by the user, it may segment the query information through the word segmentation module 912 to obtain one or more word segmentation results, each of which may contain one or more characters, and may tag each word segmentation result with a Poi attribute according to preset parts of speech through the part-of-speech tagging module 914, that is, determine the attribute information of the Poi. The preset parts of speech may include, for example, stop words, navigation jump words, bus jump words, brand chain words, administrative division words, bus stations, numbers, simple points of interest, common suffix words, subway jump words, category and brand words, road words, nouns of locality and letters; the embodiments of the present invention place no limitation on this. Optionally, these parts of speech may be configured according to the words contained in a dictionary. The dictionary may be obtained by analyzing user logs and may include, for example, a Poi name library, a synonym library, and a category and brand dictionary. Optionally, where the query information entered by the user may not refer to a single Poi, for example where the query information contains query words corresponding to several kinds of attribute information, the probability of the currently tagged Poi attribute may be determined in combination with the query information, so that the subsequent ranking can take the probability of the Poi attribute into account and provide reasonable Poi results. In addition, the user query intent understanding system 910 may, through the query analysis module 916, further analyze the results generated by the word segmentation module 912 and the part-of-speech tagging module 914, for example by analyzing the word segmentation results based on preset rule templates and the tagged attribute information, to obtain a reasonable query logic, and may send the query logic to the query display system 920 to trigger the query display system 920 to query according to that logic. For example, according to the query logic, the query display system 920 may query for incompletely entered Poi information, filter out stop words, process navigation jump words, process subway jump words, query Poi addresses, and query category words and brands.
Specifically, the query display system 920 may, according to the Poi character information and user demand sent by the user query intent understanding system, match the Poi inverted index corresponding to the Poi character information in the index library of the index service system 930, generate Poi results based on the matched Poi inverted index, and display the Poi results in a rationalized manner. Rationalized display may include Poi display, structured display, aggregated display, parent-child node display, and navigation jump and route jump display. Structured display may refer to displaying the subsidiary facilities of a Poi, such as "gate", "parking lot", "entrance", "ticket office", "Block A", "Block B" and the like.
Aggregated display may refer to aggregating and displaying to the user the Poi results the user may need. Specifically, when the user has entered only a few characters, the user query intent understanding system may determine many possible user demands, and the query display system may aggregate the Poi results the user most probably needs and present them to the user, thereby improving the user's input efficiency. For example, when the user enters the pinyin "jiao", the query display system may, through aggregation and ranking, obtain words such as "Bank of Communications", "education and training", "dumpling restaurant" and "church" for the user to choose from.
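The aggregation for a short pinyin prefix can be sketched as a prefix filter followed by a popularity sort. The candidate table, its pinyin keys and the popularity counts below are invented for illustration, and `aggregate` is a hypothetical helper.

```python
# Hypothetical candidates: (pinyin, display text, popularity count).
CANDIDATES = [
    ("jiaotongyinhang", "Bank of Communications", 900),
    ("jiaoyupeixun", "education and training", 700),
    ("jiaoziguan", "dumpling restaurant", 500),
    ("jiaotang", "church", 300),
    ("jiudian", "hotel", 1000),
]

def aggregate(prefix, k=4):
    """Return up to k display texts whose pinyin starts with prefix, most popular first."""
    hits = [(text, hot) for pinyin, text, hot in CANDIDATES if pinyin.startswith(prefix)]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return [text for text, _ in hits[:k]]

suggestions = aggregate("jiao")
```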
Parent-child node display may be used to show the membership and correlation relationships within a brand or category. For example, the category "hotel" includes "express hotel", "star-rated hotel", "youth hostel" and so on. If the query information entered by the user is "hotel", the query display system may provide the groups belonging to hotel, such as the three categories "express hotel", "star-rated hotel" and "youth hostel"; if the query information entered by the user is "express hotel", the query display system may provide results for hotel brands such as "7 Days Inn" and "Home Inn".
Navigation jump and route jump display refers to displaying the interface that jumps to a navigation page or a navigation route page. Specifically, when the user query intent understanding system 910 determines, by analyzing the query words entered by the user, that the user wants navigation or a route query, the query display system 920 may provide a jump interface according to the Poi results found, so that the user's demand is met quickly.
During ranking, the query display system 920 may rank the generated Poi results based on factors such as word order, word tightness, word affiliation, rule-based weight boosting and Poi click volume, so as to improve the recall rate and the user's search efficiency. Word order may indicate the position of a word within the Poi; for example, it may be determined from the character position and Poi ID in the posting list. Word tightness may be calculated from the edit distance between words. Word affiliation may indicate the administrative-division subordination of a word and its closeness to synonyms. Rule-based weight boosting may indicate that the weight of specific words needs to be adjusted under certain user demands: for example, when the user enters a sequence of digits, the ranking weight of bus routes may be raised; when the query information entered by the user contains words in brackets, the ranking weight of those words may be lowered; as another example, when handling a cross-city query, where results in the current city need to be balanced against results in other cities, the cross-city popularity data may be boosted, and so on.
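Since word tightness is said to be computed from edit distance, a standard Levenshtein distance is a natural building block. The distance function below is the textbook dynamic-programming algorithm; the mapping from distance to a tightness score in `tightness` is an assumption, since the text does not give one.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if the characters match
            ))
        prev = cur
    return prev[-1]

def tightness(query, candidate):
    """Assumed mapping: a smaller edit distance gives a tightness closer to 1."""
    return 1.0 / (1.0 + edit_distance(query, candidate))
```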
In one embodiment, a fitting model may be created based on word order, word tightness, word affiliation, the rule-based boost factor and Poi click volume, so that the generated Poi results are given an overall score by the fitting model and a corresponding overall score value is obtained. For example, the fitting model created may be expressed by the following formula:
Here, parameter x may denote word tightness, parameter y may denote word order, parameter z may denote word affiliation, p may denote the rule-based boost factor, and t may denote Poi click volume. a may denote the weight of word tightness, b the weight of word order, and c the weight of word affiliation. It should be noted that a, b and c may be set by regression based on a supervised learning method, for example obtained through a preset regression model; the embodiments of the present invention place no limitation on this.
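The formula itself does not survive in this text, so the sketch below does not reproduce it. It only shows how the named quantities could be combined, using a functional form that is entirely an assumption: a weighted sum of x, y and z scaled by p, plus a damped click-volume term. The function name `overall_score` and the default weights are hypothetical.

```python
import math

def overall_score(x, y, z, p, t, a=0.5, b=0.3, c=0.2):
    """Assumed form only: p * (a*x + b*y + c*z) + log(1 + t).

    x: word tightness, y: word order, z: word affiliation,
    p: rule-based boost factor, t: Poi click volume,
    a, b, c: weights that would in practice be fitted by regression.
    """
    return p * (a * x + b * y + c * z) + math.log1p(t)

low_clicks = overall_score(x=0.9, y=0.9, z=0.2, p=1.0, t=10)
high_clicks = overall_score(x=0.5, y=0.5, z=0.8, p=1.0, t=100000)
```

Under this assumed form, a result with a weaker literal match can still outrank a closer literal match when its click volume is much higher, which is the qualitative behavior the surrounding example describes.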
For example, after the user enters the query information "Beijing traffic administration office", the Poi results found by the query display system in the index library of the index service system may include: A "Xicheng District Traffic Management Bureau", B "Traffic Management Bureau, Beijing Municipal Bureau of Public Security (Fuchengmenbei Dajie)", C "Traffic Administration Bureau, Beijing Municipal Bureau of Public Security", and D "Yongfeng Base Hot Spring traffic administration office". When the user query intent understanding system interprets the query information entered by the user as "find a traffic administration office in Beijing", that is, the recognized Poi character information is "traffic administration office", it can be determined that the word tightness and word order weights between "traffic administration office" and D are the highest, those between "traffic administration office" and C come second, and those between "traffic administration office" and A and B are lower; if the order were determined only by literal comparison of the query information with the Poi results, the order of the Poi results would be D, C, A, B. When the user query intent understanding system interprets the query information entered by the user as referring specifically to "Beijing traffic administration office", that is, the recognized Poi character information is "Beijing traffic administration office", it can be determined that the word affiliation between "Beijing traffic administration office" and C and B is higher; if the order were determined only by literal comparison of the query information with the Poi results, the order of the Poi results would be C, B, A, D. When the fitting model in this example is used to give each Poi result an overall score, the final order of the Poi results can be determined as A, B, C, D. That is, the Poi results are ranked in combination with Poi click volume, so that the ranking is more accurate and takes into account, from the user's perspective, which Poi result better meets the user's demand, thereby improving the recall rate.
The index service system 930 may be used to scan various data sources, create inverted indexes based on single characters, and divide the created inverted indexes into index blocks corresponding to different cities for storage; it may also merge and rank the Poi indexes found by the query display system 920. The data sources may include Poi name data, Poi address data, Poi popularity data, Poi click data, a Poi affiliation library, an error-correction dictionary, a synonym library, user click history data and the like.
In a specific implementation, the Poi address data and the Poi affiliation library may be used to describe the specific address of a Poi. The Poi address data need not describe the administrative-division affiliation of the Poi and may be used specifically to describe the detailed location of the Poi; for example, the Poi address data "Building D, Enlightenment Technology Building, No. 1 Tsinghua East Road, Haidian District, Beijing" describes the detailed address of a Poi. The Poi affiliation library may include data showing the administrative division to which a Poi belongs, where the administrative division may include the five levels "province, city, county (district), town, street". Through the Poi affiliation library, the index service system 930 can accurately obtain the affiliation of a Poi, and the user query intent understanding system 910 can correct an administrative division that the user has entered incorrectly.
The error-correction dictionary may be a dictionary of common similar-looking characters, error-prone words and fuzzy pronunciations obtained by mining and analyzing user input logs, and it may record the probabilities of the user's most likely input errors obtained by log statistics, so that the index service system 930 can correct erroneous information entered by the user by means of the error-correction dictionary. This solves the existing problem that erroneous information entered by the user cannot be corrected by a homophone error-correction model, which results in a low recall rate.
The synonym dictionary may include aliases, abbreviations and former names of Poi names, as well as aliases of words. Increasing the coverage of the synonym dictionary can improve the Poi recognition rate of the user query intent understanding system 910 and thus the accuracy of tagging.
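A probability-weighted correction lookup can be sketched as follows. The dictionary entries, the probabilities and the function `correct` are all invented for illustration; a real dictionary would be mined from input logs as described above.

```python
# Hypothetical error-correction dictionary:
# mistyped term -> candidate corrections with log-derived probabilities.
CORRECTIONS = {
    "搜孤": [("搜狐", 0.9), ("搜狗", 0.1)],
}

def correct(term):
    """Return the most probable correction, or the term itself if it is not listed."""
    candidates = CORRECTIONS.get(term)
    if not candidates:
        return term
    return max(candidates, key=lambda candidate: candidate[1])[0]
```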
It should be noted that, for simplicity of description, the method embodiments are described as a series of action combinations, but those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to Fig. 10, a structural block diagram of an embodiment of an index library construction device of the present invention is shown, which may specifically include the following modules:
Data source scanning module 1002, configured to scan the data sources of points of interest (Poi) and determine the attribute information and city information of each Poi;
Index creation module 1004, configured to create the inverted index of the Pois according to the attribute information of each Poi;
Index splitting module 1006, configured to split the inverted index of the Pois according to the city information to obtain index blocks based on city information;
Index library construction module 1008, configured to construct the Poi index library from the index blocks.
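The four modules above can be pictured as a small pipeline: scan, create a single-character inverted index, split by city, and assemble the library. The sketch below is a simplified illustration under assumptions: the record layout of `POIS`, the posting tuple contents, the Tianjin sample record and both function names are hypothetical, and only the name field is indexed.

```python
from collections import defaultdict

# Illustrative scanned records; the Tianjin entry is a hypothetical Poi.
POIS = [
    {"id": "D10001481", "city": "01", "name": "搜狐网络大厦", "popularity": 10000},
    {"id": "D10001482", "city": "01", "name": "搜狐媒体大厦", "popularity": 9000},
    {"id": "D20000001", "city": "02", "name": "天津大学", "popularity": 8000},
]

def create_inverted_index(pois):
    """Map each single character to postings (poi_id, city, position, popularity)."""
    index = defaultdict(list)
    for poi in pois:
        for position, ch in enumerate(poi["name"], start=1):
            index[ch].append((poi["id"], poi["city"], position, poi["popularity"]))
    return index

def split_by_city(index):
    """Split the global index into one block per city code."""
    blocks = defaultdict(lambda: defaultdict(list))
    for ch, postings in index.items():
        for poi_id, city, position, popularity in postings:
            blocks[city][ch].append((poi_id, position, popularity))
    return blocks

library = split_by_city(create_inverted_index(POIS))
```

After the split, a query for a Beijing user only needs `library["01"]`, and characters that occur solely in Beijing Pois are absent from the Tianjin block.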
In an optional implementation of the present invention, the attribute information is determined based on the query fields of the Poi. The inverted index of the Pois may include: an inverted index based on single characters. The index creation module 1004 may include the following submodules:
Attribute information extraction submodule, configured to extract the address-field information and/or name-field information of each Poi from the attribute information of that Poi;
Inverted index determination submodule, configured to count the name characters contained in the name-field information and the address characters contained in the address-field information of each Poi, and determine the inverted index based on single characters.
In an optional implementation of the present invention, the index library construction module 1008 may include the following submodules:
Correspondence construction submodule, configured to construct the correspondence between the inverted indexes of the Pois and the index blocks, where the inverted index includes at least one of the following: a Poi name index and a Poi address index;
Index library construction submodule, configured to construct the Poi index library based on the correspondence between the inverted indexes of the Pois and the index blocks.
In an optional implementation of the present invention, the inverted index determination submodule may include the following units:
Name character statistics unit, configured to count the name characters contained in the name-field information and determine the frequency corresponding to each name character;
First determination unit, configured to determine the posting list of each name character according to the frequency corresponding to that name character, where the posting list includes: name character number, character position and Poi popularity;
Name index construction unit, configured to construct, for the name-field information of each Poi, the Poi name index based on the posting lists of the name characters.
In an optional implementation of the present invention, the inverted index determination submodule may include the following units:
Address character statistics unit, configured to count the address characters contained in the address-field information and determine the frequency corresponding to each address character;
Second determination unit, configured to determine the posting list of each address character according to the frequency corresponding to that address character, where the posting list includes: address character number, character position and Poi popularity;
Address index construction unit, configured to construct, for the address-field information of each Poi, the Poi address index based on the posting lists of the address characters.
In an optional implementation of the present invention, the index splitting module 1006 may include the following submodules:
Inverted index splitting submodule, configured to split the inverted index of each Poi according to the city information to obtain the inverted indexes of the Pois corresponding to each item of city information;
Index block generation submodule, configured to generate the corresponding index block based on the inverted indexes of the Pois corresponding to the same city information.
In an optional implementation of the present invention, the device may further include the following modules:
Historical data acquisition module, configured to obtain historical search data, where the historical search data includes the user's input method record data, web page click history data and map click history data;
Comprehensive analysis module, configured to comprehensively analyze the input method record data, web page click history data and map click history data in the historical search data to obtain the Poi popularity corresponding to each character.
Referring to Fig. 11, a structural block diagram of an embodiment of an index-library-based search device of the present invention is shown, which may specifically include the following modules:
Information receiving module 1102, configured to receive the query information entered by the user, where the query information includes the point-of-interest (Poi) character information entered by the user during the search;
Poi identification module 1104, configured to identify the Poi character information from the query information and determine the city information to which the Poi belongs based on the geographical location where the user is currently located;
Index query module 1106, configured to query the index library according to the city information to which the Poi belongs and obtain the target inverted index corresponding to the Poi character information, where the index library includes index blocks based on city information and each index block includes inverted indexes of Pois;
Result display module 1108, configured to generate at least one Poi result according to the target inverted index and display the Poi result.
In an optional embodiment of the present invention, the Poi identification module 1104 may include the following submodules:
A word segmentation submodule, configured to segment the query information to obtain at least one word segmentation result;
A tagging submodule, configured to perform part-of-speech tagging on the word segmentation result to obtain the corresponding attribute information;
An analysis submodule, configured to perform analysis according to the word segmentation result and the attribute information and to determine the Poi character information, where the Poi character information includes name character information and/or address character information.
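The three submodules above (segmentation, tagging, analysis) can be sketched as a small pipeline. This is a toy illustration under stated assumptions: whitespace splitting stands in for a real word segmenter, and the `LEXICON` table and all function names are hypothetical rather than taken from the source.

```python
# Toy lexicon mapping a token to an attribute tag; purely illustrative.
LEXICON = {
    "starbucks": "name",
    "zhongguancun": "address",
    "street": "address",
    "near": "other",
}


def segment(query):
    """Whitespace segmentation stands in for a real word segmenter."""
    return query.lower().split()


def tag(tokens):
    """Attach an attribute tag to every token; unknown tokens get 'other'."""
    return [(tok, LEXICON.get(tok, "other")) for tok in tokens]


def identify_poi_chars(query):
    """Return the name and address character information found in the query."""
    tagged = tag(segment(query))
    return {
        "name": [t for t, a in tagged if a == "name"],
        "address": [t for t, a in tagged if a == "address"],
    }


info = identify_poi_chars("Starbucks near Zhongguancun Street")
```

Either list may be empty, which matches the "name character information and/or address character information" wording.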
In an optional embodiment of the present invention, the index query module 1106 may include the following submodules:
A block lookup submodule, configured to look up, in the index library, the target index block corresponding to the city information to which the Poi belongs;
A list lookup submodule, configured to query, in the target index block, the posting lists of the characters contained in the Poi character information;
An index determination submodule, configured to determine the target inverted index according to the posting lists found, where the target inverted index includes a name field index and/or an address field index.
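A minimal sketch of this lookup follows: the city selects one index block, and the posting lists of the query characters are then read from that block only. The nested dictionary layout of `INDEX_LIBRARY` and the use of set intersection to combine posting lists are assumptions for illustration; the source only states that the posting lists of the characters are queried.

```python
# Hypothetical index library: city -> field -> character -> set of Poi ids.
INDEX_LIBRARY = {
    "Beijing": {
        "name": {"颐": {1}, "和": {1, 2}, "园": {1, 3}},
        "address": {"海": {1, 2}, "淀": {1, 2}},
    },
    "Hangzhou": {
        "name": {"西": {7}, "湖": {7}},
        "address": {"西": {7, 8}},
    },
}


def query_field(block, field, chars):
    """Intersect the posting lists of all query characters within one field."""
    postings = [block[field].get(ch, set()) for ch in chars]
    return set.intersection(*postings) if postings else set()


def query_index(city, name_chars="", address_chars=""):
    """Look up the city's block, then query the name and address fields."""
    block = INDEX_LIBRARY.get(city, {"name": {}, "address": {}})
    return {
        "name": query_field(block, "name", name_chars),
        "address": query_field(block, "address", address_chars),
    }


target = query_index("Beijing", name_chars="和园", address_chars="海淀")
```

Because only one city's block is touched, the cost of a query depends on the size of that block rather than on the whole library.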
In an optional embodiment of the present invention, the result display module 1108 may include the following submodules:
An index fusion submodule, configured to fuse the name field index and the address field index to obtain a fused Poi index;
A result generation submodule, configured to generate at least one Poi result based on the posting lists of the characters corresponding to the Poi index, where each posting list includes: the number of characters, the character position, and the Poi popularity.
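The fusion of the two field indexes can be illustrated as follows. The source does not define the fusion rule, so this sketch assumes a simple union that records, for each Poi, which fields matched; the function name `fuse` is hypothetical.

```python
def fuse(name_hits, address_hits):
    """Merge per-field hits into one mapping: Poi id -> set of matched fields."""
    fused = {}
    for field, hits in (("name", name_hits), ("address", address_hits)):
        for poi_id in hits:
            fused.setdefault(poi_id, set()).add(field)
    return fused


# Poi 1 matches in both fields, Poi 3 only by name, Poi 2 only by address.
fused = fuse({1, 3}, {1, 2})
```

Keeping the matched fields per Poi leaves room for a later ranking step to prefer Pois that match in both the name and the address.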
In an optional embodiment of the present invention, the result display module 1108 may include the following submodules:
An acquisition submodule, configured to obtain, for each Poi result, the corresponding Poi popularity and character position from the posting list of the character;
A sorting submodule, configured to sort the Poi results according to the obtained Poi popularity, the character position, and the query information, and to determine the rank order of each Poi result;
A display submodule, configured to display each Poi result according to the rank order.
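The sorting step above can be sketched as a single sort with a composite key. The particular ordering used here (exact match with the query first, then earlier match position, then higher popularity) is an assumption; the source names the three inputs but not how they are weighed. `Candidate` and `rank` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    popularity: float  # Poi popularity taken from the posting list
    position: int      # position of the first matched character in the name


def rank(candidates, query):
    """Sort by exact match, then earlier match position, then higher popularity."""
    def key(c):
        exact = 0 if c.name == query else 1
        return (exact, c.position, -c.popularity)
    return sorted(candidates, key=key)


results = rank(
    [
        Candidate("West Lake Hotel", 5.0, 0),
        Candidate("New West Lake Mall", 9.0, 4),
        Candidate("West Lake", 7.0, 0),
    ],
    query="West Lake",
)
ordered_names = [c.name for c in results]
```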
Because the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
Fig. 12 is a structural block diagram of a device 1200 according to an exemplary embodiment. For example, the device 1200 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like; it may also be a server-side device, such as a server.
Referring to Fig. 12, the device 1200 may include one or more of the following components: a processing component 1202, a memory 1204, a power supply component 1206, a multimedia component 1208, an audio component 1210, an input/output (I/O) interface 1212, a sensor component 1214, and a communication component 1216.
The processing component 1202 generally controls the overall operation of the device 1200, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 1202 may include one or more processors 1220 to execute instructions so as to perform all or part of the steps of the methods described above. In addition, the processing component 1202 may include one or more modules that facilitate interaction between the processing component 1202 and other components. For example, the processing component 1202 may include a multimedia module to facilitate interaction between the multimedia component 1208 and the processing component 1202.
The memory 1204 is configured to store various types of data to support operation of the device 1200. Examples of such data include instructions for any application or method operating on the device 1200, contact data, phone book data, messages, pictures, videos, and so on. The memory 1204 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power supply component 1206 provides power to the various components of the device 1200. The power supply component 1206 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 1200.
The multimedia component 1208 includes a screen that provides an output interface between the device 1200 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 1208 includes a front camera and/or a rear camera. When the device 1200 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or may have focusing and optical zoom capabilities.
The audio component 1210 is configured to output and/or input audio signals. For example, the audio component 1210 includes a microphone (MIC), which is configured to receive external audio signals when the device 1200 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory 1204 or transmitted via the communication component 1216. In some embodiments, the audio component 1210 further includes a speaker for outputting audio signals.
The I/O interface 1212 provides an interface between the processing component 1202 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, or the like. The buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 1214 includes one or more sensors for providing status assessments of various aspects of the device 1200. For example, the sensor component 1214 can detect the open/closed state of the device 1200 and the relative positioning of components, for example the display and keypad of the device 1200. The sensor component 1214 can also detect a change in position of the device 1200 or of one of its components, the presence or absence of user contact with the device 1200, the orientation or acceleration/deceleration of the device 1200, and a change in temperature of the device 1200. The sensor component 1214 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 1214 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 1214 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1216 is configured to facilitate wired or wireless communication between the device 1200 and other devices. The device 1200 can access a wireless network based on a communication standard, such as WiFi, 2G, 3G, or a combination thereof. In an exemplary embodiment, the communication component 1216 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1216 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 1200 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for executing the methods described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 1204 including instructions, where the instructions can be executed by the processor 1220 of the device 1200 to complete the methods described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium, where, when instructions in the storage medium are executed by a processor of a device, the device is enabled to execute an index library construction method, the method comprising: scanning a data source of points of interest (Pois) to determine the attribute information and city information of each Poi; creating inverted indexes of the Pois according to the attribute information of each Poi; splitting the inverted indexes of the Pois according to the city information to obtain index blocks based on city information; and constructing a Poi index library according to the index blocks.
Optionally, the attribute information is determined based on the query fields of the Poi, and the inverted index of the Poi includes an inverted index based on single characters. Creating the inverted indexes of the Pois according to the attribute information of each Poi comprises: extracting the address field information and/or name field information of each Poi from the attribute information of that Poi; and performing statistics on the name characters contained in the name field information and the address characters contained in the address field information of each Poi to determine the inverted index based on single characters.
Optionally, constructing the Poi index library according to the index blocks comprises: constructing a correspondence between the inverted indexes of the Pois and the index blocks, where the inverted indexes include at least one of the following: a Poi name index and a Poi address index; and constructing the Poi index library based on the correspondence between the inverted indexes of the Pois and the index blocks.
Optionally, performing statistics on the name characters contained in the name field information of each Poi to determine the inverted index based on single characters comprises: counting the name characters contained in the name field information to determine the frequency of each name character; determining the posting list of each name character according to the frequency of that name character, where the posting list includes: the number of name characters, the character position, and the Poi popularity; and, for the name field information of each Poi, constructing the Poi name index based on the posting lists of the name characters.
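The construction of a single-character name index can be sketched as below. The record layout (`id`, `name`, `popularity` keys) and the entry fields are assumptions chosen to mirror the three items the text lists for a posting list; the source does not say how the character frequency shapes the list, so this sketch simply returns the frequencies alongside the index.

```python
from collections import Counter, defaultdict


def build_name_index(pois):
    """pois: list of dicts with 'id', 'name' and 'popularity' keys (assumed layout)."""
    # Step 1: count how often each character occurs across all Poi names.
    frequency = Counter(ch for poi in pois for ch in poi["name"])
    # Step 2: build one posting list per character.
    index = defaultdict(list)
    for poi in pois:
        for position, ch in enumerate(poi["name"]):
            index[ch].append({
                "poi_id": poi["id"],
                "char_count": len(poi["name"]),   # number of name characters
                "position": position,             # where the character occurs
                "popularity": poi["popularity"],  # Poi popularity
            })
    return frequency, dict(index)


pois = [
    {"id": 1, "name": "颐和园", "popularity": 9.5},
    {"id": 2, "name": "圆明园", "popularity": 8.0},
]
frequency, name_index = build_name_index(pois)
```

An address index could be built the same way by iterating over an address field instead of the name.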
Optionally, performing statistics on the address characters contained in the address field information of each Poi to determine the inverted index based on single characters comprises: counting the address characters contained in the address field information to determine the frequency of each address character; determining the posting list of each address character according to the frequency of that address character, where the posting list includes: the number of address characters, the character position, and the Poi popularity; and, for the address field information of each Poi, constructing the Poi address index based on the posting lists of the address characters.
Optionally, splitting the inverted indexes of the Pois according to the city information to obtain index blocks based on city information comprises: splitting the inverted index of each Poi according to the city information to obtain the inverted indexes of the Pois corresponding to each piece of city information; and generating a corresponding index block based on the inverted indexes of the Pois corresponding to the same city information.
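Splitting by city can be illustrated with a short sketch that groups per-character posting sets into one block per city. The record layout and the function name `split_by_city` are hypothetical, and only the name field is indexed here to keep the example small.

```python
from collections import defaultdict


def split_by_city(pois):
    """Group per-character posting sets into one index block per city."""
    blocks = defaultdict(lambda: defaultdict(set))
    for poi in pois:
        for ch in poi["name"]:
            blocks[poi["city"]][ch].add(poi["id"])
    # Convert to plain dicts so the result is easy to inspect or serialize.
    return {city: dict(chars) for city, chars in blocks.items()}


pois = [
    {"id": 1, "name": "西湖", "city": "Hangzhou"},
    {"id": 2, "name": "西单", "city": "Beijing"},
    {"id": 3, "name": "西站", "city": "Beijing"},
]
index_blocks = split_by_city(pois)
```

The same character ("西") ends up with a separate, shorter posting set in each city's block, which is what lets a query restricted to one city skip the Pois of every other city.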
Optionally, the instructions further cause the following operations to be performed before the inverted index based on single characters is determined: acquiring historical search data, where the historical search data includes the user's input method record data, web page click history data, and map click history data; and comprehensively analyzing the input method record data, web page click history data, and map click history data in the historical search data to obtain the Poi popularity corresponding to each character.
A non-transitory computer-readable storage medium, where, when instructions in the storage medium are executed by a processor of a device, the device is enabled to execute an index-library-based search method, the method comprising: receiving query information input by a user, where the query information contains the point of interest (Poi) character information input by the user during the search; identifying the Poi character information from the query information, and determining, based on the user's current geographic location, the city information to which the Poi belongs; querying the index library according to the city information to which the Poi belongs to obtain the target inverted index corresponding to the Poi character information, where the index library includes index blocks based on city information, and each index block includes inverted indexes of Pois; and generating at least one Poi result according to the target inverted index and displaying the Poi result.
Optionally, identifying the Poi character information from the query information comprises: segmenting the query information to obtain at least one word segmentation result; performing part-of-speech tagging on the word segmentation result to obtain the corresponding attribute information; and performing analysis according to the word segmentation result and the attribute information to determine the Poi character information, where the Poi character information includes name character information and/or address character information.
Optionally, querying the index library according to the city information to which the Poi belongs to obtain the target inverted index corresponding to the Poi character information comprises: looking up, in the index library, the target index block corresponding to the city information to which the Poi belongs; querying, in the target index block, the posting lists of the characters contained in the Poi character information; and determining the target inverted index according to the posting lists found, where the target inverted index includes a name field index and/or an address field index.
Optionally, generating at least one Poi result according to the target inverted index comprises: fusing the name field index and the address field index to obtain a fused Poi index; and generating at least one Poi result based on the posting lists of the characters corresponding to the Poi index, where each posting list includes: the number of characters, the character position, and the Poi popularity.
Optionally, displaying the Poi result comprises: obtaining, for each Poi result, the corresponding Poi popularity and character position from the posting list of the character; sorting the Poi results according to the obtained Poi popularity, the character position, and the query information to determine the rank order of each Poi result; and displaying each Poi result according to the rank order.
Fig. 13 is a structural schematic diagram of a server in an embodiment of the present invention. The server 1300 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 1322 (for example, one or more processors), a memory 1332, and one or more storage media 1330 (for example, one or more mass storage devices) that store application programs 1342 or data 1344. The memory 1332 and the storage medium 1330 may provide transient or persistent storage. The programs stored in the storage medium 1330 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the server. Further, the central processing unit 1322 may be configured to communicate with the storage medium 1330 and to execute, on the server 1300, the series of instruction operations in the storage medium 1330.
The server 1300 may also include one or more power supplies 1326, one or more wired or wireless network interfaces 1350, one or more input/output interfaces 1358, one or more keyboards 1356, and/or one or more operating systems 1341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be understood by reference to one another.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a device, or a computer program product. Therefore, the embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) that contain computer-usable program code.
The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, where the instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal device, so that a series of operation steps are executed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.
Finally, it should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article, or terminal device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or terminal device that includes the element.
The index library construction method and device, the index-library-based search method and device, the device, and the readable storage medium provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. At the same time, those of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (10)

1. An index library construction method, characterized by comprising:
Scanning a data source of points of interest (Pois) to determine the attribute information and city information of each Poi;
Creating inverted indexes of the Pois according to the attribute information of each Poi;
Splitting the inverted indexes of the Pois according to the city information to obtain index blocks based on city information;
Constructing a Poi index library according to the index blocks.
2. The method according to claim 1, characterized in that the attribute information is determined based on the query fields of the Poi, the inverted index of the Poi comprises an inverted index based on single characters, and creating the inverted indexes of the Pois according to the attribute information of each Poi comprises:
Extracting the address field information and/or name field information of each Poi from the attribute information of that Poi;
Performing statistics on the name characters contained in the name field information and the address characters contained in the address field information of each Poi to determine the inverted index based on single characters.
3. The method according to claim 2, characterized in that constructing the Poi index library according to the index blocks comprises:
Constructing a correspondence between the inverted indexes of the Pois and the index blocks, where the inverted indexes include at least one of the following: a Poi name index and a Poi address index;
Constructing the Poi index library based on the correspondence between the inverted indexes of the Pois and the index blocks.
4. An index-library-based search method, characterized by comprising:
Receiving query information input by a user, where the query information contains the point of interest (Poi) character information input by the user during the search;
Identifying the Poi character information from the query information, and determining, based on the user's current geographic location, the city information to which the Poi belongs;
Querying the index library according to the city information to which the Poi belongs to obtain the target inverted index corresponding to the Poi character information, where the index library includes index blocks based on city information, and each index block includes inverted indexes of Pois;
Generating at least one Poi result according to the target inverted index, and displaying the Poi result.
5. An index library construction device, characterized by comprising:
A data source scanning module, configured to scan a data source of points of interest (Pois) and determine the attribute information and city information of each Poi;
An index creation module, configured to create inverted indexes of the Pois according to the attribute information of each Poi;
An index splitting module, configured to split the inverted indexes of the Pois according to the city information to obtain index blocks based on city information;
An index library construction module, configured to construct a Poi index library according to the index blocks.
6. An index-library-based search device, characterized by comprising:
An information receiving module, configured to receive query information input by a user, where the query information contains the point of interest (Poi) character information input by the user during the search;
A Poi identification module, configured to identify the Poi character information from the query information and to determine, based on the user's current geographic location, the city information to which the Poi belongs;
An index query module, configured to query the index library according to the city information to which the Poi belongs and to obtain the target inverted index corresponding to the Poi character information, where the index library includes index blocks based on city information, and each index block includes inverted indexes of Pois;
A result display module, configured to generate at least one Poi result according to the target inverted index and to display the Poi result.
7. A device, characterized by comprising a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:
Scanning a data source of points of interest (Pois) to determine the attribute information and city information of each Poi;
Creating inverted indexes of the Pois according to the attribute information of each Poi;
Splitting the inverted indexes of the Pois according to the city information to obtain index blocks based on city information;
Constructing a Poi index library according to the index blocks.
8. A device, characterized by comprising a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:
Receiving query information input by a user, where the query information contains the point of interest (Poi) character information input by the user during the search;
Identifying the Poi character information from the query information, and determining, based on the user's current geographic location, the city information to which the Poi belongs;
Querying, according to the city information to which the Poi belongs, the index library for the target inverted index corresponding to the Poi character information, where the index library includes index blocks based on city information, and each index block includes inverted indexes of Pois;
Generating at least one Poi result according to the target inverted index, and displaying the Poi result.
9. A readable storage medium, characterized in that, when instructions in the storage medium are executed by a processor of a device, the device is enabled to execute the index library construction method according to one or more of method claims 1-3.
10. A readable storage medium, characterized in that, when instructions in the storage medium are executed by a processor of a device, the device is enabled to execute the index-library-based search method according to method claim 4.
CN201710901601.3A 2017-09-28 2017-09-28 Index library construction method, search method and device Active CN110019645B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710901601.3A CN110019645B (en) 2017-09-28 2017-09-28 Index library construction method, search method and device

Publications (2)

Publication Number Publication Date
CN110019645A true CN110019645A (en) 2019-07-16
CN110019645B CN110019645B (en) 2022-04-19

Family

ID=67186336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710901601.3A Active CN110019645B (en) 2017-09-28 2017-09-28 Index library construction method, search method and device

Country Status (1)

Country Link
CN (1) CN110019645B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110765262A (en) * 2019-09-24 2020-02-07 北京嘀嘀无限科技发展有限公司 POI text retrieval method and device and electronic equipment
CN111008625A (en) * 2019-12-06 2020-04-14 中国建设银行股份有限公司 Address correction method, device, equipment and storage medium
CN112197779A (en) * 2020-09-14 2021-01-08 汉海信息技术(上海)有限公司 Navigation path planning method and device and printing equipment
CN112214573A (en) * 2020-10-30 2021-01-12 数贸科技(北京)有限公司 Information search system, method, computing device, and computer storage medium
CN112685540A (en) * 2021-01-07 2021-04-20 深圳市欢太科技有限公司 Search method, search device, storage medium and terminal
CN113672627A (en) * 2021-09-08 2021-11-19 湖南惠农科技有限公司 Elasticissearch search engine index construction method and device
CN113743054A (en) * 2021-08-17 2021-12-03 上海明略人工智能(集团)有限公司 Alphabet vector learning method, system, storage medium and electronic device
CN114661688A (en) * 2022-03-25 2022-06-24 马上消费金融股份有限公司 Address error correction method and device

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101432687A (en) * 2006-05-12 2009-05-13 电子地图北美公司 Locality indexes and method for indexing localities
CN102024375A (en) * 2009-09-23 2011-04-20 深圳市天翼方向科技有限公司 Special formatting and calculation method of point of interest (POI) data of electronic map
CN102147795A (en) * 2010-02-05 2011-08-10 北京四维图新科技股份有限公司 Method and device for searching points of interest as well as navigation system
EP2354984A1 (en) * 2010-02-08 2011-08-10 Navteq North America, LLC Full text search in navigation systems
CN102456055A (en) * 2010-10-28 2012-05-16 腾讯科技(深圳)有限公司 Method and device for retrieving interest points
CN102831224A (en) * 2012-08-24 2012-12-19 北京百度网讯科技有限公司 Creating method for data index base and searching suggest generation method and device
CN102944243A (en) * 2012-11-16 2013-02-27 沈阳美行科技有限公司 Navigation device and method capable of updating increment of map data
CN103185581A (en) * 2011-12-28 2013-07-03 上海博泰悦臻电子设备制造有限公司 Information prompting device and prompting method for POI search results
CN103226559A (en) * 2012-01-26 2013-07-31 现代自动车株式会社 Indexing system of spatial information for combined SOI object and content
CN103577442A (en) * 2012-07-30 2014-02-12 腾讯科技(深圳)有限公司 Method and device for calculating map data importance
CN103714092A (en) * 2012-09-29 2014-04-09 北京百度网讯科技有限公司 Geographic position searching method and geographic position searching device
CN103902626A (en) * 2012-12-30 2014-07-02 上海易罗信息科技有限公司 Interest point search method and device and equipment with device
US20150163854A1 (en) * 2013-12-09 2015-06-11 Hyundai Motor Company System and method for providing communication service, and vehicle supporting the same
US20160360336A1 (en) * 2015-05-27 2016-12-08 Apple Inc. Systems and Methods for Proactively Identifying and Surfacing Relevant Content on a Touch-Sensitive Device
CN106874287A (en) * 2015-12-11 2017-06-20 北京四维图新科技股份有限公司 Method and device for processing point of interest (POI) geocoding

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
F. WANG et al.: "A visual reasoning approach for data-driven transport assessment on urban roads," 2014 IEEE Conference on Visual Analytics Science and Technology *
汪飞 (Wang Fei) et al.: "A visual query model for multi-source urban travel data," Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110765262A (en) * 2019-09-24 2020-02-07 北京嘀嘀无限科技发展有限公司 POI text retrieval method and device and electronic equipment
CN111008625A (en) * 2019-12-06 2020-04-14 中国建设银行股份有限公司 Address correction method, device, equipment and storage medium
CN112197779A (en) * 2020-09-14 2021-01-08 汉海信息技术(上海)有限公司 Navigation path planning method and device and printing equipment
CN112214573A (en) * 2020-10-30 2021-01-12 数贸科技(北京)有限公司 Information search system, method, computing device, and computer storage medium
CN112685540A (en) * 2021-01-07 2021-04-20 深圳市欢太科技有限公司 Search method, search device, storage medium and terminal
CN113743054A (en) * 2021-08-17 2021-12-03 上海明略人工智能(集团)有限公司 Alphabet vector learning method, system, storage medium and electronic device
CN113672627A (en) * 2021-09-08 2021-11-19 湖南惠农科技有限公司 Elasticsearch search engine index construction method and device
CN113672627B (en) * 2021-09-08 2023-08-18 湖南惠农科技有限公司 Method and device for constructing index of elastic search engine
CN114661688A (en) * 2022-03-25 2022-06-24 马上消费金融股份有限公司 Address error correction method and device
CN114661688B (en) * 2022-03-25 2023-09-19 马上消费金融股份有限公司 Address error correction method and device

Also Published As

Publication number Publication date
CN110019645B (en) 2022-04-19

Similar Documents

Publication Publication Date Title
CN110019645A (en) Index base construction method, searching method and device
US10095711B2 (en) Method and apparatus of recommending candidate terms based on geographical location
KR101343609B1 (en) Apparatus and Method for Automatically recommending Application using Augmented Reality Data
EP2518642A1 (en) Method and terminal device for updating word stock
CN109564571A (en) Query recommendation method and system using search context
CN108701143A (en) Facilitating the use of images in search queries
TW201604698A (en) Method and device for pushing track information
CN108241690A (en) Data processing method and device, and device for data processing
CN111984749B (en) Interest point ordering method and device
KR20130090612A (en) Method and system for providing location based contents by analyzing keywords on social network service
CN111382744B (en) Shop information acquisition method and device, terminal equipment and storage medium
JP6756744B2 (en) Location information provision method and equipment
KR20160133304A (en) Apparatus, method and computer program for providing user review
KR20190047200A (en) Platform for providing smart sightseeing information based on big data
CN104850238A (en) Method and device for sorting candidate items generated by input method
CN109101505A (en) Recommendation method, recommendation apparatus, and device for recommendation
CN103955480A (en) Method and equipment for determining target object information corresponding to user
KR20150032141A (en) Semantic searching system and method for smart device
KR20140006516A (en) System and method for providing location based contents service
CN109521888A (en) Input method, device and medium
Tan et al. Preference-oriented mining techniques for location-based store search
CN114301973A (en) Information recommendation processing method and device
CN110309431A (en) Data processing method, device and electronic equipment
CN116662583A (en) Text generation method, place retrieval method and related devices
KR101391532B1 (en) Surrounding search service system based on location information and method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220921

Address after: 100084. Room 9, floor 01, cyber building, building 9, building 1, Zhongguancun East Road, Haidian District, Beijing

Patentee after: BEIJING SOGOU TECHNOLOGY DEVELOPMENT Co.,Ltd.

Patentee after: Beijing Sogou Network Technology Co.,Ltd.

Address before: 100084. Room 9, floor 01, cyber building, building 9, building 1, Zhongguancun East Road, Haidian District, Beijing

Patentee before: BEIJING SOGOU TECHNOLOGY DEVELOPMENT Co.,Ltd.
