WO2016050088A1 - Address search method and device - Google Patents

Address search method and device

Info

Publication number
WO2016050088A1
Authority
WO
WIPO (PCT)
Prior art keywords
address
address information
information
searched
sub
Prior art date
Application number
PCT/CN2015/079816
Other languages
English (en)
French (fr)
Inventor
齐泉
张九龙
李航
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP15846022.0A (patent EP3153978B1)
Publication of WO2016050088A1
Priority to US15/398,260 (patent US10783171B2)
Priority to US16/929,611 (publication US20200349175A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/34 Route searching; Route guidance
    • G01C21/36 Input/output arrangements for on-board computers
    • G01C21/3605 Destination input or retrieval
    • G01C21/3611 Destination input or retrieval using character input or menus, e.g. menus of POIs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9032 Query formulation
    • G06F16/90332 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9038 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Definitions

  • the present invention relates to the field of data processing technologies, and in particular, to an address search method and device.
  • the user may initiate a navigation request to the mobile terminal by using a voice mode or a text input mode, and the navigation application selects a navigation route according to the received navigation request, and pushes the selected navigation route to the user.
  • the manner in which the navigation application selects the navigation route according to the received navigation request includes but is not limited to:
  • the conditional random field (CRF) algorithm
  • the address information included in the voice is used as a search basis to determine a target address of the navigation request.
  • the defect is that, when the target address is determined, the address name (or building name/unit name) extracted from the text or voice is matched against the address information in the address database, so the determined target address set includes a large number of unrelated addresses, which reduces the accuracy of the address search.
  • the embodiment of the present invention provides an address search method and device, which is used to solve the problem that a large number of unrelated addresses are searched in the address search process, resulting in low address search accuracy.
  • an address search method comprising:
  • splitting the target address information into at least one piece of sub-address information, where the target address information is composed of a plurality of different pieces of sub-address information that respectively correspond to different address types;
  • outputting, as the searched target address information, the address information obtained by the matching whose matching degree is greater than a set threshold.
  • obtaining the address search request information includes:
  • the target address information to be searched is obtained by:
  • for each determined keyword, performing: finding, among the preset text address dictionaries corresponding to different address types, a text address dictionary that contains the keyword; and replacing the keyword with a string that characterizes the address type corresponding to the found text address dictionary;
  • the target address information to be searched is extracted in the address search request information based on the CRF algorithm.
  • determining a quasi-address string corresponding to the target address information to be searched according to the string group includes:
  • the character string group includes a plurality of character strings, and the keywords respectively replaced in the plurality of character strings are consecutively located in the address search request information;
  • the plurality of character strings are merged, based on the positional continuity of the replaced keywords, into a string group that serves as the quasi-address string corresponding to the target address information to be searched;
  • the repeated string is removed, and the at least one string remaining after the removal is merged, based on the positional continuity of the replaced keywords, into a string group that serves as the quasi-address string corresponding to the target address information to be searched.
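
The merging described above can be sketched as follows. This is only one possible reading of the text, not the patent's implementation: the `(position, tag)` input format, the choice to treat the stop-word tag `SSS` as non-address, and the interpretation of "removing the repeated string" as collapsing adjacent identical tags are all assumptions made for illustration.

```python
from itertools import groupby

# Assumption: "SSS" (stop word) is the only tag that never belongs to an address.
NON_ADDRESS_TAGS = {"SSS"}

def merge_quasi_address(tagged):
    """tagged: list of (position, tag) pairs, one per replaced keyword.

    Returns one merged tag string per run of address tags whose keywords
    sit at consecutive positions in the request.
    """
    runs, current, last_pos = [], [], None
    for pos, tag in sorted(tagged):
        is_addr = tag not in NON_ADDRESS_TAGS
        if is_addr and current and pos == last_pos + 1:
            current.append(tag)          # keyword directly follows the previous one
        else:
            if current:
                runs.append(current)     # close the run that just ended
            current = [tag] if is_addr else []
        last_pos = pos
    if current:
        runs.append(current)
    # Collapse adjacent repeats (e.g. RRR RRR -> RRR) before joining each run.
    return ["".join(tag for tag, _ in groupby(run)) for run in runs]
```

With the tags from the Caoyang Road example, `merge_quasi_address([(0, "SSS"), (1, "AAA"), (2, "RRR"), (3, "RRR"), (4, "DDD"), (5, "OOO"), (6, "OOO")])` yields a single merged string, because all address keywords are position-consecutive.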
  • the splitting the target address information into the at least one sub-address information includes:
  • Sub-address information corresponding to different address types is split from the target address information according to an address type indicated by a text address dictionary corresponding to a different address type set in advance.
  • the address type includes one or a combination of the following:
  • administrative area information, road name information, building/unit name information, subsidiary content of the road name, and subsidiary content of the building/unit name.
  • matching the at least one piece of sub-address information, or the at least one piece of sub-address information and the target address information, with different address information included in the address database includes: matching the at least one piece of sub-address information in the target address information with the sub-address information of the same address type in the address database.
  • matching the at least one piece of sub-address information, or the at least one piece of sub-address information and the target address information, with different address information included in the address database includes:
  • the address type of the included subaddress information is the same as the address type of the subaddress information included in the selected address information
  • the total matching degree between the selected address information and the target address information to be searched is obtained according to the first matching degree, including:
  • the sub-address information included in the target address to be searched is matched with the sub-address information included in the selected address information to obtain a first matching degree, including:
  • the first matching degree is calculated according to the edit distances corresponding to the respective pieces of sub-address information included in the target address information to be searched.
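
As a rough illustration of an edit-distance-based matching degree, the sketch below compares sub-address information of the same address type and averages the results. The text does not state how edit distances are turned into a matching degree, so the normalisation (1 minus distance over the longer length) and the plain average are assumptions, as are the function names and the dictionary layout keyed by address type.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def sub_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 for fully different."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

def first_matching_degree(target: dict, candidate: dict) -> float:
    """Average similarity over the address types present in both addresses."""
    shared = [t for t in target if t in candidate]
    if not shared:
        return 0.0
    return sum(sub_similarity(target[t], candidate[t]) for t in shared) / len(shared)
```

Only sub-addresses with the same address type are compared, which mirrors the same-type matching described above.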
  • outputting, as the searched target address information, the matched address information whose matching degree is greater than the set threshold includes:
  • the total matching degree between each selected piece of address information and the target address information to be searched is obtained, and a set number of total matching degrees are determined in descending order of total matching degree;
  • the selected address information corresponding to each determined total matching degree is output as the searched target address information.
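
The output step can be sketched as a filter followed by a descending sort. Combining the threshold with the "set number" limit in one function is an assumption made for illustration, and `select_results`, `threshold`, and `limit` are hypothetical names.

```python
def select_results(scored, threshold, limit):
    """scored: list of (address, total_matching_degree) pairs.

    Keeps addresses whose total matching degree exceeds the threshold and
    returns at most `limit` of them, highest matching degree first.
    """
    kept = [(addr, score) for addr, score in scored if score > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [addr for addr, _ in kept[:limit]]
```

For example, `select_results([("a", 0.9), ("b", 0.4), ("c", 0.95), ("d", 0.7)], 0.5, 2)` drops `"b"` at the threshold and keeps the two best of the rest.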
  • an address search device comprising:
  • An obtaining module configured to obtain address search request information, and determine target address information to be searched included in the address search request information
  • a splitting module configured to split the target address information determined by the obtaining module into at least one sub-address information, where the target address information is composed of a plurality of different sub-address information, the multiple different The subaddress information corresponds to different address types respectively;
  • a search module configured to match the at least one sub-address information or the at least one sub-address information and the target address information obtained by the splitting module with different address information included in an address database, where Each piece of address information stored in the address database includes different sub-address information constituting the address information; the address information obtained by matching the matching degree greater than the set threshold is output as the searched target address information.
  • the obtaining module is specifically configured to receive input voice data, where the voice data is used to initiate an address search;
  • the obtaining module is specifically configured to obtain the target address information to be searched by:
  • for each determined keyword, performing: finding, among the preset text address dictionaries corresponding to different address types, a text address dictionary that contains the keyword; and replacing the keyword with a string that characterizes the address type corresponding to the found text address dictionary;
  • determining whether the string group formed after each keyword is replaced with its corresponding string represents address information;
  • the target address information to be searched is extracted in the address search request information based on the CRF algorithm.
  • the obtaining module is specifically configured to determine, according to the string group, a quasi-address string corresponding to the target address information to be searched, specifically including:
  • the character string group includes a plurality of character strings, and the keywords respectively replaced in the plurality of character strings are consecutively located in the address search request information;
  • the plurality of character strings are merged, based on the positional continuity of the replaced keywords, into a string group that serves as the quasi-address string corresponding to the target address information to be searched;
  • the repeated string is removed, and the at least one string remaining after the removal is merged, based on the positional continuity of the replaced keywords, into a string group that serves as the quasi-address string corresponding to the target address information to be searched.
  • the splitting module is specifically configured to split, from the target address information, sub-address information corresponding to different address types according to the address types represented by the preset text address dictionaries corresponding to different address types.
  • the address type includes one or more information combinations of the following:
  • administrative area information, road name information, building/unit name information, subsidiary content of the road name, and subsidiary content of the building/unit name.
  • the search module is specifically configured to match the at least one piece of sub-address information in the target address information with the sub-address information of the same address type in the address database.
  • with reference to the second aspect of the invention, or any one of the first to fifth possible implementations of the second aspect, in a seventh possible implementation, the search module is specifically configured to select a piece of address information from the address database and determine the sub-address information included in the selected address information;
  • the address type of the included subaddress information is the same as the address type of the subaddress information included in the selected address information
  • the search module is specifically configured to obtain, according to the first matching degree, the total matching degree between the selected address information and the target address information to be searched, including:
  • the search module is specifically configured to perform matching calculation on the sub-address information included in the target address to be searched and the sub-address information included in the selected address information to obtain a first matching degree, which specifically includes:
  • the first matching degree is calculated according to the edit distances corresponding to the respective pieces of sub-address information included in the target address information to be searched.
  • the search module is specifically configured to output, as the searched target address information, the matched address information whose matching degree is greater than a set threshold, specifically including:
  • the total matching degree between each selected piece of address information and the target address information to be searched is obtained, and a set number of total matching degrees are determined in descending order of total matching degree;
  • the selected address information corresponding to each determined total matching degree is output as the searched target address information.
  • an address search device comprising:
  • a signal receiver configured to obtain address search request information, and determine target address information to be searched included in the address search request information
  • a processor configured to split the target address information into at least one sub-address information, where the target address information is composed of a plurality of different sub-address information, where the plurality of different sub-address information respectively correspond to different An address type; matching the at least one sub-address information or the at least one sub-address information and the target address information with different address information included in an address database, wherein each piece of address information stored in the address database Contains different subaddress information constituting the address information;
  • the address information obtained by the matching with the matching degree greater than the set threshold is output as the searched target address information.
  • the processor specifically performs:
  • the processor specifically performs:
  • the target address information to be searched is obtained by:
  • for each determined keyword, performing: finding, among the preset text address dictionaries corresponding to different address types, a text address dictionary that contains the keyword; and replacing the keyword with a string that characterizes the address type corresponding to the found text address dictionary;
  • the target address information to be searched is extracted in the address search request information based on the CRF algorithm.
  • the processor specifically performs:
  • the character string group includes a plurality of character strings, and the keywords respectively replaced in the plurality of character strings are consecutively located in the address search request information;
  • the plurality of character strings are merged, based on the positional continuity of the replaced keywords, into a string group that serves as the quasi-address string corresponding to the target address information to be searched;
  • the repeated string is removed, and the at least one string remaining after the removal is merged, based on the positional continuity of the replaced keywords, into a string group that serves as the quasi-address string corresponding to the target address information to be searched.
  • the processor 31 specifically:
  • Splitting the target address information into at least one sub-address information including:
  • Sub-address information corresponding to different address types is split from the target address information according to an address type indicated by a text address dictionary corresponding to a different address type set in advance.
  • the address type includes one or more information combinations of the following:
  • administrative area information, road name information, building/unit name information, subsidiary content of the road name, and subsidiary content of the building/unit name.
  • with reference to the third aspect of the invention, or any one of the first to fifth possible implementations of the third aspect, in a sixth possible implementation, the processor specifically matches the at least one piece of sub-address information in the target address information with the sub-address information of the same address type in the address database.
  • with reference to the third aspect of the invention, or any one of the first to fifth possible implementations of the third aspect, in a seventh possible implementation, the processor specifically performs:
  • Matching the at least one sub-address information or the at least one sub-address information and the target address information with different address information included in the address database including:
  • the address type of the included subaddress information is the same as the address type of the subaddress information included in the selected address information
  • the processor specifically performs:
  • a total matching degree between the selected address information and the target address information to be searched specifically:
  • the processor specifically executes:
  • the sub-address information included in the target address to be searched is matched with the sub-address information included in the selected address information to obtain a first matching degree, including:
  • the first matching degree is calculated according to the edit distances corresponding to the respective pieces of sub-address information included in the target address information to be searched.
  • outputting, as the searched target address information, the matched address information whose matching degree is greater than the set threshold includes:
  • the total matching degree between each selected piece of address information and the target address information to be searched is obtained, and a set number of total matching degrees are determined in descending order of total matching degree;
  • the selected address information corresponding to each determined total matching degree is output as the searched target address information.
  • the embodiment of the present invention obtains address search request information and determines the target address information to be searched that is included in the address search request information; splits the target address information into at least one piece of sub-address information, where the target address information is composed of a plurality of different pieces of sub-address information that respectively correspond to different address types; matches the at least one piece of sub-address information, or the at least one piece of sub-address information and the target address information, with different address information included in an address database, where each piece of address information stored in the address database includes the different sub-address information constituting that address information; and outputs, as the searched target address information, the address information obtained by the matching whose matching degree is greater than a set threshold.
  • the sub-address information corresponding to the target address information is extracted from the address search request information, which improves the accuracy of extracting the target address information; the extracted sub-address information and the extracted target address information are matched with different address information included in the address database, and the address information whose matching degree is greater than the set threshold is used as the searched target address information, which effectively improves the accuracy of address matching and of the address search.
  • FIG. 1 is a schematic structural diagram of an address search system according to Embodiment 1 of the present invention.
  • FIG. 2 is a schematic structural diagram of an address search device according to Embodiment 2 of the present invention.
  • FIG. 3 is a schematic structural diagram of an address search device according to Embodiment 3 of the present invention.
  • FIG. 4 is a schematic flowchart diagram of an address searching method according to Embodiment 4 of the present invention.
  • an embodiment of the present invention provides an address search method and device: obtaining address search request information, and determining the target address information to be searched that is included in the address search request information; splitting the target address information into at least one piece of sub-address information, where the target address information is composed of a plurality of different pieces of sub-address information that respectively correspond to different address types; matching the at least one piece of sub-address information, or the at least one piece of sub-address information and the target address information, with different address information included in an address database, where each piece of address information stored in the address database includes the different sub-address information constituting that address information; and outputting, as the searched target address information, the address information obtained by the matching whose matching degree is greater than a set threshold.
  • the sub-address information corresponding to the target address information is extracted from the address search request information, which improves the accuracy of extracting the target address information; the extracted sub-address information and the extracted target address information are matched with different address information included in the address database, and the address information whose matching degree is greater than the set threshold is used as the searched target address information, which effectively improves the accuracy of address matching and of the address search.
  • Embodiment 1:
  • FIG. 1 is a schematic structural diagram of an address search system according to Embodiment 1 of the present invention.
  • the address search system includes a receiving device 11, an address extracting device 12, and an address matching device 13.
  • the receiving device 11 is configured to obtain address search request information.
  • the address extraction device 12 is configured to determine the target address information to be searched in the address search request information, and split the target address information into at least one piece of sub-address information, where the target address information is composed of a plurality of different pieces of sub-address information that respectively correspond to different address types.
  • the receiving device 11 is specifically configured to receive input text data, where the text data is used to initiate an address search.
  • the address extraction device 12 is specifically configured to identify the text data, and obtain target address information to be searched included in the text data.
  • the text data of the input is “Exit No. 4 of Caoyang Road Station, Shanghai”, and it can be determined that the target address information to be searched in the text data is: Exit 4 of Shanghai Caoyang Road Station.
  • the receiving device 11 is specifically configured to receive input voice data, where the voice data is used to initiate an address search.
  • the address extraction device 12 is specifically configured to identify the voice data, and obtain target address information to be searched included in the voice data.
  • the voice data received is “to the No. 4 exit of Caoyang Road Station in Shanghai”, and it can be determined that the target address information to be searched in the voice data is: Exit 4 of Shanghai Caoyang Road Station.
  • the address search request information may include other auxiliary information according to language habits, for example, "to", “go” and the like.
  • the address extraction device 12 is specifically configured to obtain target address information to be searched by:
  • For each of the determined keywords performing: in a text address dictionary corresponding to a different address type set in advance, finding a text address dictionary containing the keyword; using a string for characterizing the address type corresponding to the found text address dictionary , replacing the keyword;
  • the target address information to be searched is extracted from the address search request information.
  • the address information is hierarchical. Some address information indicates an address area, for example, address information corresponding to an administrative area: with "Beijing" as the address information, "Beijing" corresponds to an address area on the map. Other address information indicates a specific location, for example, "XX City XX District XX Road No. XX XX Building", which corresponds to the specific location of the XX Building.
  • the address information is divided into different address types according to different levels of the address information.
  • the address type includes one or more of the following combinations of information:
  • administrative area information, road name information, building/unit name information, subsidiary content of the road name, and subsidiary content of the building/unit name.
  • for example, the address type corresponding to "XX City XX District" is administrative area information; the address type corresponding to "XX Road" is road name information; the address type corresponding to "No. XX" is subsidiary content of the road name; the address type corresponding to "XX Building" is building/unit name information; the address type corresponding to "XX floor XX room" is subsidiary content of the building/unit name.
  • the subsidiary content of a road name has no meaning apart from the road name that precedes it, and a specific address cannot be located by the subsidiary content of the road name alone; similarly, the subsidiary content of a building/unit name has no meaning apart from the building/unit name that precedes it, and a specific address cannot be located by the subsidiary content of the building/unit name alone.
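
The five address types and the dependency of "subsidiary content" on its parent name can be modelled as below. This is a minimal sketch: the enum, the `REQUIRES` table, and `dangling_types` are hypothetical names introduced for illustration and do not come from the patent text.

```python
from enum import Enum

class AddressType(Enum):
    ADMIN_AREA = "administrative area information"
    ROAD_NAME = "road name information"
    ROAD_SUBSIDIARY = "subsidiary content of the road name"
    BUILDING_NAME = "building/unit name information"
    BUILDING_SUBSIDIARY = "subsidiary content of the building/unit name"

# Subsidiary content only makes sense together with the name it belongs to.
REQUIRES = {
    AddressType.ROAD_SUBSIDIARY: AddressType.ROAD_NAME,
    AddressType.BUILDING_SUBSIDIARY: AddressType.BUILDING_NAME,
}

def dangling_types(sub_addresses: dict) -> list:
    """Return subsidiary types that appear without their parent name."""
    return [t for t, parent in REQUIRES.items()
            if t in sub_addresses and parent not in sub_addresses]
```

A split address that contains only "No. XX" would be reported as dangling, since it cannot locate a specific address on its own.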
  • the preset text address dictionaries corresponding to different address types include at least: an administrative area dictionary, which contains administrative area information such as province, city, district, county, town, township, village, prefecture, league, and banner, for example: Beijing, Shanghai, Shenzhen; a unit/building dictionary, which contains unit/building names such as police station, building, center, and tower; a street name dictionary, which contains street names, for example: XX Road, XX Station; and, in addition, a stop word dictionary, which contains stop words of the language, such as "to"; a symbol dictionary, which contains punctuation marks; a digital dictionary, which contains numbers; and so on.
  • At least one keyword included in the address search request information is: to, Shanghai, Caoyang Road, Station, and Exit 4.
  • Step 1: In the preset text address dictionaries corresponding to different address types, find a text address dictionary containing the keyword, and replace the keyword with a character string that characterizes the address type corresponding to the found text address dictionary.
  • the character string of the address type corresponding to the stop word dictionary is SSS
  • the character string of the address type corresponding to the administrative area dictionary is AAA
  • the character string of the address type corresponding to the street name dictionary is RRR
  • the character string of the address type corresponding to the digital dictionary is DDD
  • the string of the address type corresponding to the tail dictionary is OOO.
  • the string group obtained after the replacement may be: SSSAAARRRRRRDDDOOOOOO; or may be: SSSAAA city RRR road station DDDOOOOOO, where the text used to represent the address type in the replacement keyword is not specifically limited.
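
Step 1 can be sketched as a dictionary lookup per keyword. The tag strings (SSS, AAA, RRR, DDD, OOO) come from the text; the dictionary contents and the keyword segmentation used in the usage example are assumptions chosen so that the result matches the string group quoted above.

```python
# Illustrative dictionary contents only; real dictionaries would be far larger.
DICTIONARIES = {
    "SSS": {"to", "go"},                          # stop word dictionary
    "AAA": {"Shanghai", "Beijing", "Shenzhen"},   # administrative area dictionary
    "RRR": {"Caoyang Road", "Station"},           # street name dictionary
    "OOO": {"No.", "Exit"},                       # tail dictionary
}

def tag_keyword(keyword: str) -> str:
    """Return the tag of the dictionary containing the keyword."""
    if keyword.isdigit():
        return "DDD"                              # digital dictionary
    for tag, words in DICTIONARIES.items():
        if keyword in words:
            return tag
    return keyword                                # unknown keywords are left unchanged

def replace_keywords(keywords):
    """Build the string group by replacing every keyword with its tag."""
    return "".join(tag_keyword(k) for k in keywords)
```

Under the assumed segmentation `["to", "Shanghai", "Caoyang Road", "Station", "4", "No.", "Exit"]`, `replace_keywords` returns `SSSAAARRRRRRDDDOOOOOO`.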
  • the second step using a regular expression to represent the address information, determining whether each of the keywords is replaced by the corresponding string, and the string group formed by the string represents the address information, and when determining that the string group represents the address information Determining, according to the string group, a quasi-address string corresponding to the target address information to be searched.
  • A regular expression is a single character string that describes and matches a set of strings conforming to a certain syntax rule.
  • The regular expression involved in this embodiment of the present invention describes address information.
  • That is, each keyword is replaced with its corresponding character string to form a string group, and a regular expression is used to determine whether the string group represents address information.
  • For example, according to the regular expression, "AAA City" and "RRR Road Station" can represent address information, whereas "SSS" cannot.
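The regular-expression check in Step 2 might look like the sketch below. The pattern is purely hypothetical: the embodiment does not disclose its regular expression, so the rule used here (administrative-area tags, then street tags, then optional digit or tail tags) is an assumption chosen to fit the example string.

```python
import re

# Hypothetical pattern: one or more administrative-area tags, then one or more
# street tags, optionally followed by digit and tail tags.
ADDRESS_PATTERN = re.compile(r"(?:AAA)+(?:RRR)+(?:DDD|OOO)*")


def find_address_span(tag_string):
    """Return the first part of the tag string that looks like an address, or None."""
    match = ADDRESS_PATTERN.search(tag_string)
    return match.group(0) if match else None


print(find_address_span("SSSAAARRRRRRDDDOOOOOO"))
print(find_address_span("SSS"))
```

With this pattern the stop-word tag SSS is skipped and the remaining tags are returned as the candidate span.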
  • Step 3: Using the quasi-address string as a conditional random field (CRF) feature, extract the target address information to be searched from the address search request information based on the CRF algorithm.
  • The quasi-address string is used as a CRF feature. The CRF features used to extract the target address information are not limited to the quasi-address string and may include multiple features, provided that the quasi-address string is among them.
  • the target address information extracted at this time is: Exit 4 of Shanghai Caoyang Road Station.
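As a rough illustration of how the quasi-address string could enter a CRF model as one feature among several, the sketch below builds a per-token feature dictionary, a form that some CRF toolkits accept. The feature names, the `quasi_flags` input, and the function `token_features` are assumptions; no model training or decoding is shown.

```python
def token_features(tokens, tags, quasi_flags, i):
    """Build features for token i.

    tags[i] is the type string of the token; quasi_flags[i] is True when the
    token lies inside the quasi-address string (both are assumed inputs).
    """
    features = {
        "token": tokens[i],
        "type_tag": tags[i],
        "in_quasi_address": quasi_flags[i],
    }
    if i > 0:
        features["prev_type_tag"] = tags[i - 1]
    else:
        features["BOS"] = True  # beginning of the request
    if i < len(tokens) - 1:
        features["next_type_tag"] = tags[i + 1]
    else:
        features["EOS"] = True  # end of the request
    return features


tokens = ["to", "Shanghai", "Caoyang Road"]
tags = ["SSS", "AAA", "RRR"]
quasi_flags = [False, True, True]
sequence = [token_features(tokens, tags, quasi_flags, i) for i in range(len(tokens))]
```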
  • The address extraction device 12 is specifically configured to determine, according to the string group, the quasi-address string corresponding to the target address information to be searched, which specifically includes:
  • when the string group includes a plurality of character strings, and the keywords replaced by the plurality of character strings are located consecutively in the address search request information:
  • merging the plurality of character strings into a string group, based on the positional continuity of the replaced keywords, as the quasi-address string corresponding to the target address information to be searched; or
  • where the plurality of character strings include repeated character strings, removing the repeated character strings and merging the remaining at least one character string into a string group, based on the positional continuity of the replaced keywords, as the quasi-address string corresponding to the target address information to be searched.
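The merging rule above can be sketched as follows. The text does not say exactly which repeats are removed; this example assumes that only adjacent repeated type strings are collapsed, and the function name `merge_quasi_address` is hypothetical.

```python
def merge_quasi_address(tags):
    """Merge the type strings of positionally consecutive keywords.

    Assumption: a "repeated string" is a type string identical to the one
    immediately before it, and only such adjacent repeats are removed.
    """
    merged = []
    for tag in tags:
        if not merged or merged[-1] != tag:
            merged.append(tag)
    return "".join(merged)


print(merge_quasi_address(["AAA", "RRR", "RRR", "DDD", "OOO", "OOO"]))
```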
  • The address extraction device 12 is configured to split the target address information into at least one sub-address information, where the target address information is composed of a plurality of different sub-address information, and the plurality of different sub-address information correspond to different address types.
  • The address extraction device 12 is specifically configured to split, from the target address information, the sub-address information corresponding to different address types, according to the address types represented by the preset text address dictionaries corresponding to different address types.
  • the sub-address information obtained by splitting "Shanghai Caoyang Road Station Exit 4" is: administrative area information: Shanghai; street name information: Caoyang Road Station; subsidiary information of street name: Exit 4.
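A minimal sketch of this split is given below, assuming that each keyword of the target address has already been labeled with its address type by the dictionary lookup. The input format and the function name `split_sub_addresses` are assumptions for illustration.

```python
def split_sub_addresses(typed_keywords):
    """Group keywords by address type, keeping their original order."""
    grouped = {}
    for keyword, address_type in typed_keywords:
        grouped.setdefault(address_type, []).append(keyword)
    return {address_type: " ".join(words) for address_type, words in grouped.items()}


# Assumed labeling of the example target address.
typed_keywords = [
    ("Shanghai", "administrative area information"),
    ("Caoyang Road", "street name information"),
    ("Station", "street name information"),
    ("Exit 4", "subsidiary information of street name"),
]
sub_addresses = split_sub_addresses(typed_keywords)
print(sub_addresses)
```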
  • The address matching device 13 is configured to match the at least one sub-address information, or the at least one sub-address information together with the target address information, against the different address information contained in an address database, where each piece of address information stored in the address database includes the different sub-address information that constitutes it; and to output, as the searched target address information, the matched address information whose matching degree is greater than a set threshold.
  • The address matching device 13 is specifically configured to match the at least one sub-address information in the target address information against the sub-address information of the same address type in the address database.
  • The address matching device 13 is specifically configured to: select a piece of address information from the address database and determine the sub-address information contained in the selected address information; perform a matching calculation between each of the at least one sub-address information contained in the target address to be searched and the sub-address information contained in the selected address information to obtain a first matching degree, where the sub-address information of the target address to be searched that is used in the matching calculation has the same address type as the sub-address information contained in the selected address information; and obtain, according to the first matching degree, the total matching degree between the selected address information and the target address information to be searched.
  • When the at least one sub-address information of the target address information is matched against the sub-address information of the same address type in the address database, the address matching device 13 uses the address type of the sub-address information as the granularity: the sub-address information of a given address type in the target address information to be searched is matched against the sub-address information of the same address type in the selected address information, and the first matching degree corresponding to that sub-address information is calculated.
  • The total matching degree between the selected address information and the target address information to be searched is then obtained.
  • The target address information to be searched is also matched with the selected address information to obtain a second matching degree.
  • In this way, among the address information obtained through the first matching degree, address information that contains only one or a few pieces of the sub-address information of the target address information to be searched (for example, address information that contains only "Exit 4") can be effectively excluded.
  • the address matching device 13 is configured to perform the matching of the sub-address information included in the target address to be searched with the sub-address information included in the selected address information to obtain the first matching degree, which specifically includes:
  • the first matching degree is calculated according to the edit distances obtained for the respective sub-address information contained in the target address information to be searched.
  • The edit distance between two strings is the minimum number of edit operations required to convert one string into the other.
  • An edit operation replaces one character with another character, inserts one character, or deletes one character.
  • The address matching device 13 is configured to calculate the first matching degree according to the edit distances obtained for the respective sub-address information contained in the target address information to be searched, which specifically includes:
  • the edit distances obtained for the respective sub-address information contained in the target address information to be searched are summed, and the resulting sum is the first matching degree.
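The edit distance defined above and its summation into a first matching degree can be sketched as follows. The edit distance is the standard Levenshtein distance with unit costs; the dictionary layout of the sub-address information and the handling of an address type that is missing from the selected address (compared against an empty string) are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of substitutions, insertions and deletions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def first_matching_degree(target_subs, selected_subs):
    """Sum the edit distances between sub-addresses of the same address type.

    Both arguments map an address type to its sub-address text. A type absent
    from the selected address is compared with "" (an assumption).
    """
    return sum(edit_distance(text, selected_subs.get(address_type, ""))
               for address_type, text in target_subs.items())


target = {"area": "Shanghai", "street": "Caoyang Road Station", "sub": "Exit 4"}
selected = {"area": "Shanghai", "street": "Caoyang Road Station", "sub": "Exit 3"}
print(first_matching_degree(target, selected))
```

Under this reading a smaller sum indicates a closer match, so a ranking in which larger values are better would first need to convert the sum into a similarity; the text does not spell out that conversion.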
  • The address matching device 13 is specifically configured to output, as the searched target address information, the matched address information whose matching degree is greater than the set threshold, including:
  • after the total matching degree between each selected address information and the target address information to be searched is obtained, determining a set number of total matching degrees in descending order of total matching degree;
  • outputting, as the searched target address information, the selected address information corresponding to each of the determined total matching degrees.
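The output selection can be sketched as a threshold filter followed by a top-N cut. The sketch assumes that the total matching degree is a score for which larger means more similar, as the descending order in the text suggests; the function name `select_results` and the example values are hypothetical.

```python
def select_results(total_degrees, threshold, set_number):
    """Return up to `set_number` addresses whose total matching degree
    exceeds `threshold`, ordered from the largest degree to the smallest."""
    qualified = [(address, degree) for address, degree in total_degrees.items()
                 if degree > threshold]
    qualified.sort(key=lambda item: item[1], reverse=True)
    return [address for address, _ in qualified[:set_number]]


total_degrees = {"A": 0.9, "B": 0.4, "C": 0.75, "D": 0.8}
print(select_results(total_degrees, threshold=0.5, set_number=2))
```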
  • The set threshold may be a system default parameter or may be determined according to actual needs; it is not specifically limited here.
  • The set number may be a system default parameter or may be determined according to actual needs; it is not specifically limited here.
  • When acquiring address search request information, the address search system extracts the sub-address information corresponding to the target address information from the address search request information, which improves the accuracy of extracting the target address information; the extracted sub-address information and the target address information are then matched against the different address information contained in the address database, and the address information whose matching degree is greater than the set threshold is used as the searched target address information, which effectively improves both the accuracy of address matching and the precision of the address search.
  • Embodiment 2:
  • FIG. 2 is a schematic structural diagram of an address search device according to Embodiment 2 of the present invention.
  • the address searching device includes: an obtaining module 21, a splitting module 22, and a searching module 23, wherein:
  • the obtaining module 21 is configured to obtain address search request information, and determine target address information to be searched included in the address search request information;
  • the splitting module 22 is configured to split the target address information determined by the obtaining module into at least one sub-address information, where the target address information is composed of a plurality of different sub-address information, and the plurality of different sub-address information correspond to different address types;
  • the searching module 23 is configured to match the at least one sub-address information or the at least one sub-address information and the target address information obtained by the splitting module with different address information included in an address database, where Each piece of address information stored in the address database includes different sub-address information constituting the address information; the address information whose matching degree is greater than a set threshold is output as the searched target address information.
  • the obtaining module 21 is specifically configured to receive input voice data, where the voice data is used to initiate an address search;
  • the obtaining module 21 is specifically configured to obtain the target address information to be searched by:
  • for each determined keyword: finding, among the preset text address dictionaries corresponding to different address types, the text address dictionary that contains the keyword, and replacing the keyword with the character string that characterizes the address type corresponding to the found text address dictionary;
  • extracting the target address information to be searched from the address search request information based on the CRF algorithm.
  • the obtaining module 21 is specifically configured to determine, according to the string group, the quasi-address string corresponding to the target address information to be searched, which specifically includes:
  • when the string group includes a plurality of character strings, and the keywords replaced by the plurality of character strings are located consecutively in the address search request information:
  • merging the plurality of character strings into a string group, based on the positional continuity of the replaced keywords, as the quasi-address string corresponding to the target address information to be searched; or
  • where the plurality of character strings include repeated character strings, removing the repeated character strings and merging the remaining at least one character string into a string group, based on the positional continuity of the replaced keywords, as the quasi-address string corresponding to the target address information to be searched.
  • the splitting module 22 is specifically configured to split, from the target address information, the sub-address information corresponding to different address types, according to the address types represented by the preset text address dictionaries corresponding to different address types.
  • The address type includes one or a combination of the following:
  • administrative area information, road name information, building/unit name information, subsidiary content of the road name, and subsidiary content of the building/unit name.
  • the searching module 23 is specifically configured to match the at least one sub-address information in the target address information against the sub-address information of the same address type in the address database.
  • the searching module 23 is specifically configured to select a piece of address information from the address database and determine the sub-address information contained in the selected address information;
  • where the sub-address information of the target address to be searched that is used in the matching calculation has the same address type as the sub-address information contained in the selected address information.
  • the searching module 23 is specifically configured to obtain a total matching degree between the selected address information and the target address information to be searched according to the first matching degree, and specifically includes:
  • the searching module 23 is specifically configured to perform matching calculation on the sub-address information included in the target address to be searched with the sub-address information included in the selected address information to obtain the first matching degree, which specifically includes:
  • the first matching degree is calculated according to the edit distances obtained for the respective sub-address information contained in the target address information to be searched.
  • the searching module 23 is specifically configured to output, as the searched target address information, the matched address information whose matching degree is greater than the set threshold, including:
  • after the total matching degree between each selected address information and the target address information to be searched is obtained, determining a set number of total matching degrees in descending order of total matching degree;
  • outputting, as the searched target address information, the selected address information corresponding to each of the determined total matching degrees.
  • The address search device in this embodiment of the present invention may be implemented in hardware or in software; the implementation is not limited here.
  • When obtaining address search request information, the address search device extracts the sub-address information corresponding to the target address information from the address search request information, which improves the accuracy of extracting the target address information; the extracted sub-address information and the target address information are then matched against the different address information contained in the address database, and the address information whose matching degree is greater than the set threshold is used as the searched target address information, which effectively improves both the accuracy of address matching and the precision of the address search.
  • FIG. 3 is a schematic structural diagram of an address search device according to Embodiment 3 of the present invention.
  • the address search device is provided with the functions described in Embodiment 4 of the present invention.
  • the address search device can employ a general purpose computer system structure, which can be, in particular, a processor based computer.
  • the address search device includes at least one processor 31 and a signal receiver 32.
  • the processor 31 and the signal receiver 32 are connected by a communication bus 33.
  • the signal receiver 32 is configured to acquire address search request information, and determine target address information to be searched included in the address search request information;
  • the processor 31 is configured to: split the target address information into at least one sub-address information, where the target address information is composed of a plurality of different sub-address information corresponding to different address types; match the at least one sub-address information, or the at least one sub-address information together with the target address information, against the different address information contained in the address database, where each piece of address information stored in the address database includes the different sub-address information that constitutes it; and output, as the searched target address information, the matched address information whose matching degree is greater than the set threshold.
  • the processor 31 specifically executes:
  • obtaining the target address information to be searched by:
  • for each determined keyword: finding, among the preset text address dictionaries corresponding to different address types, the text address dictionary that contains the keyword, and replacing the keyword with the character string that characterizes the address type corresponding to the found text address dictionary;
  • extracting the target address information to be searched from the address search request information based on the CRF algorithm.
  • the processor 31 specifically executes:
  • when the string group includes a plurality of character strings, and the keywords replaced by the plurality of character strings are located consecutively in the address search request information:
  • merging the plurality of character strings into a string group, based on the positional continuity of the replaced keywords, as the quasi-address string corresponding to the target address information to be searched; or
  • where the plurality of character strings include repeated character strings, removing the repeated character strings and merging the remaining at least one character string into a string group, based on the positional continuity of the replaced keywords, as the quasi-address string corresponding to the target address information to be searched.
  • the processor 31 specifically executes:
  • splitting the target address information into at least one sub-address information, including:
  • splitting, from the target address information, the sub-address information corresponding to different address types, according to the address types represented by the preset text address dictionaries corresponding to different address types.
  • The address type includes one or a combination of the following:
  • administrative area information, road name information, building/unit name information, subsidiary content of the road name, and subsidiary content of the building/unit name.
  • the processor 31 specifically performs: matching at least one sub-address information in the target address information with sub-address information having the same address type in the address database.
  • the processor 31 specifically executes:
  • Matching the at least one sub-address information or the at least one sub-address information and the target address information with different address information included in the address database including:
  • where the sub-address information of the target address to be searched that is used in the matching calculation has the same address type as the sub-address information contained in the selected address information;
  • the processor 31 specifically executes:
  • obtaining the total matching degree between the selected address information and the target address information to be searched, specifically:
  • the processor 31 specifically executes:
  • performing a matching calculation between the sub-address information contained in the target address to be searched and the sub-address information contained in the selected address information to obtain the first matching degree, including:
  • calculating the first matching degree according to the edit distances obtained for the respective sub-address information contained in the target address information to be searched.
  • the processor 31 specifically executes:
  • outputting, as the searched target address information, the matched address information whose matching degree is greater than the set threshold, including:
  • after the total matching degree between each selected address information and the target address information to be searched is obtained, determining a set number of total matching degrees in descending order of total matching degree;
  • outputting, as the searched target address information, the selected address information corresponding to each of the determined total matching degrees.
  • the processor 31 may be a general purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the present invention.
  • When obtaining address search request information, the address search device provided by this embodiment of the present invention extracts the sub-address information corresponding to the target address information from the address search request information, which improves the accuracy of extracting the target address information; the extracted sub-address information and the target address information are then matched against the different address information contained in the address database, and the address information whose matching degree is greater than the set threshold is used as the searched target address information, which effectively improves both the accuracy of address matching and the precision of the address search.
  • Embodiment 4:
  • FIG. 4 is a schematic flowchart diagram of an address search method according to Embodiment 4 of the present invention. The method can be as follows.
  • Step 401 Acquire address search request information.
  • the address search request information includes target address information to be searched.
  • In step 401, the manner of obtaining the address search request information includes, but is not limited to, the following:
  • input voice data is received, where the voice data is used to initiate an address search.
  • The method further includes:
  • recognizing the received voice data to obtain text data corresponding to the voice data.
  • Step 402 Determine target address information to be searched included in the address search request information.
  • In step 402, the target address information to be searched is obtained by:
  • for each determined keyword: finding, among the preset text address dictionaries corresponding to different address types, the text address dictionary that contains the keyword, and replacing the keyword with the character string that characterizes the address type corresponding to the found text address dictionary;
  • extracting the target address information to be searched from the address search request information based on the CRF algorithm.
  • Address information is hierarchical. Some address information indicates an address area, such as address information corresponding to an administrative area: for example, with "Beijing" as address information, "Beijing" corresponds to an address area on the map. Other address information indicates a specific location: for example, for "XX Building, XX Road, XX City", the address information corresponds to the specific location of XX Building.
  • the address information is divided into different address types according to different levels of the address information.
  • The address type includes one or a combination of the following:
  • administrative area information, road name information, building/unit name information, subsidiary content of the road name, and subsidiary content of the building/unit name.
  • For example, the address type corresponding to "XX City XX District" is administrative area information; the address type corresponding to "XX Road" is road name information; the address type corresponding to "XX" is subsidiary content of the road name; the address type corresponding to "XX Building" is building/unit name information; and the address type corresponding to "XX floor XX room" is subsidiary content of the building/unit name.
  • The subsidiary content of a road name is meaningless without the road name that precedes it, and a specific address cannot be located from the subsidiary content of the road name alone; similarly, the subsidiary content of a building/unit name is meaningless without the building/unit name that precedes it, and a specific address cannot be located from the subsidiary content of the building/unit name alone.
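The five address types, and the rule that subsidiary content is only meaningful together with the name it belongs to, can be modeled with a small data structure. The enum member names, the `PARENT_TYPE` mapping and the `is_locatable` check are assumptions introduced for illustration.

```python
from enum import Enum


class AddressType(Enum):
    ADMINISTRATIVE_AREA = "administrative area information"
    ROAD_NAME = "road name information"
    BUILDING_UNIT_NAME = "building/unit name information"
    ROAD_SUBSIDIARY = "subsidiary content of the road name"
    BUILDING_UNIT_SUBSIDIARY = "subsidiary content of the building/unit name"


# Each subsidiary type depends on the type that must precede it.
PARENT_TYPE = {
    AddressType.ROAD_SUBSIDIARY: AddressType.ROAD_NAME,
    AddressType.BUILDING_UNIT_SUBSIDIARY: AddressType.BUILDING_UNIT_NAME,
}


def is_locatable(types_present):
    """True if every subsidiary type present is accompanied by its parent type."""
    return all(PARENT_TYPE[t] in types_present
               for t in types_present if t in PARENT_TYPE)


print(is_locatable({AddressType.ROAD_NAME, AddressType.ROAD_SUBSIDIARY}))
print(is_locatable({AddressType.ROAD_SUBSIDIARY}))
```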
  • The preset text address dictionaries corresponding to different address types include at least: an administrative area dictionary, which contains administrative area terms (for example: province, city, district, county, town, township, village, prefecture, league, banner) and administrative area names (for example: Beijing, Shanghai, Shenzhen); a unit/building dictionary, which contains unit/building terms (for example: police station, building, center, mansion); and a street name dictionary, which contains street names (for example: XX Road, XX Street, XX Station). In addition, there are a stop word dictionary, which contains the stop words of the language (for example: "to"); a symbol dictionary, which contains punctuation marks; and a digit dictionary, which contains numbers.
  • For example, the keywords contained in the address search request information are: "to", "Shanghai", "Caoyang Road", "Station", and "Exit 4".
  • Step 1: Among the preset text address dictionaries corresponding to different address types, find the text address dictionary that contains the keyword, and replace the keyword with the character string that characterizes the address type corresponding to the found text address dictionary.
  • The character string for the address type corresponding to the stop word dictionary is SSS;
  • the character string for the address type corresponding to the administrative area dictionary is AAA;
  • the character string for the address type corresponding to the street name dictionary is RRR;
  • the character string for the address type corresponding to the digit dictionary is DDD;
  • the character string for the address type corresponding to the tail dictionary is OOO.
  • The string group obtained after the replacement may be: SSSAAARRRRRRDDDOOOOOO; or it may be: SSSAAA city RRR road station DDDOOOOOO. The text used to represent an address type when replacing a keyword is not specifically limited here.
  • Step 2: Using a regular expression that describes address information, determine whether the string group formed after each keyword is replaced by its corresponding character string represents address information; when the string group is determined to represent address information, determine, according to the string group, the quasi-address string corresponding to the target address information to be searched.
  • A regular expression is a single character string that describes and matches a set of strings conforming to a certain syntax rule.
  • The regular expression involved in this embodiment of the present invention describes address information.
  • That is, each keyword is replaced with its corresponding character string to form a string group, and a regular expression is used to determine whether the string group represents address information.
  • For example, according to the regular expression, "AAA City" and "RRR Road Station" can represent address information, whereas "SSS" cannot.
  • Step 3: Using the quasi-address string as a conditional random field (CRF) feature, extract the target address information to be searched from the address search request information based on the CRF algorithm.
  • The quasi-address string is used as a CRF feature. The CRF features used to extract the target address information are not limited to the quasi-address string and may include multiple features, provided that the quasi-address string is among them.
  • the target address information extracted at this time is: Exit 4 of Shanghai Caoyang Road Station.
  • Determining, according to the string group, the quasi-address string corresponding to the target address information to be searched specifically includes:
  • when the string group includes a plurality of character strings, and the keywords replaced by the plurality of character strings are located consecutively in the address search request information:
  • merging the plurality of character strings into a string group, based on the positional continuity of the replaced keywords, as the quasi-address string corresponding to the target address information to be searched; or
  • where the plurality of character strings include repeated character strings, removing the repeated character strings and merging the remaining at least one character string into a string group, based on the positional continuity of the replaced keywords, as the quasi-address string corresponding to the target address information to be searched.
  • Step 403: Split the target address information into at least one sub-address information, where the target address information is composed of a plurality of different sub-address information, and the plurality of different sub-address information correspond to different address types.
  • In step 403, the sub-address information corresponding to different address types is split from the target address information according to the address types represented by the preset text address dictionaries corresponding to different address types.
  • the sub-address information obtained by splitting "Shanghai Caoyang Road Station Exit 4" is: administrative area information: Shanghai; street name information: Caoyang Road Station; subsidiary information of street name: Exit 4.
  • Step 404: Match the at least one sub-address information, or the at least one sub-address information together with the target address information, against the different address information contained in the address database, and output, as the searched target address information, the matched address information whose matching degree is greater than a set threshold.
  • Each piece of address information stored in the address database includes different sub-address information constituting the address information.
  • In step 404, a piece of address information is first selected from the address database, and the sub-address information contained in the selected address information is determined.
  • Then, a matching calculation is performed between the at least one sub-address information contained in the target address to be searched and the sub-address information contained in the selected address information to obtain a first matching degree, where the sub-address information of the target address to be searched that is used in the matching calculation has the same address type as the sub-address information contained in the selected address information.
  • the sub-address information included in the target address to be searched is matched with the sub-address information included in the selected address information to obtain a first matching degree, which specifically includes:
  • Edit distance is the minimum number of editing operations required to convert one of two strings into the other.
  • An editing operation is replacing one character with another, inserting one character, or deleting one character.
  • Suppose a piece of address data selected from the address database is: XXX Station, XXX District, XX City.
  • After the edit distance is determined, the first matching degree between a piece of sub-address information in the target address information to be searched and the sub-address information of the same address type in the selected address information is obtained from the calculated edit distance.
  • Once the first matching degree of each piece of sub-address information contained in the target address information to be searched has been obtained, the first matching degree between the sub-address information contained in the target address information and the sub-address information contained in the selected address information is calculated from these values.
  • Finally, the total matching degree between the selected address information and the target address information to be searched is obtained from the first matching degree.
  • In one approach, the obtained first matching degree is used as the total matching degree between the selected address information and the target address information to be searched.
  • In another approach, the selected address information is further matched against the target address information to be searched to obtain a second matching degree, and the total matching degree is obtained from the first matching degree and the second matching degree.
  • Specifically, outputting the address information whose matching degree is greater than the set threshold as the found target address information includes:
  • from the calculated total matching degree between each selected piece of address information and the target address information to be searched, determining a set number of total matching degrees in descending order; and
  • outputting the selected address information corresponding to each determined total matching degree as the found target address information.
  • The set similarity threshold may be a system default parameter or may be determined according to actual needs; it is not specifically limited here.
  • The set number may be a system default parameter or may be determined according to actual needs; it is not specifically limited here.
  • With the solution of Embodiment 4 of the present invention, address search request information is acquired and the target address information to be searched that it contains is determined; the target address information is split into at least one piece of sub-address information, where the target address information is composed of a plurality of different pieces of sub-address information corresponding to different address types; the at least one piece of sub-address information, or the at least one piece of sub-address information together with the target address information, is matched against the different address information contained in the address database, where each piece of address information stored in the address database includes the different sub-address information constituting it; and the address information whose matching degree is greater than a set threshold is output as the found target address information.
  • Because the sub-address information corresponding to the target address information is extracted from the address search request information, the accuracy of extracting the target address information is improved. The extracted sub-address information and the target address information are matched against the different address information contained in the address database, and the address information whose matching degree is greater than the set threshold is taken as the found target address information, which effectively improves the accuracy of address matching and the precision of address search.
  • Those skilled in the art should understand that embodiments of the present invention can be provided as a method, an apparatus (device), or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the invention can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Automation & Control Theory (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Navigation (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

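An address search method and device, including: acquiring address search request information and determining the target address information to be searched that is contained in the address search request information; splitting the target address information into at least one piece of sub-address information, where the target address information is composed of a plurality of different pieces of sub-address information that correspond to different address types; matching the at least one piece of sub-address information, or the at least one piece of sub-address information together with the target address information, against the different address information contained in an address database, where each piece of address information stored in the address database includes the different sub-address information constituting it; and outputting the address information whose matching degree is greater than a set threshold as the found target address information. This improves the accuracy of extracting the target address information and effectively improves the accuracy of address matching and the precision of address search.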

Description

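Address search method and device
This application claims priority to Chinese Patent Application No. 201410525978.X, filed with the Chinese Patent Office on September 30, 2014 and entitled "Address search method and device" (一种地址搜索方法和设备), which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of data processing technologies, and in particular to an address search method and device.
Background
With the development of communications and terminal technologies, more and more applications rely on mobile terminals, navigation applications in particular. For example, a user can initiate a navigation request to a mobile terminal by voice or by text input; the navigation application selects a navigation route according to the received navigation request and pushes the selected route to the user.
Specifically, the ways in which a navigation application selects a navigation route according to a received navigation request include, but are not limited to, the following:
When the navigation request is input as text, a conditional random field (CRF) algorithm is used to extract an address name and a building/organization name from the received navigation request, and the extracted names are used to determine the target address of the navigation request.
When the navigation request is input as speech, the address information contained in the speech is used as the search basis to determine the target address of the navigation request.
The drawback is that, when the target address is determined, only the address name (or building/organization name) extracted from the text or speech is matched against the address information in the address database. As a result, the determined set of target addresses contains a large number of irrelevant addresses, which reduces the precision of the address search.
Summary
In view of this, embodiments of the present invention provide an address search method and device, to solve the current problem that a large number of irrelevant addresses are found during an address search, resulting in low search precision.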
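According to a first aspect of the present invention, an address search method is provided, including:
acquiring address search request information, and determining the target address information to be searched that is contained in the address search request information;
splitting the target address information into at least one piece of sub-address information, where the target address information is composed of a plurality of different pieces of sub-address information, and the plurality of different pieces of sub-address information correspond to different address types;
matching the at least one piece of sub-address information, or the at least one piece of sub-address information together with the target address information, against the different address information contained in an address database, where each piece of address information stored in the address database includes the different sub-address information constituting that address information; and
outputting the address information whose matching degree is greater than a set threshold as the found target address information.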
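With reference to the first aspect, in a first possible implementation, acquiring the address search request information includes:
receiving input voice data, where the voice data is used to initiate an address search; and
recognizing the voice data to obtain the target address information to be searched that is contained in the voice data.
With reference to the first aspect or its first possible implementation, in a second possible implementation, the target address information to be searched is obtained as follows:
determining at least one keyword contained in the address search request information;
for each determined keyword: finding, among preset text address dictionaries corresponding to different address types, the text address dictionary that contains the keyword, and replacing the keyword with a string that represents the address type of the dictionary found;
using a regular expression that represents address information to determine whether the string group formed after each keyword has been replaced with its corresponding string represents address information;
when the string group is determined to represent address information, determining, from the string group, the quasi-address string corresponding to the target address information to be searched; and
using the quasi-address string as one conditional random field (CRF) feature, and extracting the target address information to be searched from the address search request information based on the CRF algorithm.
With reference to the second possible implementation of the first aspect, in a third possible implementation, determining, from the string group, the quasi-address string corresponding to the target address information to be searched includes:
determining that the string group contains a plurality of strings and that the keywords replaced by those strings occupy consecutive positions in the address search request information;
if the plurality of strings contain no duplicates, merging the plurality of strings into one string group based on the positional continuity of the replaced keywords, as the quasi-address string corresponding to the target address information to be searched; and
if the plurality of strings contain duplicates, removing the duplicate strings and merging the at least one remaining string into one string group based on the positional continuity of the replaced keywords, as the quasi-address string corresponding to the target address information to be searched.
With reference to the first aspect or any of its first to third possible implementations, in a fourth possible implementation, splitting the target address information into at least one piece of sub-address information includes:
splitting sub-address information corresponding to different address types out of the target address information according to the address types represented by the preset text address dictionaries corresponding to different address types.
With reference to the first aspect or any of its first to fourth possible implementations, in a fifth possible implementation, the address type includes one or a combination of the following:
administrative area information, road name information, building/organization name information, subsidiary content of the road name, and subsidiary content of the building/organization name.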
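With reference to the first aspect or any of its first to fifth possible implementations, in a sixth possible implementation, matching the at least one piece of sub-address information, or the at least one piece of sub-address information together with the target address information, against the different address information contained in the address database includes: matching each of the at least one piece of sub-address information in the target address information against the sub-address information of the same address type in the address database.
With reference to the first aspect or any of its first to fifth possible implementations, in a seventh possible implementation, matching the at least one piece of sub-address information, or the at least one piece of sub-address information together with the target address information, against the different address information contained in the address database includes:
selecting one piece of address information from the address database, and determining the sub-address information contained in the selected address information;
matching the at least one piece of sub-address information contained in the target address to be searched against the sub-address information contained in the selected address information to obtain a first matching degree, where the sub-address information compared on the two sides has the same address type; and
obtaining, from the first matching degree, the total matching degree between the selected address information and the target address information to be searched.
With reference to the seventh possible implementation of the first aspect, in an eighth possible implementation, obtaining the total matching degree from the first matching degree includes:
matching the selected address information against the target address information to be searched to obtain a second matching degree; and
obtaining, from the first matching degree and the second matching degree, the total matching degree between the selected address information and the target address information to be searched.
With reference to the sixth, seventh, or eighth possible implementation of the first aspect, in a ninth possible implementation, matching the sub-address information contained in the target address to be searched against the sub-address information contained in the selected address information to obtain the first matching degree includes:
for each piece of sub-address information contained in the target address information to be searched:
finding, in the selected address information, the sub-address information that belongs to the same address type as this piece of sub-address information;
calculating the edit distance required to convert between this piece of sub-address information and the sub-address information found; and
calculating the first matching degree from the edit distances obtained for the individual pieces of sub-address information contained in the target address information to be searched.
With reference to the sixth, seventh, eighth, or ninth possible implementation of the first aspect, in a tenth possible implementation, outputting the address information whose matching degree is greater than the set threshold as the found target address information includes:
from the calculated total matching degree between each selected piece of address information and the target address information to be searched, determining a set number of total matching degrees in descending order; and
outputting the selected address information corresponding to each determined total matching degree as the found target address information.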
根据本发明的第二方面,提供了一种地址搜索设备,包括:
获取模块,用于获取地址搜索请求信息,并确定所述地址搜索请求信息中包含的待搜索的目标地址信息;
拆分模块,用于将所述获取模块确定的所述目标地址信息拆分为至少一个子地址信息,所述目标地址信息为由多个不同的子地址信息组成的,所述多个不同的子地址信息分别对应不同的地址类型;
搜索模块,用于将所述拆分模块得到的所述至少一个子地址信息或者所述至少一子地址信息和所述目标地址信息与地址数据库中包含的不同地址信息进行匹配,其中,所述地址数据库中存储的每一条地址信息包含构成该地址信息的不同子地址信息;将匹配得到的匹配度大于设定阈值的地址信息,作为搜索到的目标地址信息进行输出。
结合本发明第二方面可能的实施方式,在第一种可能的方式中,所述获取模块,具体用于接收输入的语音数据,其中,所述语音数据用以发起地址搜索;
对所述语音数据进行识别,得到所述语音数据中包含的待搜索的目标地址信息。
结合本发明第二方面可能的实施方式,或者结合本发明第二方面第一种可能的实施方式,在第二种可能的方式中,所述获取模块,具体用于通过以下方式得到待搜索的目标地址信息:
确定所述地址搜索请求信息中包含的至少一个关键词;
针对确定的每一个关键词,执行:在预先设置的对应不同地址类型的文本地址词典中,找到包含该关键词的文本地址词典;利用用于表征找到的文本地址词典对应的地址类型的字符串,替换该关键词;
利用用以表示地址信息的正则表达式,判断每一个关键词被替换为对应 的字符串后构成的字符串组是否表示地址信息;
在确定所述字符串组表示地址信息时,根据所述字符串组确定待搜索的目标地址信息对应的准地址字符串;
将所述准地址字符串作为一个条件随机场CRF特征,基于CRF算法在所述地址搜索请求信息中提取待搜索的目标地址信息。
结合本发明第二方面第二种可能的实施方式,在第三种可能的实施方式中,所述获取模块,具体用于根据所述字符串组确定待搜索的目标地址信息对应的准地址字符串,具体包括:
确定所述字符串组包含的字符串为多个、且在多个字符串分别替换的关键词在所述地址搜索请求信息中位置连续;
若多个字符串不存在重复时,将所述多个字符串基于替换的关键词的位置连续性合并为一个字符串组,作为待搜索的目标地址信息对应的准地址字符串;
若多个字符串存在重复时,去除重复的字符串,并将去除重复的字符串后的至少一个字符串基于替换的关键词的位置连续性合并为一个字符串组,作为待搜索的目标地址信息对应的准地址字符串。
结合本发明第二方面可能的实施方式,或者结合本发明第二方面第一种可能的实施方式,或者结合本发明第二方面第二种可能的实施方式,或者结合本发明第二方面第三种可能的实施方式,在第四种可能的方式中,所述拆分模块,具体用于根据预先设置的对应不同地址类型的文本地址词典所表示的地址类型,从所述目标地址信息中拆分出对应不同地址类型的子地址信息。
结合本发明第二方面可能的实施方式,或者结合本发明第二方面第一种可能的实施方式,或者结合本发明第二方面第二种可能的实施方式,或者结合本发明第二方面第三种可能的实施方式,或者结合本发明第二方面第四种可能的实施方式,在第五种可能的方式中,所述地址类型包括下述中的一种或多种信息组合:
行政区域信息、道路名称信息、建筑/单位名称信息、所述道路名称的附属内容、所述建筑/单位名称的附属内容。
结合本发明第二方面可能的实施方式,或者结合本发明第二方面第一种可能的实施方式,或者结合本发明第二方面第二种可能的实施方式,或者结合本发明第二方面第三种可能的实施方式,或者结合本发明第二方面第四种可能的实施方式,或者结合本发明第二方面第五种可能的实施方式,在第六种可能的方式中,所述搜索模块,具体用于将所述目标地址信息中的至少一个子地址信息分别与所述地址数据库中地址类型相同的子地址信息进行相应匹配。
结合本发明第二方面可能的实施方式,或者结合本发明第二方面第一种可能的实施方式,或者结合本发明第二方面第二种可能的实施方式,或者结合本发明第二方面第三种可能的实施方式,或者结合本发明第二方面第四种可能的实施方式,或者结合本发明第二方面第五种可能的实施方式,在第七种可能的方式中,所述搜索模块,具体用于从地址数据库中选择一个地址信息,确定选择的地址信息中包含的子地址信息;
分别将待搜索的目标地址中包含的至少一个子地址信息与选择的地址信息中包含的子地址信息进行匹配计算,得到第一匹配度,其中,进行匹配计算的所述待搜索的目标地址中包含的子地址信息的地址类型与选择的地址信息中包含的子地址信息的地址类型相同;
根据所述第一匹配度,得到选择的地址信息与待搜索的目标地址信息的总匹配度。
结合本发明第二方面第七种可能的实施方式,在第八种可能的方式中,所述搜索模块,具体用于根据所述第一匹配度,得到选择的地址信息与待搜索的目标地址信息的总匹配度,具体包括:
将选择的地址信息与待搜索的目标地址信息进行匹配计算,得到第二匹配度;
根据所述第一匹配度和所述第二匹配度,得到选择的地址信息与待搜索的目标地址信息的总匹配度。
结合本发明第二方面第六种可能的实施方式,或者结合本发明第二方面第七种可能的实施方式,或者结合本发明第二方面第八种可能的实施方式,在第九种可能的方式中,所述搜索模块,具体用于将待搜索的目标地址中包含的子地址信息与选择的地址信息中包含的子地址信息进行匹配计算,得到第一匹配度,具体包括:
针对待搜索的目标地址信息中包含的每一个子地址信息,执行:
针对该子地址信息,从选择的地址信息中查找出与该子地址信息属于同一地址类型的子地址信息;
计算将该子地址信息与查找到的子地址信息进行相互转换所需的编辑距离;
根据待搜索的目标地址信息中包含的每一个子地址信息分别对应得到的编辑距离,计算所述第一匹配度。
结合本发明第二方面第六种可能的实施方式,或者结合本发明第二方面第七种可能的实施方式,或者结合本发明第二方面第八种可能的实施方式,或者结合本发明第二方面第九种可能的实施方式,在第十种可能的方式中,所述搜索模块,具体用于将匹配得到的匹配度大于设定阈值的地址信息,作为搜索到的目标地址信息进行输出,具体包括:
根据计算得到每一次选择的地址信息与待搜索的目标地址信息的总匹配度,按照总匹配度从大到小的顺序,依次确定设定个数的总匹配度;
将确定的总匹配度分别对应选择的地址信息,作为搜索到的目标地址信息进行输出。
根据本发明的第三方面,提供了一种地址搜索设备,包括:
信号接收器,用于获取地址搜索请求信息,并确定所述地址搜索请求信息中包含的待搜索的目标地址信息;
处理器,用于将所述目标地址信息拆分为至少一个子地址信息,所述目标地址信息为由多个不同的子地址信息组成的,所述多个不同的子地址信息分别对应不同的地址类型;将所述至少一个子地址信息或者所述至少一子地址信息和所述目标地址信息与地址数据库中包含的不同地址信息进行匹配,其中,所述地址数据库中存储的每一条地址信息包含构成该地址信息的不同子地址信息;
将匹配得到的匹配度大于设定阈值的地址信息,作为搜索到的目标地址信息进行输出。
结合本发明第三方面可能的实施方式,在第一种可能的方式中,所述处理器,具体执行:
接收输入的语音数据,其中,所述语音数据用以发起地址搜索;
对所述语音数据进行识别,得到所述语音数据中包含的待搜索的目标地址信息。
结合本发明第三方面可能的实施方式,或者结合本发明第三方面第一种可能的实施方式,在第二种可能的方式中,所述处理器,具体执行:
通过以下方式得到待搜索的目标地址信息:
确定所述地址搜索请求信息中包含的至少一个关键词;
针对确定的每一个关键词,执行:在预先设置的对应不同地址类型的文本地址词典中,找到包含该关键词的文本地址词典;利用用于表征找到的文本地址词典对应的地址类型的字符串,替换该关键词;
利用用以表示地址信息的正则表达式,判断每一个关键词被替换为对应的字符串后构成的字符串组是否表示地址信息;
在确定所述字符串组表示地址信息时,根据所述字符串组确定待搜索的目标地址信息对应的准地址字符串;
将所述准地址字符串作为一个条件随机场CRF特征,基于CRF算法在所述地址搜索请求信息中提取待搜索的目标地址信息。
结合本发明第三方面第二种可能的实施方式,在第三种可能的实施方式中,所述处理器,具体执行:
根据所述字符串组确定待搜索的目标地址信息对应的准地址字符串,包括:
确定所述字符串组包含的字符串为多个、且在多个字符串分别替换的关键词在所述地址搜索请求信息中位置连续;
若多个字符串不存在重复时,将所述多个字符串基于替换的关键词的位置连续性合并为一个字符串组,作为待搜索的目标地址信息对应的准地址字符串;
若多个字符串存在重复时,去除重复的字符串,并将去除重复的字符串后的至少一个字符串基于替换的关键词的位置连续性合并为一个字符串组,作为待搜索的目标地址信息对应的准地址字符串。
结合本发明第三方面可能的实施方式,或者结合本发明第三方面第一种可能的实施方式,或者结合本发明第三方面第二种可能的实施方式,或者结合本发明第三方面第三种可能的实施方式,在第四种可能的方式中,所述处理器31,具体执行:
将所述目标地址信息拆分为至少一个子地址信息,包括:
根据预先设置的对应不同地址类型的文本地址词典所表示的地址类型,从所述目标地址信息中拆分出对应不同地址类型的子地址信息。
结合本发明第三方面可能的实施方式,或者结合本发明第三方面第一种可能的实施方式,或者结合本发明第三方面第二种可能的实施方式,或者结合本发明第三方面第三种可能的实施方式,或者结合本发明第三方面第四种可能的实施方式,在第五种可能的方式中,所述地址类型包括下述中的一种或多种信息组合:
行政区域信息、道路名称信息、建筑/单位名称信息、所述道路名称的附属内容、所述建筑/单位名称的附属内容。
结合本发明第三方面可能的实施方式,或者结合本发明第三方面第一种可能的实施方式,或者结合本发明第三方面第二种可能的实施方式,或者结合本发明第三方面第三种可能的实施方式,或者结合本发明第三方面第四种可能的实施方式,或者结合本发明第三方面第五种可能的实施方式,在第六种可能的方式中,所述处理器,具体执行:将所述目标地址信息中的至少一个子地址信息分别与所述地址数据库中地址类型相同的子地址信息进行相应匹配。
结合本发明第三方面可能的实施方式,或者结合本发明第三方面第一种可能的实施方式,或者结合本发明第三方面第二种可能的实施方式,或者结合本发明第三方面第三种可能的实施方式,或者结合本发明第三方面第四种可能的实施方式,或者结合本发明第三方面第五种可能的实施方式,在第七种可能的方式中,所述处理器,具体执行:
将所述至少一个子地址信息或者所述至少一子地址信息和所述目标地址信息与地址数据库中包含的不同地址信息进行匹配,包括:
从地址数据库中选择一个地址信息,确定选择的地址信息中包含的子地址信息;
分别将待搜索的目标地址中包含的至少一个子地址信息与选择的地址信息中包含的子地址信息进行匹配计算,得到第一匹配度,其中,进行匹配计算的所述待搜索的目标地址中包含的子地址信息的地址类型与选择的地址信息中包含的子地址信息的地址类型相同;
根据所述第一匹配度,得到选择的地址信息与待搜索的目标地址信息的总匹配度。
结合本发明第三方面第七种可能的实施方式,在第八种可能的方式中,所述处理器,具体执行:
根据所述第一匹配度,得到选择的地址信息与待搜索的目标地址信息的总匹配度,具体包括:
将选择的地址信息与待搜索的目标地址信息进行匹配计算,得到第二匹配度;
根据所述第一匹配度和所述第二匹配度,得到选择的地址信息与待搜索的目标地址信息的总匹配度。
结合本发明第三方面第六种可能的实施方式,或者结合本发明第三方面第七种可能的实施方式,或者结合本发明第三方面第八种可能的实施方式,在第九种可能的方式中,所述处理器,具体执行:
将待搜索的目标地址中包含的子地址信息与选择的地址信息中包含的子地址信息进行匹配计算,得到第一匹配度,包括:
针对待搜索的目标地址信息中包含的每一个子地址信息,执行:
针对该子地址信息,从选择的地址信息中查找出与该子地址信息属于同一地址类型的子地址信息;
计算将该子地址信息与查找到的子地址信息进行相互转换所需的编辑距离;
根据待搜索的目标地址信息中包含的每一个子地址信息分别对应得到的编辑距离,计算所述第一匹配度。
结合本发明第三方面第六种可能的实施方式,或者结合本发明第三方面第七种可能的实施方式,或者结合本发明第三方面第八种可能的实施方式,或者结合本发明第三方面第九种可能的实施方式,在第十种可能的方式中,所述处理器,具体执行:
将匹配得到的匹配度大于设定阈值的地址信息,作为搜索到的目标地址信息进行输出,包括:
根据计算得到每一次选择的地址信息与待搜索的目标地址信息的总匹配度,按照总匹配度从大到小的顺序,依次确定设定个数的总匹配度;
将确定的总匹配度分别对应选择的地址信息,作为搜索到的目标地址信息进行输出。
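The beneficial effects of the present invention are as follows:
In the embodiments of the present invention, address search request information is acquired, and the target address information to be searched that it contains is determined; the target address information is split into at least one piece of sub-address information, where the target address information is composed of a plurality of different pieces of sub-address information that correspond to different address types; the at least one piece of sub-address information, or the at least one piece of sub-address information together with the target address information, is matched against the different address information contained in an address database, where each piece of address information stored in the address database includes the different sub-address information constituting it; and the address information whose matching degree is greater than a set threshold is output as the found target address information. Because the sub-address information corresponding to the target address information is extracted from the address search request information, the accuracy of extracting the target address information is improved; and because the extracted sub-address information and the target address information are matched against the different address information contained in the address database, with the address information whose matching degree is greater than the set threshold taken as the found target address information, the accuracy of address matching and the precision of address search are effectively improved.
Brief Description of Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of an address search system according to Embodiment 1 of the present invention;
FIG. 2 is a schematic structural diagram of an address search device according to Embodiment 2 of the present invention;
FIG. 3 is a schematic structural diagram of an address search device according to Embodiment 3 of the present invention;
FIG. 4 is a schematic flowchart of an address search method according to Embodiment 4 of the present invention.
Detailed Description
To achieve the objective of the present invention, embodiments of the present invention provide an address search method and device. Address search request information is acquired, and the target address information to be searched that it contains is determined; the target address information is split into at least one piece of sub-address information, where the target address information is composed of a plurality of different pieces of sub-address information that correspond to different address types; the at least one piece of sub-address information, or the at least one piece of sub-address information together with the target address information, is matched against the different address information contained in an address database, where each piece of address information stored in the address database includes the different sub-address information constituting it; and the address information whose matching degree is greater than a set threshold is output as the found target address information. Because the sub-address information corresponding to the target address information is extracted from the address search request information, the accuracy of extracting the target address information is improved; and because the extracted sub-address information and the target address information are matched against the different address information contained in the address database, with the address information whose matching degree is greater than the set threshold taken as the found target address information, the accuracy of address matching and the precision of address search are effectively improved.
The embodiments of the present invention are described in further detail below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.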
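Embodiment 1:
FIG. 1 is a schematic structural diagram of an address search system according to Embodiment 1 of the present invention. The address search system includes a receiving device 11, an address extraction device 12, and an address matching device 13.
The receiving device 11 is configured to acquire address search request information.
The address extraction device 12 is configured to determine the target address information to be searched that is contained in the address search request information, and to split the target address information into at least one piece of sub-address information, where the target address information is composed of a plurality of different pieces of sub-address information that correspond to different address types.
Specifically, the receiving device 11 is configured to receive input text data, where the text data is used to initiate an address search.
The address extraction device 12 is specifically configured to recognize the text data to obtain the target address information to be searched that is contained in the text data.
For example, if the received text data is "上海市曹杨路站4号出口" (Exit 4, Caoyang Road Station, Shanghai), the target address information to be searched contained in the text data is determined to be: 上海市曹杨路站4号出口.
Alternatively, the receiving device 11 is configured to receive input voice data, where the voice data is used to initiate an address search.
The address extraction device 12 is specifically configured to recognize the voice data to obtain the target address information to be searched that is contained in the voice data.
For example, if the received voice data is "到上海市曹杨路站4号出口" (to Exit 4, Caoyang Road Station, Shanghai), the target address information to be searched contained in the voice data is determined to be: 上海市曹杨路站4号出口.
In other words, besides the target address information to be searched, the address search request information may also contain other auxiliary information that follows language habits, for example words such as "到" (to) and "去" (go to).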
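The address extraction device 12 is specifically configured to obtain the target address information to be searched as follows:
determining at least one keyword contained in the address search request information;
for each determined keyword: finding, among the preset text address dictionaries corresponding to different address types, the text address dictionary that contains the keyword, and replacing the keyword with the string that represents the address type of the dictionary found;
using a regular expression that represents address information to determine whether the string group formed after each keyword has been replaced with its corresponding string represents address information;
when the string group is determined to represent address information, determining, from the string group, the quasi-address string corresponding to the target address information to be searched; and
using the quasi-address string as one conditional random field (CRF) feature, and extracting the target address information to be searched from the address search request information based on the CRF algorithm.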
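Note that address information is hierarchical. Some address information denotes an address region, such as the address information of an administrative area: if 北京市 (Beijing) is taken as a piece of address information, then "北京市" corresponds to a region on the map. Other address information denotes a specific location, for example XX市XX区XX路XX号XX大厦, which corresponds to the specific location of the XX building (XX大厦).
For this reason, address information is divided into different address types according to its level in the hierarchy. The address type includes one or a combination of the following:
administrative area information, road name information, building/organization name information, subsidiary content of the road name, and subsidiary content of the building/organization name.
For example, for the address information "XX市XX区XX路XX号XX大厦XX层XX室" (Room XX, Floor XX, XX Building, No. XX, XX Road, XX District, XX City), the address type of "XX市XX区" is administrative area information; that of "XX路" is road name information; that of "XX号" is subsidiary content of the road name; that of "XX大厦" is building/organization name information; and that of "XX层XX室" is subsidiary content of the building/organization name.
Note that the subsidiary content of a road name is meaningless without the road name that precedes it: a specific address cannot be located from that subsidiary content alone. Likewise, the subsidiary content of a building/organization name is meaningless without the preceding building/organization name, and a specific address cannot be located from it alone.
The preset text address dictionaries corresponding to different address types include at least: an administrative area dictionary, which contains administrative area information, for example 省 (province), 市 (city), 区 (district), 县 (county), 镇 (town), 乡 (township), 村 (village), 州 (prefecture), 盟 (league), and 旗 (banner), and specifically, for example, 北京市, 北京, 上海市, 上海, 深圳市, and 深圳; an organization/building tail-word dictionary, which contains the tail words of organization/building names, for example 派出所 (police station), 大厦 (tower), 中心 (center), and 大楼 (building); and a street name dictionary, which contains street names, for example XX路 (XX Road), XX道 (XX Avenue), and XX站 (XX Station). In addition, there is a stop-word dictionary, which contains characters or words that indicate termination in the language, for example 到达 (arrive); a symbol dictionary, which contains punctuation marks; and a digit dictionary, which contains digits.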
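For example, the at least one keyword contained in the address search request information is: 到, 上海市, 曹杨路, 站, 4号出口.
The following steps are then performed for each keyword obtained:
Step 1: Among the preset text address dictionaries corresponding to different address types, find the text address dictionary that contains the keyword, and replace the keyword with the string that represents the address type of the dictionary found.
For example, "到" belongs to the stop-word dictionary; "上海市" belongs to the administrative area dictionary; "曹杨路" and "站" belong to the street name dictionary; the "4" in "4号出口" belongs to the digit dictionary; and the "号" and "出口" in "4号出口" belong to the tail-word dictionary.
The string for the address type of the stop-word dictionary is SSS, that of the administrative area dictionary is AAA, that of the street name dictionary is RRR, that of the digit dictionary is DDD, and that of the tail-word dictionary is OOO.
The string group obtained after replacement may then be SSSAAARRRRRRDDDOOOOOO, or it may be SSSAAA市RRR路站DDDOOOOOO; whether the characters in a keyword that indicate the address type are also replaced is not specifically limited here.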
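The replacement in step 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the dictionary contents, the name `TYPE_DICTIONARIES`, and the helper `tag_keywords` are hypothetical, and the keyword "4号出口" is assumed to have been pre-split into "4", "号", and "出口".

```python
# Hypothetical dictionaries: each type tag maps to the set of words it covers.
TYPE_DICTIONARIES = {
    "SSS": {"到", "去"},                  # stop words
    "AAA": {"上海市", "上海", "北京市"},    # administrative areas
    "RRR": {"曹杨路", "站"},               # street names
    "DDD": {"4"},                         # digits
    "OOO": {"号", "出口"},                 # tail words
}


def tag_keywords(keywords):
    """Replace each keyword with the tag of the dictionary that contains it."""
    tags = []
    for word in keywords:
        tag = next(
            (t for t, words in TYPE_DICTIONARIES.items() if word in words),
            None,
        )
        # Words found in no dictionary are passed through unchanged.
        tags.append(tag if tag is not None else word)
    return tags


keywords = ["到", "上海市", "曹杨路", "站", "4", "号", "出口"]
tags = tag_keywords(keywords)
tag_string = "".join(tags)
print(tag_string)
```

Joining the tags reproduces the first form of the string group given above; keeping the type-indicating characters (市, 路站) would give the second form.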
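Step 2: Use a regular expression that represents address information to determine whether the string group formed after each keyword has been replaced with its corresponding string represents address information, and, when the string group is determined to represent address information, determine from the string group the quasi-address string corresponding to the target address information to be searched.
Note that a regular expression uses a single string to describe and match strings that conform to a certain syntactic rule; the regular expression involved in the embodiments of the present invention describes address information.
For example, for the string group formed after each keyword has been replaced with its corresponding string, the regular expression is used to determine whether the string group represents address information.
Judged by the regular expression, "AAA市" and "RRR路站" may represent address information, whereas "SSS" may not.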
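As a rough sketch of this check, the tag string can be scanned for a run of tags that looks like an address. The pattern below is an assumption chosen to fit the running example (at least one administrative or street tag, optionally followed by digit and tail-word tags); the source does not disclose the actual regular expression, and `find_address_span` is a hypothetical helper.

```python
import re

# Assumed pattern: one or more AAA/RRR tags, then optional DDD/OOO tags.
ADDRESS_PATTERN = re.compile(r"(?:AAA|RRR)+(?:DDD|OOO)*")


def find_address_span(tag_string):
    """Return the first address-like run of tags, or None if there is none."""
    match = ADDRESS_PATTERN.search(tag_string)
    return match.group(0) if match else None


print(find_address_span("SSSAAARRRRRRDDDOOOOOO"))  # address-like part of the request
print(find_address_span("SSS"))                     # stop word only
```

With this pattern the leading stop-word tag is skipped and the remaining tags are treated as the candidate address; a production pattern would likely be stricter.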
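Step 3: Use the quasi-address string as one conditional random field (CRF) feature, and extract the target address information to be searched from the address search request information based on the CRF algorithm.
Note that the quasi-address string is used as one CRF feature, but the CRF features used to extract the target address information are not limited to this one; there may be several, provided that they include the CRF feature derived from the quasi-address string.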
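One way to picture "the quasi-address string as a CRF feature" is a per-token feature dictionary in which one entry records whether the token falls inside the quasi-address span. The sketch below only builds such features; the feature names, the `build_crf_features` helper, and the extra context features are all assumptions for illustration, and the CRF training and decoding steps are omitted.

```python
def build_crf_features(tokens, tags, quasi_span):
    """Return one feature dict per token.

    quasi_span is a (start, end) pair of token indices, end exclusive,
    marking the tokens covered by the quasi-address string.
    """
    start, end = quasi_span
    features = []
    for i, (token, tag) in enumerate(zip(tokens, tags)):
        features.append({
            "token": token,
            "type_tag": tag,
            # Feature derived from the quasi-address string.
            "in_quasi_address": start <= i < end,
            # Additional context features (hypothetical).
            "prev_tag": tags[i - 1] if i > 0 else "<BOS>",
            "next_tag": tags[i + 1] if i + 1 < len(tags) else "<EOS>",
        })
    return features


tokens = ["到", "上海市", "曹杨路", "站", "4", "号", "出口"]
tags = ["SSS", "AAA", "RRR", "RRR", "DDD", "OOO", "OOO"]
features = build_crf_features(tokens, tags, quasi_span=(1, 7))
```

Here the stop word "到" is the only token outside the quasi-address span, which is the signal a sequence labeler could use to exclude it from the extracted address.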
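The target address information extracted in this case is: 上海市曹杨路站4号出口.
The address extraction device 12 is specifically configured to determine, from the string group, the quasi-address string corresponding to the target address information to be searched, which specifically includes:
determining that the string group contains a plurality of strings and that the keywords replaced by those strings occupy consecutive positions in the address search request information;
if the plurality of strings contain no duplicates, merging the plurality of strings into one string group based on the positional continuity of the replaced keywords, as the quasi-address string corresponding to the target address information to be searched; and
if the plurality of strings contain duplicates, removing the duplicate strings and merging the at least one remaining string into one string group based on the positional continuity of the replaced keywords, as the quasi-address string corresponding to the target address information to be searched.
For example, the obtained strings AAA市, RRR路站, and DDD contain no duplicates, so they are merged into one quasi-address string: AAA市RRR路站DDD.
The obtained strings AAA, RRR, RRR, DDD, OOO, OOO contain duplicates, so after the duplicates are removed they are merged into one quasi-address string: AAARRRDDDOOO.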
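The two merge cases above can be sketched as follows. The sketch assumes that "duplicates" means identical tags at adjacent positions, which is consistent with both examples but is not stated explicitly in the source; `merge_tags` is a hypothetical helper name.

```python
from itertools import groupby


def merge_tags(tags):
    """Collapse runs of identical adjacent tags, then join in original order."""
    return "".join(tag for tag, _run in groupby(tags))


print(merge_tags(["AAA市", "RRR路站", "DDD"]))
print(merge_tags(["AAA", "RRR", "RRR", "DDD", "OOO", "OOO"]))
```

`itertools.groupby` groups only consecutive equal items, so positional order is preserved and a tag that reappears later, after a different tag, would be kept.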
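The address extraction device 12 is configured to split the target address information into at least one piece of sub-address information, where the target address information is composed of a plurality of different pieces of sub-address information that correspond to different address types.
Specifically, the address extraction device 12 is configured to split sub-address information corresponding to different address types out of the target address information according to the address types represented by the preset text address dictionaries corresponding to different address types.
For example, splitting "上海市曹杨路站4号出口" yields the following sub-address information: administrative area information: 上海市; street name information: 曹杨路站; subsidiary information of the street name: 4号出口.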
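The split in this example can be sketched as a single pass over tagged tokens. The tag-to-type mapping, the type names, and the rule that digit and tail-word tokens attach to the most recent street or area are assumptions made for illustration; the source only states that the split follows the dictionaries' address types.

```python
# Hypothetical mapping from type tags to address types.
TAG_TO_TYPE = {"AAA": "administrative_area", "RRR": "street_name"}
ATTACHMENT_TAGS = {"DDD", "OOO"}


def split_sub_addresses(tagged_tokens):
    """Group (token, tag) pairs into sub-addresses keyed by address type."""
    parts = {}
    base_type = None  # most recent area/street type seen so far
    for token, tag in tagged_tokens:
        if tag in TAG_TO_TYPE:
            base_type = TAG_TO_TYPE[tag]
            key = base_type
        elif tag in ATTACHMENT_TAGS and base_type is not None:
            # Subsidiary content belongs to the name that precedes it.
            key = base_type + "_attachment"
        else:
            continue  # stop words and unattached tokens are dropped
        parts[key] = parts.get(key, "") + token
    return parts


tagged = [("上海市", "AAA"), ("曹杨路", "RRR"), ("站", "RRR"),
          ("4", "DDD"), ("号", "OOO"), ("出口", "OOO")]
parts = split_sub_addresses(tagged)
print(parts)
```

Attaching subsidiary content to the preceding name mirrors the earlier remark that such content has no meaning on its own.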
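The address matching device 13 is configured to match the at least one piece of sub-address information, or the at least one piece of sub-address information together with the target address information, against the different address information contained in the address database, where each piece of address information stored in the address database includes the different sub-address information constituting it; and to output the address information whose matching degree is greater than a set threshold as the found target address information.
The address matching device 13 is specifically configured to match each of the at least one piece of sub-address information in the target address information against the sub-address information of the same address type in the address database.
The address matching device 13 is specifically configured to select one piece of address information from the address database and determine the sub-address information contained in the selected address information; to match the at least one piece of sub-address information contained in the target address to be searched against the sub-address information contained in the selected address information to obtain a first matching degree, where the sub-address information compared on the two sides has the same address type; and to obtain, from the first matching degree, the total matching degree between the selected address information and the target address information to be searched.
That is, each of the at least one piece of sub-address information in the target address information is matched against the sub-address information of the same address type in the address database. Working at the granularity of the address type of the sub-address information, the address matching device 13 matches, in turn, the sub-address information that represents one address type in the target address information to be searched against the sub-address information that represents the same address type in the selected address information, and obtains the first matching degree of that sub-address information.
The total matching degree between the selected address information and the target address information to be searched is then obtained from the first matching degree.
Optionally, on this basis, the target address information to be searched is matched against the selected address information to obtain a second matching degree.
Calculating the matching degree in these two steps effectively excludes address information that was obtained through the first matching degree but contains only one or a few pieces of the sub-address information of the target address information to be searched, for example address information that contains only "4号出口" (Exit 4).
The total matching degree between the selected address information and the target address information to be searched is obtained from the first matching degree and the second matching degree.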
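The address matching device 13 is specifically configured to match the sub-address information contained in the target address to be searched against the sub-address information contained in the selected address information to obtain the first matching degree, which specifically includes:
for each piece of sub-address information contained in the target address information to be searched:
finding, in the selected address information, the sub-address information that belongs to the same address type as this piece of sub-address information;
calculating the edit distance required to convert between this piece of sub-address information and the sub-address information found; and
calculating the first matching degree from the edit distances obtained for the individual pieces of sub-address information contained in the target address information to be searched.
Note that the edit distance between two strings is the minimum number of editing operations required to convert one string into the other. An editing operation is replacing one character with another, inserting one character, or deleting one character.
The address matching device 13 is specifically configured to calculate the first matching degree from the edit distances obtained for the individual pieces of sub-address information contained in the target address information to be searched, which specifically includes:
summing the edit distances obtained for the individual pieces of sub-address information contained in the target address information to be searched, the resulting sum being the first matching degree.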
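The address matching device 13 is specifically configured to output the address information whose matching degree is greater than the set threshold as the found target address information, which specifically includes:
from the calculated total matching degree between each selected piece of address information and the target address information to be searched, determining a set number of total matching degrees in descending order; and
outputting the selected address information corresponding to each determined total matching degree as the found target address information.
Note that the set similarity threshold may be a system default parameter or may be determined according to actual needs; it is not specifically limited here.
Note that the set number may be a system default parameter or may be determined according to actual needs; it is not specifically limited here.
With the address search system provided in this embodiment of the present invention, the sub-address information corresponding to the target address information is extracted from the address search request information, which improves the accuracy of extracting the target address information; the extracted sub-address information and the target address information are matched against the different address information contained in the address database, and the address information whose matching degree is greater than the set threshold is taken as the found target address information, which effectively improves the accuracy of address matching and the precision of address search.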
实施例二:
如图2所示,为本发明实施例二提供的一种地址搜索设备的结构示意图。所述地址搜索设备包括:获取模块21、拆分模块22和搜索模块23,其中:
获取模块21,用于获取地址搜索请求信息,并确定所述地址搜索请求信息中包含的待搜索的目标地址信息;
拆分模块22,用于将所述获取模块确定的所述目标地址信息拆分为至少一个子地址信息,所述目标地址信息为由多个不同的子地址信息组成的,所述多个不同的子地址信息分别对应不同的地址类型;
搜索模块23,用于将所述拆分模块得到的所述至少一个子地址信息或者所述至少一子地址信息和所述目标地址信息与地址数据库中包含的不同地址信息进行匹配,其中,所述地址数据库中存储的每一条地址信息包含构成该地址信息的不同子地址信息;将匹配得到的匹配度大于设定阈值的地址信息,作为搜索到的目标地址信息进行输出。
具体地,所述获取模块21,具体用于接收输入的语音数据,其中,所述 语音数据用以发起地址搜索;
对所述语音数据进行识别,得到所述语音数据中包含的待搜索的目标地址信息。
所述获取模块21,具体用于通过以下方式得到待搜索的目标地址信息:
确定所述地址搜索请求信息中包含的至少一个关键词;
针对确定的每一个关键词,执行:在预先设置的对应不同地址类型的文本地址词典中,找到包含该关键词的文本地址词典;利用用于表征找到的文本地址词典对应的地址类型的字符串,替换该关键词;
利用用以表示地址信息的正则表达式,判断每一个关键词被替换为对应的字符串后构成的字符串组是否表示地址信息;
在确定所述字符串组表示地址信息时,根据所述字符串组确定待搜索的目标地址信息对应的准地址字符串;
将所述准地址字符串作为一个条件随机场CRF特征,基于CRF算法在所述地址搜索请求信息中提取待搜索的目标地址信息。
所述获取模块21,具体用于根据所述字符串组确定待搜索的目标地址信息对应的准地址字符串,具体包括:
确定所述字符串组包含的字符串为多个、且在多个字符串分别替换的关键词在所述地址搜索请求信息中位置连续;
若多个字符串不存在重复时,将所述多个字符串基于替换的关键词的位置连续性合并为一个字符串组,作为待搜索的目标地址信息对应的准地址字符串;
若多个字符串存在重复时,去除重复的字符串,并将去除重复的字符串后的至少一个字符串基于替换的关键词的位置连续性合并为一个字符串组,作为待搜索的目标地址信息对应的准地址字符串。
所述拆分模块22,具体用于根据预先设置的对应不同地址类型的文本地址词典所表示的地址类型,从所述目标地址信息中拆分出对应不同地址类型 的子地址信息。
所述地址类型包括下述中的一种或多种信息组合:
行政区域信息、道路名称信息、建筑/单位名称信息、所述道路名称的附属内容、所述建筑/单位名称的附属内容。
所述搜索模块23,具体用于将所述目标地址信息中的至少一个子地址信息分别与所述地址数据库中地址类型相同的子地址信息进行相应匹配。
所述搜索模块23,具体用于从地址数据库中选择一个地址信息,确定选择的地址信息中包含的子地址信息;
分别将待搜索的目标地址中包含的至少一个子地址信息与选择的地址信息中包含的子地址信息进行匹配计算,得到第一匹配度,其中,进行匹配计算的所述待搜索的目标地址中包含的子地址信息的地址类型与选择的地址信息中包含的子地址信息的地址类型相同;
根据所述第一匹配度,得到选择的地址信息与待搜索的目标地址信息的总匹配度。
具体地,所述搜索模块23,具体用于根据所述第一匹配度,得到选择的地址信息与待搜索的目标地址信息的总匹配度,具体包括:
将选择的地址信息与待搜索的目标地址信息进行匹配计算,得到第二匹配度;
根据所述第一匹配度和所述第二匹配度,得到选择的地址信息与待搜索的目标地址信息的总匹配度。
所述搜索模块23,具体用于将待搜索的目标地址中包含的子地址信息与选择的地址信息中包含的子地址信息进行匹配计算,得到第一匹配度,具体包括:
针对待搜索的目标地址信息中包含的每一个子地址信息,执行:
针对该子地址信息,从选择的地址信息中查找出与该子地址信息属于同一地址类型的子地址信息;
计算将该子地址信息与查找到的子地址信息进行相互转换所需的编辑距离;
根据待搜索的目标地址信息中包含的每一个子地址信息分别对应得到的编辑距离,计算所述第一匹配度。
所述搜索模块23,具体用于将匹配得到的匹配度大于设定阈值的地址信息,作为搜索到的目标地址信息进行输出,具体包括:
根据计算得到每一次选择的地址信息与待搜索的目标地址信息的总匹配度,按照总匹配度从大到小的顺序,依次确定设定个数的总匹配度;
将确定的总匹配度分别对应选择的地址信息,作为搜索到的目标地址信息进行输出。
需要说明的是,本发明实施例所述的地址搜索设备可以通过硬件方式实现,也可以通过软件方式实现,对于实现方式这里不做限定。
地址搜索设备在获取地址搜索请求信息时,从地址搜索请求信息中提取出目标地址信息对应的子地址信息,提升了提取目标地址信息的正确率;利用提取出的子地址信息以及所述目标地址信息与地址数据库中包含的不同的地址信息进行匹配,进而将匹配度大于设定阈值的地址信息作为搜索到的目标地址信息,有效地提高了地址匹配的正确率以及地址搜索的精度。
实施例三
如图3所示,为本发明实施例三提供的一种地址搜索设备的结构示意图。所述地址搜索设备具备了本发明实施例四所述的功能。所述地址搜索设备可以采用通用计算机系统结构,计算机系统可具体是基于处理器的计算机。所述地址搜索设备包含了至少一个处理器31和信号接收器32。其中,处理器31和信号接收器32之间通过通信总线33连接。
信号接收器32,用于获取地址搜索请求信息,并确定所述地址搜索请求信息中包含的待搜索的目标地址信息;
处理器31,用于将所述目标地址信息拆分为至少一个子地址信息,所述 目标地址信息为由多个不同的子地址信息组成的,所述多个不同的子地址信息分别对应不同的地址类型;将所述至少一个子地址信息或者所述至少一子地址信息和所述目标地址信息与地址数据库中包含的不同地址信息进行匹配,其中,所述地址数据库中存储的每一条地址信息包含构成该地址信息的不同子地址信息;
将匹配得到的匹配度大于设定阈值的地址信息,作为搜索到的目标地址信息进行输出。
在一种可能的实现方式中,所述处理器31,具体执行:
接收输入的语音数据,其中,所述语音数据用以发起地址搜索;
对所述语音数据进行识别,得到所述语音数据中包含的待搜索的目标地址信息。
在一种可能的实现方式中,所述处理器31,具体执行:
通过以下方式得到待搜索的目标地址信息:
确定所述地址搜索请求信息中包含的至少一个关键词;
针对确定的每一个关键词,执行:在预先设置的对应不同地址类型的文本地址词典中,找到包含该关键词的文本地址词典;利用用于表征找到的文本地址词典对应的地址类型的字符串,替换该关键词;
利用用以表示地址信息的正则表达式,判断每一个关键词被替换为对应的字符串后构成的字符串组是否表示地址信息;
在确定所述字符串组表示地址信息时,根据所述字符串组确定待搜索的目标地址信息对应的准地址字符串;
将所述准地址字符串作为一个条件随机场CRF特征,基于CRF算法在所述地址搜索请求信息中提取待搜索的目标地址信息。
在一种可能的实现方式中,所述处理器31,具体执行:
根据所述字符串组确定待搜索的目标地址信息对应的准地址字符串,包括:
确定所述字符串组包含的字符串为多个、且在多个字符串分别替换的关键词在所述地址搜索请求信息中位置连续;
若多个字符串不存在重复时,将所述多个字符串基于替换的关键词的位置连续性合并为一个字符串组,作为待搜索的目标地址信息对应的准地址字符串;
若多个字符串存在重复时,去除重复的字符串,并将去除重复的字符串后的至少一个字符串基于替换的关键词的位置连续性合并为一个字符串组,作为待搜索的目标地址信息对应的准地址字符串。
在一种可能的实现方式中,所述处理器31,具体执行:
将所述目标地址信息拆分为至少一个子地址信息,包括:
根据预先设置的对应不同地址类型的文本地址词典所表示的地址类型,从所述目标地址信息中拆分出对应不同地址类型的子地址信息。
在一种可能的实现方式中,所述地址类型包括下述中的一种或多种信息组合:
行政区域信息、道路名称信息、建筑/单位名称信息、所述道路名称的附属内容、所述建筑/单位名称的附属内容。
在一种可能的实现方式中,所述处理器31,具体执行:将所述目标地址信息中的至少一个子地址信息分别与所述地址数据库中地址类型相同的子地址信息进行相应匹配。
在一种可能的实现方式中,所述处理器31,具体执行:
将所述至少一个子地址信息或者所述至少一子地址信息和所述目标地址信息与地址数据库中包含的不同地址信息进行匹配,包括:
从地址数据库中选择一个地址信息,确定选择的地址信息中包含的子地址信息;
分别将待搜索的目标地址中包含的至少一个子地址信息与选择的地址信息中包含的子地址信息进行匹配计算,得到第一匹配度,其中,进行匹配计 算的所述待搜索的目标地址中包含的子地址信息的地址类型与选择的地址信息中包含的子地址信息的地址类型相同;
根据所述第一匹配度,得到选择的地址信息与待搜索的目标地址信息的总匹配度。
在一种可能的实现方式中,所述处理器31,具体执行:
根据所述第一匹配度,得到选择的地址信息与待搜索的目标地址信息的总匹配度,具体包括:
将选择的地址信息与待搜索的目标地址信息进行匹配计算,得到第二匹配度;
根据所述第一匹配度和所述第二匹配度,得到选择的地址信息与待搜索的目标地址信息的总匹配度。
在一种可能的实现方式中,所述处理器31,具体执行:
将待搜索的目标地址中包含的子地址信息与选择的地址信息中包含的子地址信息进行匹配计算,得到第一匹配度,包括:
针对待搜索的目标地址信息中包含的每一个子地址信息,执行:
针对该子地址信息,从选择的地址信息中查找出与该子地址信息属于同一地址类型的子地址信息;
计算将该子地址信息与查找到的子地址信息进行相互转换所需的编辑距离;
根据待搜索的目标地址信息中包含的每一个子地址信息分别对应得到的编辑距离,计算所述第一匹配度。
在一种可能的实现方式中,所述处理器31,具体执行:
将匹配得到的匹配度大于设定阈值的地址信息,作为搜索到的目标地址信息进行输出,包括:
根据计算得到每一次选择的地址信息与待搜索的目标地址信息的总匹配度,按照总匹配度从大到小的顺序,依次确定设定个数的总匹配度;
将确定的总匹配度分别对应选择的地址信息,作为搜索到的目标地址信息进行输出。
其中,处理器31可以是一个通用中央处理器(CPU),微处理器,特定应用集成电路(application-specific integrated circuit,ASIC),或一个或多个用于控制本发明方案程序执行的集成电路。
本发明实施例提供的地址搜索设备在获取地址搜索请求信息时,从地址搜索请求信息中提取出目标地址信息对应的子地址信息,提升了提取目标地址信息的正确率;利用提取出的子地址信息以及所述目标地址信息与地址数据库中包含的不同的地址信息进行匹配,进而将匹配度大于设定阈值的地址信息作为搜索到的目标地址信息,有效地提高了地址匹配的正确率以及地址搜索的精度。
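Embodiment 4:
FIG. 4 is a schematic flowchart of an address search method according to Embodiment 4 of the present invention. The method may be as follows.
Step 401: Acquire address search request information.
The address search request information contains the target address information to be searched.
In step 401, the ways of acquiring the address search request information include, but are not limited to, the following:
receiving input text data, where the text data is used to initiate an address search; or
receiving input voice data, where the voice data is used to initiate an address search.
Note that if the address search request information is voice data, the method further includes:
recognizing the received voice data to obtain the text data corresponding to the voice data.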
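Step 402: Determine the target address information to be searched that is contained in the address search request information.
In step 402, the target address information to be searched is obtained as follows:
determining at least one keyword contained in the address search request information;
for each determined keyword: finding, among the preset text address dictionaries corresponding to different address types, the text address dictionary that contains the keyword, and replacing the keyword with the string that represents the address type of the dictionary found;
using a regular expression that represents address information to determine whether the string group formed after each keyword has been replaced with its corresponding string represents address information;
when the string group is determined to represent address information, determining, from the string group, the quasi-address string corresponding to the target address information to be searched; and
using the quasi-address string as one conditional random field (CRF) feature, and extracting the target address information to be searched from the address search request information based on the CRF algorithm.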
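Specifically, address information is hierarchical. Some address information denotes an address region, such as the address information of an administrative area: if 北京市 (Beijing) is taken as a piece of address information, then "北京市" corresponds to a region on the map. Other address information denotes a specific location, for example XX市XX区XX路XX号XX大厦, which corresponds to the specific location of the XX building (XX大厦).
For this reason, address information is divided into different address types according to its level in the hierarchy. The address type includes one or a combination of the following:
administrative area information, road name information, building/organization name information, subsidiary content of the road name, and subsidiary content of the building/organization name.
For example, for the address information "XX市XX区XX路XX号XX大厦XX层XX室" (Room XX, Floor XX, XX Building, No. XX, XX Road, XX District, XX City), the address type of "XX市XX区" is administrative area information; that of "XX路" is road name information; that of "XX号" is subsidiary content of the road name; that of "XX大厦" is building/organization name information; and that of "XX层XX室" is subsidiary content of the building/organization name.
Note that the subsidiary content of a road name is meaningless without the road name that precedes it: a specific address cannot be located from that subsidiary content alone. Likewise, the subsidiary content of a building/organization name is meaningless without the preceding building/organization name, and a specific address cannot be located from it alone.
The preset text address dictionaries corresponding to different address types include at least: an administrative area dictionary, which contains administrative area information, for example 省 (province), 市 (city), 区 (district), 县 (county), 镇 (town), 乡 (township), 村 (village), 州 (prefecture), 盟 (league), and 旗 (banner), and specifically, for example, 北京市, 北京, 上海市, 上海, 深圳市, and 深圳; an organization/building tail-word dictionary, which contains the tail words of organization/building names, for example 派出所 (police station), 大厦 (tower), 中心 (center), and 大楼 (building); and a street name dictionary, which contains street names, for example XX路 (XX Road), XX道 (XX Avenue), and XX站 (XX Station). In addition, there is a stop-word dictionary, which contains characters or words that indicate termination in the language, for example 到达 (arrive); a symbol dictionary, which contains punctuation marks; and a digit dictionary, which contains digits.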
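For example, the at least one keyword contained in the address search request information is: 到, 上海市, 曹杨路, 站, 4号出口.
The following steps are then performed for each keyword obtained:
Step 1: Among the preset text address dictionaries corresponding to different address types, find the text address dictionary that contains the keyword, and replace the keyword with the string that represents the address type of the dictionary found.
For example, "到" belongs to the stop-word dictionary; "上海市" belongs to the administrative area dictionary; "曹杨路" and "站" belong to the street name dictionary; the "4" in "4号出口" belongs to the digit dictionary; and the "号" and "出口" in "4号出口" belong to the tail-word dictionary.
The string for the address type of the stop-word dictionary is SSS, that of the administrative area dictionary is AAA, that of the street name dictionary is RRR, that of the digit dictionary is DDD, and that of the tail-word dictionary is OOO.
The string group obtained after replacement may then be SSSAAARRRRRRDDDOOOOOO, or it may be SSSAAA市RRR路站DDDOOOOOO; whether the characters in a keyword that indicate the address type are also replaced is not specifically limited here.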
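Step 2: Use a regular expression that represents address information to determine whether the string group formed after each keyword has been replaced with its corresponding string represents address information, and, when the string group is determined to represent address information, determine from the string group the quasi-address string corresponding to the target address information to be searched.
Note that a regular expression uses a single string to describe and match strings that conform to a certain syntactic rule; the regular expression involved in the embodiments of the present invention describes address information.
For example, for the string group formed after each keyword has been replaced with its corresponding string, the regular expression is used to determine whether the string group represents address information.
Judged by the regular expression, "AAA市" and "RRR路站" may represent address information, whereas "SSS" may not.
Step 3: Use the quasi-address string as one conditional random field (CRF) feature, and extract the target address information to be searched from the address search request information based on the CRF algorithm.
Note that the quasi-address string is used as one CRF feature, but the CRF features used to extract the target address information are not limited to this one; there may be several CRF features, provided that they include the CRF feature derived from the quasi-address string.
The target address information extracted in this case is: 上海市曹杨路站4号出口.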
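Optionally, determining, from the string group, the quasi-address string corresponding to the target address information to be searched specifically includes:
determining that the string group contains a plurality of strings and that the keywords replaced by those strings occupy consecutive positions in the address search request information;
if the plurality of strings contain no duplicates, merging the plurality of strings into one string group based on the positional continuity of the replaced keywords, as the quasi-address string corresponding to the target address information to be searched; and
if the plurality of strings contain duplicates, removing the duplicate strings and merging the at least one remaining string into one string group based on the positional continuity of the replaced keywords, as the quasi-address string corresponding to the target address information to be searched.
Step 403: Split the target address information into at least one piece of sub-address information, where the target address information is composed of a plurality of different pieces of sub-address information that correspond to different address types.
In step 403, sub-address information corresponding to different address types is split out of the target address information according to the address types represented by the preset text address dictionaries corresponding to different address types.
For example, splitting "上海市曹杨路站4号出口" yields the following sub-address information: administrative area information: 上海市; street name information: 曹杨路站; subsidiary information of the street name: 4号出口.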
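Step 404: Match the at least one piece of sub-address information, or the at least one piece of sub-address information together with the target address information, against the different address information contained in the address database; output the address information whose matching degree is greater than a set threshold as the found target address information.
Each piece of address information stored in the address database includes the different sub-address information constituting that address information.
In step 404, first, one piece of address information is selected from the address database, and the sub-address information contained in the selected address information is determined.
Second, the at least one piece of sub-address information contained in the target address to be searched is matched against the sub-address information contained in the selected address information to obtain a first matching degree, where the sub-address information compared on the two sides has the same address type.
Specifically, matching the sub-address information contained in the target address to be searched against the sub-address information contained in the selected address information to obtain the first matching degree includes:
for each piece of sub-address information contained in the target address information to be searched:
finding, in the selected address information, the sub-address information that belongs to the same address type as this piece of sub-address information;
calculating the edit distance required to convert between this piece of sub-address information and the sub-address information found; and
calculating the first matching degree from the edit distances obtained for the individual pieces of sub-address information contained in the target address information to be searched.
Note that the edit distance between two strings is the minimum number of editing operations required to convert one string into the other. An editing operation is replacing one character with another, inserting one character, or deleting one character.
Suppose a piece of address data selected from the address database is: XX市XXX区XXX站 (XXX Station, XXX District, XX City). A piece of sub-address information, "上海市", is selected from the target address information to be searched, "上海市曹杨路站4号出口", and its address type is determined to be administrative area. The sub-address information that represents the administrative area, "XX市", is then found in "XX市XXX区XXX站", and the edit distance required to convert between "XX市" and "上海市" is calculated.
If "XX市" is 上海市 (Shanghai), the edit distance between "XX市" and "上海市" is 0; if "XX市" is 北京市 (Beijing), the edit distance between "XX市" and "上海市" is 2.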
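The edit distance defined above is commonly computed with the standard Levenshtein dynamic-programming recurrence. The following is a generic sketch of that algorithm rather than code from the source; the function name `edit_distance` is chosen for this example.

```python
def edit_distance(a, b):
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


print(edit_distance("上海市", "上海市"))
print(edit_distance("北京市", "上海市"))
```

The two calls correspond to the two cases in the example: identical strings, and strings that differ in their first two characters.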
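After the edit distance is determined, the first matching degree between this piece of sub-address information in the target address information to be searched and the sub-address information of the same address type in the selected address information is obtained from the calculated edit distance.
For example: first matching degree = 1 - edit distance / MAX(string length of this piece of sub-address information in the target address information to be searched, string length of the sub-address information of the same address type in the selected address information); or, first matching degree = edit distance / MAX(string length of this piece of sub-address information in the target address information to be searched, string length of the sub-address information of the same address type in the selected address information).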
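Written out with symbols, and applied to the 上海市 / 北京市 example, the first form of the formula gives the following. The symbols $s_1$, $d$, $a$, and $b$ are introduced here for illustration only, and string length is assumed to be counted in characters.

```latex
s_1(a, b) = 1 - \frac{d(a, b)}{\max(|a|, |b|)}

% Worked example: a = 上海市, b = 北京市
d(a, b) = 2, \qquad \max(|a|, |b|) = 3, \qquad
s_1 = 1 - \tfrac{2}{3} = \tfrac{1}{3}

% Identical strings: a = b = 上海市
d(a, b) = 0, \qquad s_1 = 1
```

Under the second form, $d(a, b) / \max(|a|, |b|)$, the same two cases give $\tfrac{2}{3}$ and $0$, so a smaller value indicates a closer match.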
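Once the first matching degree of each piece of sub-address information contained in the target address information to be searched has been obtained, the first matching degree between the sub-address information contained in the target address information to be searched and the sub-address information contained in the selected address information is calculated from these values.
Finally, the total matching degree between the selected address information and the target address information to be searched is obtained from the first matching degree.
In one approach, the obtained first matching degree is used as the total matching degree between the selected address information and the target address information to be searched.
In another approach, the selected address information is further matched against the target address information to be searched to obtain a second matching degree.
The total matching degree between the selected address information and the target address information to be searched is then obtained from the first matching degree and the second matching degree.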
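The source does not say how the two matching degrees are combined. One plausible choice, shown here purely as an assumption, is a weighted average; the function name `total_match` and the parameter `weight` are hypothetical.

```python
def total_match(first_degree, second_degree=None, weight=0.5):
    """Combine the matching degrees into a total matching degree.

    With no second degree, the first degree is used directly (the first
    approach above). Otherwise a weighted average is taken (assumed).
    """
    if second_degree is None:
        return first_degree
    return weight * first_degree + (1.0 - weight) * second_degree


# A candidate that matches one sub-address perfectly but is a poor match
# for the full address is pulled down by the second matching degree.
print(total_match(1.0))
print(total_match(1.0, 0.5))
```

A combination of this kind would demote database entries that match only a single sub-address, such as an unrelated "Exit 4".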
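Specifically, outputting the address information whose matching degree is greater than the set threshold as the found target address information includes:
from the calculated total matching degree between each selected piece of address information and the target address information to be searched, determining a set number of total matching degrees in descending order; and
outputting the selected address information corresponding to each determined total matching degree as the found target address information.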
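The output step can be sketched as a threshold filter followed by a top-N selection. The candidate scores below are invented for illustration, and applying both the threshold and the set number together is one reading of the text, not a confirmed detail; `top_matches` is a hypothetical helper.

```python
import heapq


def top_matches(scored, threshold, count):
    """Return up to `count` addresses whose total matching degree exceeds
    `threshold`, ordered from highest to lowest degree."""
    eligible = [(score, address) for address, score in scored.items()
                if score > threshold]
    return [address for _score, address in heapq.nlargest(count, eligible)]


# Hypothetical total matching degrees for three database entries.
scored = {
    "上海市曹杨路站4号出口": 0.95,
    "上海市曹杨路站": 0.70,
    "北京市4号出口": 0.30,
}
results = top_matches(scored, threshold=0.5, count=2)
print(results)
```

`heapq.nlargest` avoids sorting the whole candidate list when only a few results are needed.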
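Note that the set similarity threshold may be a system default parameter or may be determined according to actual needs; it is not specifically limited here.
Note that the set number may be a system default parameter or may be determined according to actual needs; it is not specifically limited here.
With the solution of Embodiment 4 of the present invention, address search request information is acquired, and the target address information to be searched that it contains is determined; the target address information is split into at least one piece of sub-address information, where the target address information is composed of a plurality of different pieces of sub-address information that correspond to different address types; the at least one piece of sub-address information, or the at least one piece of sub-address information together with the target address information, is matched against the different address information contained in the address database, where each piece of address information stored in the address database includes the different sub-address information constituting it; and the address information whose matching degree is greater than a set threshold is output as the found target address information. Because the sub-address information corresponding to the target address information is extracted from the address search request information, the accuracy of extracting the target address information is improved; and because the extracted sub-address information and the target address information are matched against the different address information contained in the address database, with the address information whose matching degree is greater than the set threshold taken as the found target address information, the accuracy of address matching and the precision of address search are effectively improved.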
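Those skilled in the art should understand that embodiments of the present invention can be provided as a method, an apparatus (device), or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, apparatus (device), and computer program product according to the embodiments of the present invention. It should be understood that computer program instructions can implement each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Although preferred embodiments of the present invention have been described, a person skilled in the art can make additional changes and modifications to these embodiments once the basic inventive concept is known. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Evidently, a person skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to cover them.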

Claims (22)

  1. 一种地址搜索方法,其特征在于,包括:
    获取地址搜索请求信息,并确定所述地址搜索请求信息中包含的待搜索的目标地址信息;
    将所述目标地址信息拆分为至少一个子地址信息,所述目标地址信息为由多个不同的子地址信息组成的,所述多个不同的子地址信息分别对应不同的地址类型;
    将所述至少一个子地址信息或者所述至少一子地址信息和所述目标地址信息与地址数据库中包含的不同地址信息进行匹配,其中,所述地址数据库中存储的每一条地址信息包含构成该地址信息的不同子地址信息;
    将匹配得到的匹配度大于设定阈值的地址信息,作为搜索到的目标地址信息进行输出。
  2. 如权利要求1所述的地址搜索方法,其特征在于,获取地址搜索请求信息,包括:
    接收输入的语音数据,其中,所述语音数据用以发起地址搜索;
    对所述语音数据进行识别,得到所述语音数据中包含的待搜索的目标地址信息。
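  3. The address search method according to claim 1 or 2, characterized in that the target address information to be searched is obtained in the following manner:
    determining at least one keyword contained in the address search request information;
    for each determined keyword, performing: finding, among preset text address dictionaries corresponding to different address types, the text address dictionary that contains the keyword; and replacing the keyword with a character string that represents the address type corresponding to the found text address dictionary;
    determining, by using a regular expression that represents address information, whether the character string group formed after each keyword is replaced with its corresponding character string represents address information;
    when it is determined that the character string group represents address information, determining, according to the character string group, a quasi-address character string corresponding to the target address information to be searched;
    using the quasi-address character string as a conditional random field (CRF) feature, and extracting the target address information to be searched from the address search request information based on a CRF algorithm.
  4. The address search method according to claim 3, characterized in that determining, according to the character string group, the quasi-address character string corresponding to the target address information to be searched comprises:
    determining that the character string group contains multiple character strings, and that the keywords respectively replaced by the multiple character strings occupy consecutive positions in the address search request information;
    if no character string is repeated among the multiple character strings, merging the multiple character strings into one character string group based on the positional continuity of the replaced keywords, as the quasi-address character string corresponding to the target address information to be searched;
    if character strings are repeated among the multiple character strings, removing the repeated character strings, and merging the at least one character string remaining after the removal into one character string group based on the positional continuity of the replaced keywords, as the quasi-address character string corresponding to the target address information to be searched.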
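  5. The address search method according to any one of claims 1 to 4, characterized in that splitting the target address information into at least one piece of sub-address information comprises:
    splitting, from the target address information, sub-address information corresponding to different address types according to the address types represented by preset text address dictionaries corresponding to different address types.
  6. The address search method according to any one of claims 1 to 5, characterized in that the address type comprises one or a combination of the following:
    administrative region information, road name information, building/organization name information, content attached to the road name, and content attached to the building/organization name.
  7. The address search method according to any one of claims 1 to 6, characterized in that matching the at least one piece of sub-address information, or the at least one piece of sub-address information and the target address information, against different address information contained in the address database comprises: matching each of the at least one piece of sub-address information in the target address information against the sub-address information of the same address type in the address database.
  8. The address search method according to any one of claims 1 to 6, characterized in that matching the at least one piece of sub-address information, or the at least one piece of sub-address information and the target address information, against different address information contained in the address database comprises:
    selecting one piece of address information from the address database, and determining the sub-address information contained in the selected address information;
    performing matching calculation between the at least one piece of sub-address information contained in the target address to be searched and the sub-address information contained in the selected address information to obtain a first matching degree, wherein the sub-address information of the target address to be searched and the sub-address information of the selected address information on which the matching calculation is performed are of the same address type;
    obtaining, according to the first matching degree, a total matching degree between the selected address information and the target address information to be searched.
  9. The address search method according to claim 8, characterized in that obtaining, according to the first matching degree, the total matching degree between the selected address information and the target address information to be searched comprises:
    performing matching calculation between the selected address information and the target address information to be searched to obtain a second matching degree;
    obtaining, according to the first matching degree and the second matching degree, the total matching degree between the selected address information and the target address information to be searched.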
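  10. The address search method according to any one of claims 7 to 9, characterized in that performing matching calculation between the sub-address information contained in the target address to be searched and the sub-address information contained in the selected address information to obtain the first matching degree comprises:
    for each piece of sub-address information contained in the target address information to be searched, performing:
    for the piece of sub-address information, finding, in the selected address information, the sub-address information that belongs to the same address type as the piece of sub-address information;
    calculating the edit distance required to convert between the piece of sub-address information and the found sub-address information;
    calculating the first matching degree according to the edit distance obtained for each piece of sub-address information contained in the target address information to be searched.
  11. The address search method according to any one of claims 8 to 10, characterized in that outputting the address information whose matching degree obtained by the matching is greater than the set threshold as the found target address information comprises:
    determining, from the total matching degree calculated between each selected piece of address information and the target address information to be searched, a set number of total matching degrees in descending order of total matching degree;
    outputting the selected address information respectively corresponding to the determined total matching degrees as the found target address information.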
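  12. An address search device, characterized by comprising:
    an obtaining module, configured to obtain address search request information, and determine target address information to be searched that is contained in the address search request information;
    a splitting module, configured to split the target address information determined by the obtaining module into at least one piece of sub-address information, wherein the target address information is composed of multiple different pieces of sub-address information, and the multiple different pieces of sub-address information respectively correspond to different address types;
    a search module, configured to match the at least one piece of sub-address information obtained by the splitting module, or the at least one piece of sub-address information and the target address information, against different address information contained in an address database, wherein each piece of address information stored in the address database contains the different sub-address information that makes up that address information; and output address information whose matching degree obtained by the matching is greater than a set threshold as the found target address information.
  13. The address search device according to claim 12, characterized in that
    the obtaining module is specifically configured to receive input voice data, wherein the voice data is used to initiate an address search;
    and recognize the voice data to obtain the target address information to be searched that is contained in the voice data.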
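  14. The address search device according to claim 12 or 13, characterized in that the obtaining module is specifically configured to obtain the target address information to be searched in the following manner:
    determining at least one keyword contained in the address search request information;
    for each determined keyword, performing: finding, among preset text address dictionaries corresponding to different address types, the text address dictionary that contains the keyword; and replacing the keyword with a character string that represents the address type corresponding to the found text address dictionary;
    determining, by using a regular expression that represents address information, whether the character string group formed after each keyword is replaced with its corresponding character string represents address information;
    when it is determined that the character string group represents address information, determining, according to the character string group, a quasi-address character string corresponding to the target address information to be searched;
    using the quasi-address character string as a conditional random field (CRF) feature, and extracting the target address information to be searched from the address search request information based on a CRF algorithm.
  15. The address search device according to claim 14, characterized in that
    the obtaining module is specifically configured to determine, according to the character string group, the quasi-address character string corresponding to the target address information to be searched, which specifically comprises:
    determining that the character string group contains multiple character strings, and that the keywords respectively replaced by the multiple character strings occupy consecutive positions in the address search request information;
    if no character string is repeated among the multiple character strings, merging the multiple character strings into one character string group based on the positional continuity of the replaced keywords, as the quasi-address character string corresponding to the target address information to be searched;
    if character strings are repeated among the multiple character strings, removing the repeated character strings, and merging the at least one character string remaining after the removal into one character string group based on the positional continuity of the replaced keywords, as the quasi-address character string corresponding to the target address information to be searched.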
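  16. The address search device according to any one of claims 12 to 15, characterized in that
    the splitting module is specifically configured to split, from the target address information, sub-address information corresponding to different address types according to the address types represented by preset text address dictionaries corresponding to different address types.
  17. The address search device according to any one of claims 12 to 16, characterized in that the address type comprises one or a combination of the following:
    administrative region information, road name information, building/organization name information, content attached to the road name, and content attached to the building/organization name.
  18. The address search device according to any one of claims 12 to 17, characterized in that
    the search module is specifically configured to match each of the at least one piece of sub-address information in the target address information against the sub-address information of the same address type in the address database.
  19. The address search device according to any one of claims 12 to 17, characterized in that
    the search module is specifically configured to select one piece of address information from the address database, and determine the sub-address information contained in the selected address information;
    perform matching calculation between the at least one piece of sub-address information contained in the target address to be searched and the sub-address information contained in the selected address information to obtain a first matching degree, wherein the sub-address information of the target address to be searched and the sub-address information of the selected address information on which the matching calculation is performed are of the same address type;
    and obtain, according to the first matching degree, a total matching degree between the selected address information and the target address information to be searched.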
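  20. The address search device according to claim 19, characterized in that the search module is specifically configured to obtain, according to the first matching degree, the total matching degree between the selected address information and the target address information to be searched, which specifically comprises:
    performing matching calculation between the selected address information and the target address information to be searched to obtain a second matching degree;
    obtaining, according to the first matching degree and the second matching degree, the total matching degree between the selected address information and the target address information to be searched.
  21. The address search device according to claim 18 or 20, characterized in that
    the search module is specifically configured to perform matching calculation between the sub-address information contained in the target address to be searched and the sub-address information contained in the selected address information to obtain the first matching degree, which specifically comprises:
    for each piece of sub-address information contained in the target address information to be searched, performing:
    for the piece of sub-address information, finding, in the selected address information, the sub-address information that belongs to the same address type as the piece of sub-address information;
    calculating the edit distance required to convert between the piece of sub-address information and the found sub-address information;
    calculating the first matching degree according to the edit distance obtained for each piece of sub-address information contained in the target address information to be searched.
  22. The address search device according to any one of claims 18 to 21, characterized in that
    the search module is specifically configured to output the address information whose matching degree obtained by the matching is greater than the set threshold as the found target address information, which specifically comprises:
    determining, from the total matching degree calculated between each selected piece of address information and the target address information to be searched, a set number of total matching degrees in descending order of total matching degree;
    outputting the selected address information respectively corresponding to the determined total matching degrees as the found target address information.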
PCT/CN2015/079816 2014-09-30 2015-05-26 Address search method and device WO2016050088A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP15846022.0A EP3153978B1 (en) 2014-09-30 2015-05-26 Address search method and device
US15/398,260 US10783171B2 (en) 2014-09-30 2017-01-04 Address search method and device
US16/929,611 US20200349175A1 (en) 2014-09-30 2020-07-15 Address Search Method and Device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410525978.X 2014-09-30
CN201410525978.XA CN105528372B (zh) 2014-09-30 2014-09-30 Address search method and device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/398,260 Continuation US10783171B2 (en) 2014-09-30 2017-01-04 Address search method and device

Publications (1)

Publication Number Publication Date
WO2016050088A1 (zh) 2016-04-07

Family

ID=55629404

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/079816 WO2016050088A1 (zh) 2014-09-30 2015-05-26 Address search method and device

Country Status (4)

Country Link
US (2) US10783171B2 (zh)
EP (1) EP3153978B1 (zh)
CN (1) CN105528372B (zh)
WO (1) WO2016050088A1 (zh)

Also Published As

Publication number Publication date
CN105528372B (zh) 2019-05-24
US10783171B2 (en) 2020-09-22
EP3153978A1 (en) 2017-04-12
EP3153978B1 (en) 2020-04-22
EP3153978A4 (en) 2017-10-18
US20200349175A1 (en) 2020-11-05
CN105528372A (zh) 2016-04-27
US20170116224A1 (en) 2017-04-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15846022

Country of ref document: EP

Kind code of ref document: A1

REEP Request for entry into the european phase

Ref document number: 2015846022

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2015846022

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE