US20150356173A1 - Search device

Info

Publication number: US20150356173A1
Application number: US14/762,125
Authority: US
Inventors: Takeyuki Aikawa; Yusuke Koji
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2013-03-04
Filing date: 2013-03-04
Publication date: 2015-12-10
Also published as: JPWO2014136173A1; WO2014136173A1; JP5951105B2; DE112013006764T5; CN105027119A

Abstract

A search device includes: a similar word candidate acquirer including a word dictionary searcher to perform a comparison between an input character string and word character string data, and search for word character string data similar to the input character string to acquire, as similar word candidates, the word character string data, and a number-of-similar-word-candidates controller to select similar word candidates from the similar word candidates according to a preset threshold; a similar word selector to calculate an edit distance between each of the similar word candidates selected and the input character string, and select, as a similar word, a similar word candidate whose edit distance is equal to or less than a predetermined distance; and a name searcher to refer to a name search index data storage to search for a search text including the similar word selected by the similar word selector.

Description

FIELD OF THE INVENTION

The present invention relates to a search device that performs an ambiguous search over data registered in advance by using, as a search key, not only an official name but also an abbreviation, a half-remembered name, or the like.

BACKGROUND OF THE INVENTION

When searching for an address or a facility name with a search device, the user does not necessarily remember the exact name, and may instead cause the search device to perform the search by using, as a search key, a common name, an abbreviation, a half-remembered incorrect name, or the like. Further, in a terminal or equipment that does not have a keyboard as an input device, such as a car navigation device or a smartphone, a search may be performed on the basis of a result of voice recognition performed on a voice signal inputted via a microphone, a result of character recognition performed on an input made via a touch panel, or the like. An input made with any of these input devices may contain an input error, such as a recognition error or a keying error.
In both cases, that is, when a common name, an abbreviation, a half-remembered incorrect name, or the like is used as the search key, and when the input contains an input error, a technique is required that performs an ambiguous search covering not only an official name but also a name whose character string or pronunciation is similar to that of the official name.
As a technique of performing an ambiguous search, patent reference 1, for example, is disclosed. Patent reference 1 discloses a technique of searching for similar word candidates by using the matching degree of partial character strings of an inputted keyword, extracting from these similar word candidates a similar word having a short edit distance from the input keyword, and performing an ambiguous full-text search by adding the similar word as a search keyword.
For example, when “acetaldehyde” is inputted as a search keyword, similar word candidates that contain any of the partial character strings “acet”, “alde”, and “hyde”, such as “acetaldeyde” and “acetaldol”, are searched for. Next, an edit distance between the input keyword “acetaldehyde” and each of the similar word candidates is calculated, and a full-text search is then performed that also uses the similar word “acetaldeyde”, which has the smaller edit distance among the similar word candidates, so that search omissions are prevented.
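The “acetaldehyde” example above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of patent reference 1: the function names, the fixed 4-character chunking (inferred from the partial strings “acet”, “alde”, “hyde”), the use of the Levenshtein distance as the edit distance, the toy dictionary, and the distance limit of 1 are all assumptions made for this illustration.

```python
def partial_strings(keyword, size=4):
    """Split the keyword into consecutive chunks (assumed chunk size: 4)."""
    return [keyword[i:i + size] for i in range(0, len(keyword), size)]


def edit_distance(a, b):
    """Levenshtein distance, computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similar_words(keyword, dictionary, max_distance=1):
    """Acquire candidates by partial-string match, then filter by edit distance."""
    parts = partial_strings(keyword)
    # A word is a candidate if it contains at least one partial string.
    candidates = [w for w in dictionary
                  if w != keyword and any(p in w for p in parts)]
    return [w for w in candidates if edit_distance(keyword, w) <= max_distance]


# Hypothetical word dictionary, for illustration only.
dictionary = ["acetaldehyde", "acetaldeyde", "acetaldol", "benzene"]
print(similar_words("acetaldehyde", dictionary))  # ['acetaldeyde']
```

Under these assumptions, “acetaldeyde” is at edit distance 1 from the keyword (one deleted character) and is kept, whereas “acetaldol” is at edit distance 5 and is dropped even though it was acquired as a candidate through the shared partial string “acet”.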

Claims

1. A search device that performs a search process by using, as a search key, an input character string including ambiguity, to acquire a search text, said search device comprising:

a word dictionary to store word character string data about each of words into which said search text is divided;

a similar word candidate acquirer including a word dictionary searcher to perform a comparison between said input character string and word character string data stored in said word dictionary, and search for word character string data similar to said input character string to acquire, as similar word candidates, the word character string data which have been searched for, and a number-of-similar-word-candidates controller to select similar word candidates from the similar word candidates acquired by said word dictionary searcher according to a preset threshold;

a similar word selector to calculate an edit distance between each of the similar word candidates selected by said number-of-similar-word-candidates controller and said input character string, and select, as a similar word, a similar word candidate whose calculated edit distance is equal to or less than a predetermined distance;

a search index data storage to store said search text; and

a text searcher to refer to said search index data storage to search for a search text including the similar word selected by said similar word selector,

wherein said similar word candidate acquirer includes a number-of-input-characters determinator to determine whether a number of characters of said input character string is large or small, and calculate said threshold according to a result of the determination.

2. (canceled)

3. The search device according to claim 1, wherein said similar word candidate acquirer includes a number-of-input-words determinator to, when said input character string consists of a plurality of words, determine whether a number of words of said input character string is large or small, and calculate said threshold according to a result of the determination.

4. The search device according to claim 1, wherein said similar word candidate acquirer includes a specific character string determinator to determine whether said input character string matches a specific character string which is preset, and acquire said threshold corresponding to a result of the determination.

5. The search device according to claim 1, wherein said similar word candidate acquirer includes an arithmetic load determinator to acquire an arithmetic load on said search device, determine whether said arithmetic load is high or low, and calculate said threshold according to a result of the determination.

6. The search device according to claim 1, wherein said search device includes a similar character string weight table to define combinations of similar character strings, and said similar word candidate acquirer includes a similar character string expander to refer to said similar character string weight table to expand said input character string to similar character strings, and wherein said word dictionary searcher performs a comparison between said input character string and the similar character strings after the expansion by said similar character string expander, and the word character string data stored in said word dictionary, and searches for word character string data similar to said input character string and said similar character strings after the expansion, to acquire the word character string data as said similar word candidates.

7. The search device according to claim 1, wherein said search device includes a similar word integrator to compare each of similar words selected by said similar word selector with said input character string, search through said similar words for a plurality of similar words each of whose leading character string matches said input character string, and integrate the plurality of similar words which said similar word integrator has searched for into a similar word, and wherein said text searcher refers to said search index data storage, and searches for a search text including the similar word after the integration by said similar word integrator.

8. The search device according to claim 1, wherein said search device includes:

an input character string divider to, when said input character string consists of a plurality of words, generate an after-division input character string in which said input character string is divided on a per word basis;

a number-of-yet-to-be-processed-words determinator to determine whether processes of said similar word candidate acquirer, said similar word selector, and said text searcher are performed on all character strings of said after-division input character string on a basis of a search text which said text searcher has searched for; and

a search result integrator to, when said number-of-yet-to-be-processed-words determinator determines that said processes have been performed on all the character strings of said after-division input character string, integrate search texts which said text searcher has searched for.
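The threshold determination recited in claims 1 and 3 to 5, and the per-word processing and integration recited in claim 8, can be pictured with the following Python sketch. It is a hypothetical illustration only. The claims state that the threshold is calculated according to each determination but do not state in which direction or by how much, so every numeric value, every cutoff, and the direction of each adjustment below is an assumption. The function names, whitespace-based word division, and integration by set intersection are likewise assumptions.

```python
def candidate_limit(input_string, load=0.0, specific_limits=None,
                    short_chars=3, many_words=3, high_load=0.8):
    """Return an assumed upper limit on the number of similar word candidates."""
    specific_limits = specific_limits or {}
    # Specific character string determination (claim 4): preset string, preset limit.
    if input_string in specific_limits:
        return specific_limits[input_string]
    limit = 100  # assumed base limit
    # Number-of-input-characters determination (claim 1).
    if len(input_string.replace(" ", "")) <= short_chars:
        limit = 20
    # Number-of-input-words determination (claim 3).
    if len(input_string.split()) >= many_words:
        limit = min(limit, 50)
    # Arithmetic load determination (claim 5); load is assumed to lie in [0, 1].
    if load >= high_load:
        limit //= 2
    return limit


def cap_candidates(scored_candidates, limit):
    """Keep the `limit` candidates with the highest (assumed) similarity score."""
    ranked = sorted(scored_candidates, key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:limit]]


def search_multiword(input_string, search_one_word):
    """Divide the input per word, search each word, and integrate the results.

    `search_one_word` is a caller-supplied function standing in for the
    candidate acquirer, similar word selector, and text searcher together.
    """
    results = None
    for word in input_string.split():  # assumed: words separated by whitespace
        hits = set(search_one_word(word))
        results = hits if results is None else results & hits
    return sorted(results or [])
```

For instance, with a hypothetical per-word search that returns "Tokyo Station" and "Tokyo Tower" for "tokyo", and "Tokyo Station" and "Osaka Station" for "station", `search_multiword("tokyo station", ...)` would return only "Tokyo Station" under the intersection assumption.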