CN101751465A - Method and system for determining similar word with input string - Google Patents


Publication number
CN101751465A
CN101751465A CN200910250398A CN200910250398A CN101751465A CN 101751465 A CN101751465 A CN 101751465A CN 200910250398 A CN200910250398 A CN 200910250398A CN 200910250398 A CN200910250398 A CN 200910250398A CN 101751465 A CN101751465 A CN 101751465A
Authority
CN
China
Prior art keywords
character string
language
input
similar word
candidate character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200910250398A
Other languages
Chinese (zh)
Other versions
CN101751465B (en)
Inventor
金泰壹
寄允舒
李道佶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NHN Corp
Original Assignee
NHN Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NHN Corp
Publication of CN101751465A
Application granted
Publication of CN101751465B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/263 Language identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a method and a system for determining similar words. The method comprises the following steps: judging whether an input string is in a first language or a second language; when the input string is in the second language, calculating the edit distance between the input string and each string that writes, in the second language, the pronunciation of a candidate string written in the first language; and determining, as a similar word of the input string, each candidate string written in the first language whose second-language pronunciation string has an edit distance to the input string less than a reference value.

Description

Similar word determining method and system
Technical field
The present invention relates to search services, and in particular to a method and system that provide, as recommended query terms, similar words in a foreign language or similar words for a Korean-script rendering of a foreign-language pronunciation.
Background technology
Recently, with advances in science and technology and rising economic standards, communication networks such as the high-speed Internet have become widespread, and the number of high-speed network users has grown sharply. This rapid growth has in turn driven the development of new services that use communication networks and the diversification of those services. Among such services, the most common is the search service.
A search service is a service that, when a user enters a query term, provides the user with search results corresponding to that query term (for example, websites containing the entered query term, news articles containing it, or images whose file names contain it).
However, when entering a query term, a user of a search service may enter a wrong query term because of a typing error, or may be unable to enter the intended query term accurately because the user does not know its correct spelling. In that case the search service provider can only search on the query term actually entered, so the user cannot obtain the desired search results.
To address this problem, recent search services offer richer features for the query term the user enters, such as recommended query terms or related query terms. Here, providing recommended query terms means selecting some of the query terms similar to the one the user entered and offering them as recommendations.
However, providing recommended query terms in this way has the following problem: when the user is unfamiliar with a foreign language and does not know the exact spelling or pronunciation of a foreign word, the user may mistype either the foreign-language spelling of the query term or the Korean rendering of its pronunciation, so the search results the user wants cannot be provided accurately.
Summary of the invention
The present invention is made to solve the above problem. One technical problem it addresses is to provide a similar word determination method and system that, even when the user does not know the exact foreign-language spelling or pronunciation of the query term to be searched, can provide a similar foreign-language string, or the foreign-language string corresponding to a similar pronunciation, as a recommended query term.
Another technical problem the present invention addresses is to provide a similar word determination method and system that, when the query term the user wants to search is a foreign word, can provide foreign words with a similar pronunciation as recommended query terms even if the user does not know the foreign language, as long as the user knows how the pronunciation is written in the native language.
A further technical problem the present invention addresses is to provide a similar word determination method and system that can provide similar words as recommended query terms even when the query term the user wants to search is in any of various languages, such as Chinese or English.
To achieve these objects, a similar word determination method according to one aspect of the present invention comprises the steps of: judging whether an input string is in a first language or a second language; when the input string is in the second language, calculating the edit distance between the input string and each string that writes, in the second language, the pronunciation of a candidate string written in the first language; and determining, as a similar word of the input string, each candidate string written in the first language whose second-language pronunciation string has an edit distance to the input string below a reference value.
Here, when the second language is the native language, the string that writes a candidate string's pronunciation in the second language is obtained by converting the candidate string with a transliteration converter for the native language (a converter that writes a pronunciation in the script of another language). Preferably, when the second language is Korean, the transliteration converter for the native language is a Korean transliteration converter (one that writes the pronunciation in Korean script).
In addition, the judging step further comprises: when the input string is in the first language, a step of calculating the edit distance between each candidate string and the input string, and a step of determining, as a similar word of the input string, each candidate string whose edit distance is below the reference value.
The step of calculating the edit distance further comprises a step of judging whether the input string written in the second language can be converted into the first language. When it cannot be converted, the edit distance is calculated between the input string and each string that writes the pronunciation of a candidate string in the second language.
When the input string written in the second language can be converted into the first language, the method further comprises the steps of: converting the input string into the first language; calculating the edit distance between the converted first-language string and each candidate string; and determining, as a similar word of the input string, each candidate string whose edit distance is below the reference value.
In the converting step, when the first language is a foreign language, the input string is converted into a string in that foreign language by a reverse transliteration converter for that language (the inverse operation of transliteration). In an embodiment of the present invention, when the first language is Japanese, the reverse transliteration converter may be a Japanese one, which converts a string into the Japanese string corresponding to its pronunciation.
In an embodiment of the present invention, the first language is a foreign language and the second language is the native language. The input string and the candidate strings may be search query terms.
In addition, a similar word determination method according to an embodiment of the present invention further comprises, before the judging step, a step of receiving the input string from a user terminal, and, after the step of determining the similar word, a step of providing the determined similar word to the user terminal as a recommended query term.
In an embodiment of the present invention, the candidate strings are selected from pre-stored candidate strings as: those whose edit distance to the input string is below a reference value; or, among those that share common characters with the input string, those whose character similarity score with the input string ranks within the top N; or the candidate strings of both kinds.
Here, the candidate strings whose edit distance is below the reference value are selected by wildcard search, performed separately for each operation used in calculating the edit distance. The candidate strings that share common characters with the input string are those that share a common n-gram with the input string, and the character similarity score is determined from the size of the n-grams shared with the input string, the number of shared n-grams, the similarity of the positions at which the shared n-grams are found, and the difference in length between the input string and each candidate string.
To achieve these objects, a similar word determination system according to another aspect of the present invention comprises: a user interface unit that receives an input string from a user terminal and provides similar words for the input string to the user terminal as recommended query terms; and a similar word determination unit that, when the input string is in a second language, determines as a similar word of the input string each candidate string written in a first language whose pronunciation, written in the second language, has an edit distance to the input string below a reference value, and provides it to the user interface unit.
According to the present invention, when the query term the user wants to search is a foreign word, foreign words with a similar spelling can be provided as recommended query terms even if the user does not know the exact foreign word.
Also according to the present invention, when the query term the user wants to search is a foreign word, foreign words with a similar pronunciation can be provided as recommended query terms even if the user does not know the foreign language, as long as the user knows how the pronunciation is written in the native language.
Also according to the present invention, similar words can be provided as recommended query terms even when the query term the user wants to search is in any of various languages, such as Chinese or English.
Description of drawings
Fig. 1 is a schematic block diagram of a similar word determination system according to an embodiment of the present invention;
Fig. 2 is a detailed structural diagram of the similar word determination unit in Fig. 1;
Fig. 3 is a flowchart of a similar word determination method according to an embodiment of the present invention.
Description of main reference numerals:
100: similar word determination system 110: Internet
120: user terminal 130: user interface unit
140: candidate string providing unit 150: similar word determination unit
Embodiment
Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of the network structure of the similar word determination system according to an embodiment of the present invention. As shown in the figure, the similar word determination system 100 receives an input string from a user terminal 120 connected via the Internet 110, determines similar words for the received input string, and provides the determined similar words to the user terminal 120 as recommended query terms. The similar word determination system 100 comprises a user interface unit 130, a candidate string providing unit 140, and a similar word determination unit 150.
First, the user interface unit 130 receives from the user terminal 120 an input string written in a foreign language, or an input string that writes the foreign-language pronunciation in Korean, and provides the similar words for the input string, received from the similar word determination unit 150 described later, to the user terminal 120 as recommended query terms.
The candidate string providing unit 140 supplies candidate strings to the similar word determination unit 150, which calculates their edit distance to the input string in order to determine the recommended query terms to offer the user. In the present invention, only some of the pre-stored query terms are selected as candidate strings: the similar word determination unit 150 does not calculate the edit distance to the input string for every pre-stored query term, but only for the candidate strings supplied by the candidate string providing unit 140, which improves the response time of the service that provides similar words for the entered query term.
Here, the candidate strings supplied by the candidate string providing unit 140 may be: candidate strings whose edit distance to the input string is below a reference value; or, among candidate strings that share common characters with the input string, those whose character similarity score with the input string ranks within the top N; or the candidate strings of both kinds. They may be stored in advance in a database (not shown).
The candidate strings whose edit distance is below the reference value may be selected by wildcard (wild card character) search, performed separately for each operation used in calculating the edit distance.
Here, the operations are insertion, deletion, substitution, and transposition. Insertion adds a new character to a string; deletion removes a character contained in the string; substitution replaces a character contained in the string with a new character; and transposition swaps the order of two adjacent characters in the string.
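The four operations named above can be illustrated with a minimal Python sketch of an edit distance that counts insertion, deletion, substitution, and adjacent transposition. It uses the well-known optimal string alignment recurrence; the patent text does not say which recurrence or which operation costs are used, so the unit cost per operation and the function name `edit_distance` are assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Counts insertions, deletions, substitutions and swaps of two
    adjacent characters, each at an assumed cost of 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i leading characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j leading characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

With this sketch, two strings that differ by a single swapped pair of neighbouring characters are at distance 1 rather than 2, which is the practical effect of including transposition among the operations.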
The candidate strings that share common characters with the input string are those that share a common n-gram with the input string, and the character similarity score may be determined from the size of the n-grams shared with the input string, the number of shared n-grams, the similarity of the positions at which the shared n-grams are found, and the difference in length between the input string and each candidate string.
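The text names the four inputs to the character similarity score but gives no formula. The following Python sketch shows one hypothetical way to combine them and to keep the top N candidates; the weighting in `similarity_score`, the default n-gram size of 2, and all function names are assumptions for illustration, not the scoring used in the patent.

```python
import heapq

def ngrams(s, n):
    """Map each n-gram of s to the index of its first occurrence."""
    grams = {}
    for i in range(len(s) - n + 1):
        grams.setdefault(s[i:i + n], i)
    return grams

def similarity_score(query, cand, n=2):
    """Hypothetical score: rewards the size and number of shared n-grams,
    penalises position offsets and the length difference."""
    q, c = ngrams(query, n), ngrams(cand, n)
    common = q.keys() & c.keys()
    if not common:
        return 0.0  # no shared n-gram: not a candidate at all
    position_penalty = sum(abs(q[g] - c[g]) for g in common)
    length_penalty = abs(len(query) - len(cand))
    return n * len(common) / (1 + position_penalty + length_penalty)

def top_n(query, candidates, limit, n=2):
    """Return up to `limit` candidates sharing an n-gram, best score first."""
    scored = [(similarity_score(query, c, n), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s > 0]
    return [c for _, c in heapq.nlargest(limit, scored)]
```

Under these assumptions, `top_n("abcd", ["abcd", "abce", "xyz"], 2)` keeps the two candidates that share bigrams with the query and drops the one that shares none.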
When the input string is a string that writes the pronunciation of the first language in the second language, the similar word determination unit 150 determines, as similar words of the input string, those pre-stored candidate strings supplied by the candidate string providing unit 140 that satisfy a predetermined condition, and provides them to the user interface unit 130. The predetermined condition is that the edit distance between the input string and the string that writes the candidate string's pronunciation in the second language is below a reference value. To this end, as shown in Fig. 2, the similar word determination unit 150 comprises an input string judging unit 210, a string convertibility judging unit 220, a first edit distance calculation unit 230, a first determination unit 240, a second edit distance calculation unit 250, a second determination unit 260, and a string conversion unit 270. The similar word determination unit 150 is described in detail below with reference to Fig. 2.
The input string judging unit 210 judges whether the input string is a string written in the first language or a string that writes the pronunciation of the first language in the second language. In one embodiment, the first language may be a foreign language and the second language the native language. For example, in Korea, the first language may be one of several foreign languages such as Japanese, Chinese, and English, and the second language may be Korean, the native language. If the first language is Japanese, the input string may include at least one of hiragana, katakana, and kanji. Here, the input string and the candidate strings may be search query terms.
In one embodiment, the input string judging unit 210 may judge from the character code of each character of the input string whether the input string is written in the first language or in the second language. For example, assuming the first language is Japanese and the second language is Korean, the input string judging unit 210 checks the character codes of all syllables of the input string; only when every character is Korean is the input string judged to be Korean, and when it contains both Japanese and Korean characters it is judged to be Japanese. More preferably, each character of the input string may be converted to UCS-2 encoding, and the input string judged to be Korean when the Unicode values fall in the range 0xAC00 to 0xD7A3.
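The character-code check described above can be sketched in Python as follows. Python strings are sequences of Unicode code points, so no explicit UCS-2 conversion is needed for characters in the range 0xAC00 to 0xD7A3. The function names, the return labels `"first"` and `"second"`, and the handling of an empty string are assumptions for illustration; spaces and punctuation are not treated specially in this sketch.

```python
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3  # range given in the text

def is_hangul_syllable(ch):
    """True when the character's code point lies in the Korean syllable range."""
    return HANGUL_FIRST <= ord(ch) <= HANGUL_LAST

def classify(text):
    """Return 'second' (Korean) only if every character is a Korean syllable.

    Any other input, including a mix of Japanese and Korean characters,
    is treated as the first language. An empty string is treated as
    'first' here, which is an arbitrary choice for the sketch.
    """
    if text and all(is_hangul_syllable(ch) for ch in text):
        return "second"
    return "first"
```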
The first language and the second language in the present invention are not limited to the languages mentioned in this specification; they may be any of various languages. For ease of explanation, the description below assumes that the first language is Japanese and the second language is Korean.
That is, the input string judging unit 210 judges whether the string entered through the user terminal is a Japanese string or a string that writes Japanese pronunciation in Korean.
When the input string judging unit 210 judges that the input string is written in the second language, the string convertibility judging unit 220 judges whether the input string can be converted into a string written in the first language. For example, when the input string is [Korean string, images G2009102503983D00061 and G2009102503983D00062], a corresponding Japanese notation "つ ぷ り" exists, so the input string is judged to be convertible to Japanese; when the input string is [Korean string, image not reproduced], no corresponding Japanese notation exists, so it is judged not to be convertible to Japanese.
When the input string judging unit 210 judges that the input string is written in the second language, the first edit distance calculation unit 230 calculates the edit distance between the input string and the string that writes the pronunciation of each candidate string in the second language. For example, when the input string is a Korean rendering of a pronunciation, [Korean string, image G2009102503983D00071], the unit calculates the edit distance between the input string [Korean string, image G2009102503983D00072] and each Korean string that writes the pronunciation of a candidate string.
In one embodiment, when the string convertibility judging unit 220 judges that the input string written in the second language cannot be converted into a string written in the first language, the first edit distance calculation unit 230 may calculate the edit distance between the input string and the string that writes the pronunciation of each candidate string in the second language. That is, as described above, when an input string written in Korean, such as [Korean string, image not reproduced], cannot be converted to Japanese, the unit calculates the edit distance between [Korean string, image G2009102503983D00074] and the strings that write the pronunciations of the candidate strings, such as [Korean strings, image G2009102503983D00076].
The first determination unit 240 determines, as similar words of the input string, those candidate strings whose second-language pronunciation string has an edit distance to the input string below the reference value. For example, when the input string is the Korean-script string [Korean string, image G2009102503983D00077], the candidate strings "つ ぷ り", "つ The Ru", and "え Ru", which correspond to the strings [Korean strings, image G2009102503983D00079] whose edit distance to [Korean string, image G2009102503983D00078] is below the reference value, may be determined as similar words of the input string. The reference value for the edit distance in the present invention may be changed according to circumstances and is not limited to a particular value.
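As a rough illustration of this pronunciation-based selection, the Python sketch below compares a Korean-script input with the Korean pronunciation string of each Japanese candidate and keeps the candidates within a threshold. The table `PRONUNCIATION`, its example words, the threshold of 1, and the inclusive comparison are all assumptions for illustration; they are not taken from the patent, whose own Korean examples are images not reproduced here. For brevity the distance is plain Levenshtein (no transposition).

```python
def levenshtein(a, b):
    """Edit distance with insertion, deletion and substitution at unit cost."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

# Hypothetical data: Japanese candidate -> its pronunciation in Korean script.
PRONUNCIATION = {
    "さくら": "사쿠라",
    "さくらんぼ": "사쿠란보",
    "すし": "스시",
}

def similar_by_pronunciation(query, threshold=1):
    """Return the Japanese candidates whose Korean pronunciation string
    is within `threshold` edits of the Korean-script query."""
    return [cand for cand, hangul in PRONUNCIATION.items()
            if levenshtein(query, hangul) <= threshold]
```

In this toy setup, a query that misspells one syllable of a stored pronunciation still returns the Japanese candidate, even though the query itself never appears in the candidate list.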
Here, when the second language is Korean, the string that writes a candidate string's pronunciation in the second language may be obtained with a Korean transliteration converter applied to the candidate string (a converter that writes the candidate string's pronunciation in Korean script).
When the input string judging unit 210 judges that the input string is written in the first language, the second edit distance calculation unit 250 calculates the edit distance between each candidate string and the input string. For example, when the input string is the Japanese "つ ぷ Ru", it calculates the edit distance between each candidate string and the input string "つ ぷ Ru".
The second determination unit 260 determines, as similar words of the input string, those candidate strings whose edit distance calculated by the second edit distance calculation unit 250 is below the reference value. For example, when the input string is the Japanese "つ ぷ Ru", the candidate strings "つ ぷ り", "つ The Ru", and "え Ru", whose edit distance to the input string "つ ぷ Ru" is below the reference value, are determined as similar words of the input string "つ ぷ Ru".
When the string convertibility judging unit 220 judges that the input string written in the second language can be converted into a string written in the first language, the string conversion unit 270 converts the input string into a string written in the first language. The second edit distance calculation unit 250 may then calculate the edit distance between the string converted into the first language by the string conversion unit 270 and each candidate string.
In one embodiment, when the first language is Japanese, the string conversion unit 270 may be a Japanese reverse transliteration converter, that is, the inverse of a transliteration converter, which converts the input string into the Japanese string corresponding to its pronunciation.
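The two paths for a Korean-script input, conversion to the first language when possible and pronunciation comparison otherwise, can be sketched in Python as below. The per-syllable table `HANGUL_TO_KANA` is a toy stand-in for a reverse transliteration converter, and the function names, the example words, and the threshold are hypothetical; a real converter would be far more involved.

```python
def levenshtein(a, b):
    """Edit distance with insertion, deletion and substitution at unit cost."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

# Hypothetical per-syllable table standing in for reverse transliteration.
HANGUL_TO_KANA = {"사": "さ", "쿠": "く", "라": "ら", "스": "す", "시": "し"}

def to_first_language(query):
    """Return the kana string, or None when some syllable has no mapping,
    which this sketch treats as 'cannot be converted'."""
    kana = [HANGUL_TO_KANA.get(ch) for ch in query]
    return None if None in kana else "".join(kana)

def find_similar(query, candidates, threshold=1):
    """`candidates` maps a Japanese string to its Korean pronunciation string.

    Convertible input: compare the converted string with the candidates.
    Otherwise: compare the input with each candidate's pronunciation string.
    """
    converted = to_first_language(query)
    if converted is not None:
        return [c for c in candidates if levenshtein(converted, c) <= threshold]
    return [c for c, hangul in candidates.items()
            if levenshtein(query, hangul) <= threshold]
```

The sketch assumes the input has already been judged to be in the second language; an input in the first language would skip conversion and be compared with the candidates directly.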
In the embodiment above the candidate string providing unit 140 is included in the similar word determination system 100, but in a variation it may be included in another system, or its role may be taken over by a database (not shown).
The similar word determination method according to the present invention is described below with reference to Fig. 3, which is a flowchart of the similar word determination method according to an embodiment of the present invention.
As shown in the figure, the string entered by the user is received through the user terminal (S300).
Next, it is judged whether the input string is in the first language or the second language (S310). Here, the first language may be one of Japanese, Chinese, and English, and the second language may be Korean. That is, it is judged whether the input string is Japanese or a Korean rendering of Japanese pronunciation. When the first language is Japanese, the input string may include at least one of hiragana, katakana, and kanji.
Next, when the input character string is in the second language, it is determined whether the input character string written in the second language can be converted into the first language (S320). For example, when the input character string is the string shown in Figure G2009102503983D00081, a corresponding Japanese notation "つ ぷ り" exists, so the string is judged to be convertible into Japanese; when the input character string is the string shown in Figure G2009102503983D00082, no corresponding Japanese notation exists, so the string is judged not to be convertible into Japanese.
Next, when the input character string cannot be converted into the first language, the edit distance is calculated between the input character string and a character string that writes, in the second language, the pronunciation of each candidate character string written in the first language (S330). Here, the input character string and the candidate character strings may be search queries. For example, when the input character string is the Korean phonetic notation shown in Figure G2009102503983D00083, the edit distance is calculated between that input character string and each character string that writes the pronunciation of a candidate character string in Korean.
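As an illustration of the edit distance used in step S330, the following is a minimal sketch of the standard Levenshtein distance. It assumes that the permitted operations are single-character insertion, deletion, and substitution, each with cost 1; this passage of the patent does not fix the operations or their costs, and the function name `edit_distance` is hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```

Applied to Korean text, this sketch compares whole Hangul syllables; comparing decomposed jamo instead would be a different design choice that yields different distances.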
Here, the candidate character strings are drawn from candidate character strings stored in advance and are one of the following: (i) candidates whose edit distance from the input character string is at or below a reference value; (ii) candidates that share common characters with the input character string and rank within the top N by character similarity score with the input character string; or (iii) both the candidates of (i) and the candidates of (ii).
Among the candidate character strings, those whose edit distance is at or below the reference value can be selected by wildcard character retrieval performed separately for each operation used in calculating the edit distance.
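One way to read the wildcard retrieval described above is to generate, for each edit operation, the patterns that cover every string one edit away from the query and then match the stored candidates against those patterns. The sketch below does this for a reference value of 1. It is an assumption-laden illustration: the names `wildcard_patterns` and `wildcard_search`, the use of "?" as a single-character wildcard, and the linear scan over candidates are all hypothetical (a production system would presumably query an index).

```python
import re


def wildcard_patterns(query):
    """Patterns covering every string within one edit of query.
    '?' stands for exactly one arbitrary character.
    Assumes query itself contains no literal '?'."""
    patterns = {query}
    for i in range(len(query) + 1):
        # Insertion: one extra character at position i.
        patterns.add(query[:i] + "?" + query[i:])
    for i in range(len(query)):
        # Substitution: character i replaced by any character.
        patterns.add(query[:i] + "?" + query[i + 1:])
        # Deletion: character i removed (no wildcard needed).
        patterns.add(query[:i] + query[i + 1:])
    return patterns


def _to_regex(pattern):
    """Translate a '?' pattern into a compiled regular expression."""
    parts = ("." if ch == "?" else re.escape(ch) for ch in pattern)
    return re.compile("".join(parts), re.DOTALL)


def wildcard_search(query, candidates):
    """Return the candidates matched by any one-edit pattern of query."""
    regexes = [_to_regex(p) for p in wildcard_patterns(query)]
    return [c for c in candidates if any(r.fullmatch(c) for r in regexes)]
```

For the query "cat", the substitution pattern "c?t" finds "cot", the insertion pattern "ca?t" finds "cart", and the deletion pattern "ct" finds "ct".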
A candidate character string that shares common characters with the input character string is a candidate character string that shares a common n-gram with the input character string. The character similarity score can be determined using the size of the n-grams shared with the input character string, the number of shared n-grams, the similarity of the positions at which the shared n-grams are found, and the length difference between the input character string and each candidate character string.
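The patent names the four factors of the character similarity score but this passage gives no formula. The sketch below shows one hypothetical way to combine them; the weighting (n-gram size divided by one plus the position offset, minus a length penalty), the default n-gram sizes, and the names `ngram_positions`, `similarity_score`, and `top_candidates` are all assumptions made for illustration.

```python
import heapq


def ngram_positions(text, n):
    """Map each n-gram of text to the position of its first occurrence."""
    positions = {}
    for i in range(len(text) - n + 1):
        positions.setdefault(text[i:i + n], i)
    return positions


def similarity_score(query, candidate, sizes=(2, 3), length_penalty=0.5):
    """Hypothetical score: larger and more numerous shared n-grams raise it,
    position mismatch and length difference lower it."""
    score = 0.0
    for n in sizes:
        q = ngram_positions(query, n)
        c = ngram_positions(candidate, n)
        for gram in q.keys() & c.keys():
            # Each shared n-gram contributes its size, damped by position offset.
            score += n / (1 + abs(q[gram] - c[gram]))
    return score - length_penalty * abs(len(query) - len(candidate))


def top_candidates(query, candidates, top_n=3, sizes=(2, 3)):
    """Keep candidates sharing at least one n-gram, then take the top N by score."""
    def shares_ngram(c):
        return any(ngram_positions(query, n).keys() & ngram_positions(c, n).keys()
                   for n in sizes)
    sharing = [c for c in candidates if shares_ngram(c)]
    return heapq.nlargest(top_n, sharing,
                          key=lambda c: similarity_score(query, c, sizes))
```

With the query "search", the candidate "searching" outranks "research" under this weighting because its shared n-grams sit at the same positions as in the query, even though its length differs more.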
Next, each candidate character string written in the first language whose second-language notation has an edit distance from the input character string at or below the reference value is determined to be a similar word of the input character string (S340). For example, when the input character string is the Korean notation shown in Figure G2009102503983D00091, the candidate character strings "つ ぷ り", "つ The Ru", and "え Ru", which correspond to the strings (Figure G2009102503983D00093) whose edit distance from the string in Figure G2009102503983D00092 is at or below the reference value, can be determined to be similar words of the input character string.
Finally, the determined similar words are provided to the user terminal as recommended queries (S350).
If the input character string is found to be in the first language in step S310, the edit distance between each candidate character string and the input character string is calculated (S360), and the candidate character strings whose edit distance is at or below the reference value are determined to be similar words of the input character string (S370). For example, when the input character string is the Japanese "つ ぷ Ru", the candidate character strings "つ ぷ り", "つ The Ru", and "え Ru", whose edit distance from the input character string "つ ぷ Ru" is at or below the reference value, are determined to be similar words of the input character string "つ ぷ Ru".
If the input character string is found to be convertible into the first language in step S320, the input character string written in the second language is converted into the first language (S380), and the edit distance between the converted input character string and each candidate character string is calculated (S360).
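The branching among steps S310 to S380 can be summarized in a short dispatcher. This is a sketch of the control flow only, not the patented implementation: the function name `find_similar_words` and its keyword parameters are hypothetical, and the language detector, transliterator, reverse transliterator, and edit distance function are passed in as callables because the patent treats them as separate components.

```python
def find_similar_words(query, candidates, *, detect, can_convert, convert,
                       pronounce, distance, threshold=1):
    """Return the first-language candidates judged similar to query.

    detect(query)      -> "first" or "second"                    (S310)
    can_convert(query) -> True if a first-language form exists   (S320)
    convert(query)     -> query rewritten in the first language  (S380)
    pronounce(cand)    -> cand's pronunciation written in the second language
    distance(a, b)     -> edit distance between two strings
    """
    language = detect(query)
    if language == "second" and can_convert(query):
        # S380: convert, then continue as a first-language query.
        query = convert(query)
        language = "first"
    if language == "first":
        # S360/S370: compare the query with the candidates directly.
        return [c for c in candidates if distance(query, c) <= threshold]
    if language == "second":
        # S330/S340: compare in the second-language script, but return
        # the first-language candidates themselves.
        return [c for c in candidates
                if distance(query, pronounce(c)) <= threshold]
    return []
```

Injecting the components keeps the flow testable with toy stand-ins, for example a small dictionary that maps each Japanese candidate to a Korean notation of its pronunciation.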
The similar word determination method can be embodied in the form of program instructions executable by various computer means and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the field of computer software.
Computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. The recording medium may also be a transmission medium, such as a metal wire or a waveguide, that carries signals specifying program instructions, data structures, and the like.
Program instructions include not only machine language code produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.
In addition, those skilled in the art to which the present invention pertains will understand that the present invention can be implemented in other specific forms without changing its technical idea or essential features.
Therefore, the embodiments described above are illustrative in every respect and do not limit the invention. The scope of the present invention is expressed by the claims rather than by the description, and all changes and modifications derived from the meaning and scope of the claims and from their equivalents should be construed as falling within the scope of the present invention.

Claims (22)

1. A similar word determination method, characterized by comprising the steps of:
determining whether an input character string is in a first language or a second language;
when the input character string is in the second language, calculating the edit distance between the input character string and a character string that writes, in the second language, the pronunciation of a candidate character string among candidate character strings written in the first language; and
determining a candidate character string written in the first language to be a similar word of the input character string, wherein the candidate character string determined to be a similar word corresponds to a character string written in the second language whose edit distance from the input character string is at or below a reference value.
2. The similar word determination method of claim 1, characterized in that, when the second language is a native language, the character string that writes the pronunciation of the candidate character string in the second language is obtained by converting the candidate character string with a transliterator for the native language.
3. The similar word determination method of claim 1, characterized in that the determining step further comprises:
when the input character string is in the first language, a step of calculating the edit distance between the candidate character strings and the input character string; and
a step of determining the candidate character strings whose edit distance is at or below a reference value to be similar words of the input character string.
4. The similar word determination method of claim 1, characterized in that the step of calculating the edit distance further comprises a step of determining whether the input character string written in the second language can be converted into the first language, and, when the input character string written in the second language cannot be converted into the first language, the edit distance is calculated between the input character string and the character string that writes the pronunciation of the candidate character string in the second language.
5. The similar word determination method of claim 1, characterized by further comprising, when the input character string written in the second language can be converted into the first language, the steps of:
converting the input character string written in the second language into the first language;
calculating the edit distance between the character string converted into the first language and the candidate character strings; and
determining the candidate character strings whose edit distance is at or below a reference value to be similar words of the input character string.
6. The similar word determination method of claim 5, characterized in that, in the step of converting into the first language, when the first language is a foreign language, the input character string written in the second language is converted into a character string written in the foreign language by using a reverse transliterator for that foreign language.
7. The similar word determination method of claim 1, characterized in that the first language is a foreign language and the second language is a native language.
8. The similar word determination method of claim 1, characterized in that the input character string and the candidate character strings are search queries.
9. The similar word determination method of claim 1, characterized by further comprising, before the determining step, a step of receiving the input character string from a user terminal, and, after the step of determining the similar word, a step of providing the determined similar word to the user terminal as a recommended query.
10. The similar word determination method of claim 1, characterized in that the candidate character strings are selected from candidate character strings stored in advance as: candidates whose edit distance from the input character string is at or below a reference value; or candidates that share common characters with the input character string and rank within the top N by character similarity score with the input character string; or both the candidates whose edit distance from the input character string is at or below the reference value and the candidates that share common characters with the input character string and rank within the top N by character similarity score with the input character string.
11. The similar word determination method of claim 10, characterized in that the candidate character strings whose edit distance is at or below the reference value are selected by wildcard character retrieval performed separately for each operation used in calculating the edit distance.
12. The similar word determination method of claim 10, characterized in that a candidate character string that shares common characters with the input character string is a candidate character string that shares a common n-gram with the input character string, and the character similarity score is determined using the size of the n-grams shared with the input character string, the number of shared n-grams, the similarity of the positions at which the shared n-grams are found, and the length difference between the input character string and each candidate character string.
13. A recording medium on which is recorded a program for performing the method of any one of claims 1 to 12.
14. A similar word determination system, characterized by comprising:
a user interface unit that receives an input character string from a user terminal and provides a similar word for the input character string to the user terminal as a recommended query; and
a similar word determination unit that, when the input character string is in a second language, determines a candidate character string written in a first language to be a similar word of the input character string and provides it to the user interface unit, wherein the candidate character string determined to be a similar word corresponds to a character string that writes, in the second language, the pronunciation of the candidate character string written in the first language and whose edit distance from the input character string is at or below a reference value.
15. The similar word determination system of claim 14, characterized in that the similar word determination unit comprises:
an input character string determination unit that determines whether the input character string is in the first language or the second language;
a first edit distance calculation unit that, when the input character string is in the second language, calculates the edit distance between the input character string and the character string that writes the pronunciation of the candidate character string in the second language; and
a first determination unit that determines, as a similar word of the input character string, a candidate character string whose second-language notation has an edit distance from the input character string at or below a reference value.
16. The similar word determination system of claim 15, characterized in that the similar word determination unit further comprises a character string conversion determination unit that determines whether the input character string written in the second language can be converted into a character string written in the first language, and, when the character string written in the second language cannot be converted into the first language, the first edit distance calculation unit calculates the edit distance between the input character string and the character string that writes the pronunciation of the candidate character string in the second language.
17. The similar word determination system of claim 14, characterized in that, when the second language is a native language, the character string that writes the pronunciation of the candidate character string in the second language is obtained by converting the candidate character string with a transliterator for the native language.
18. The similar word determination system of claim 14, characterized in that the similar word determination unit further comprises:
a second edit distance calculation unit that, when the input character string is in the first language, calculates the edit distance between the candidate character strings and the input character string; and
a second determination unit that determines the candidate character strings whose edit distance is at or below a reference value to be similar words of the input character string.
19. The similar word determination system of claim 18, characterized in that the similar word determination unit further comprises a character string conversion unit that, when the input character string written in the second language can be converted into the first language, converts the input character string written in the second language into the first language, and the second edit distance calculation unit calculates the edit distance between the character string converted into the first language and the candidate character strings.
20. The similar word determination system of claim 19, characterized in that the character string conversion unit is a reverse transliterator for a foreign language that, when the first language is the foreign language, converts the input character string written in the second language into a character string written in the foreign language.
21. The similar word determination system of claim 14, characterized in that the first language is a foreign language and the second language is a native language.
22. The similar word determination system of claim 14, characterized in that the input character string and the candidate character strings are search queries.
CN2009102503983A 2008-12-08 2009-12-07 Method and system for determining similar word with input string Active CN101751465B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2008-0124248 2008-12-08
KR1020080124248A KR101049358B1 (en) 2008-12-08 2008-12-08 Method and system for determining synonyms

Publications (2)

Publication Number Publication Date
CN101751465A true CN101751465A (en) 2010-06-23
CN101751465B CN101751465B (en) 2013-05-08

Family

ID=42346105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009102503983A Active CN101751465B (en) 2008-12-08 2009-12-07 Method and system for determining similar word with input string

Country Status (3)

Country Link
JP (1) JP5323652B2 (en)
KR (1) KR101049358B1 (en)
CN (1) CN101751465B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268176A (en) * 2012-06-26 2015-01-07 北京奇虎科技有限公司 Recommendation method and system based on search keyword
CN105027119A (en) * 2013-03-04 2015-11-04 三菱电机株式会社 Search device

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101286296B1 (en) 2012-11-29 2013-07-15 김건오 Method and system for managing a wordgraph
KR101483433B1 (en) * 2013-03-28 2015-01-16 (주)이스트소프트 System and Method for Spelling Correction of Misspelled Keyword
CN104239495B (en) * 2014-09-09 2018-06-05 百度在线网络技术(北京)有限公司 Searching method and searcher
KR101699478B1 (en) * 2015-06-23 2017-01-25 주식회사 비엔알아이 Server for analyzing naming and method for analyzing the same
KR102353381B1 (en) * 2019-04-30 2022-01-19 정철환 Electronic device, method, and computer program for supporting naming process

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3285149B2 (en) * 1990-04-27 2002-05-27 富士ゼロックス株式会社 Foreign language electronic dictionary search method and apparatus
JPH0628396A (en) * 1992-07-06 1994-02-04 Canon Inc Electronic dictionary
JPH08339376A (en) * 1995-06-12 1996-12-24 Toshiba Corp Foreign language retrieving device and information retrieving system
JP2000127647A (en) * 1998-04-27 2000-05-09 Nobuyuki Sotani English vocabulary retrieval/check dictionary with kana heading and english vocabulary retrieval/check device
JP2000231559A (en) * 1999-02-12 2000-08-22 Matsushita Electric Ind Co Ltd Information processor
KR100318762B1 (en) * 1999-10-01 2002-01-04 윤덕용 Phonetic distance method for similarity comparison of foreign words
JP3677016B2 (en) * 2002-10-21 2005-07-27 富士ゼロックス株式会社 Foreign language electronic dictionary search device
KR100542757B1 (en) * 2003-10-02 2006-01-20 한국전자통신연구원 Automatic expansion Method and Device for Foreign language transliteration
JP4035111B2 (en) * 2004-03-10 2008-01-16 日本放送協会 Parallel word extraction device and parallel word extraction program
JP4511892B2 (en) * 2004-07-26 2010-07-28 ヤフー株式会社 Synonym search device, method thereof, program thereof, and information search device
JP4936650B2 (en) * 2004-07-26 2012-05-23 ヤフー株式会社 Similar word search device, method thereof, program thereof, and information search device
US7584093B2 (en) * 2005-04-25 2009-09-01 Microsoft Corporation Method and system for generating spelling suggestions
KR100643801B1 (en) * 2005-10-26 2006-11-10 엔에이치엔(주) System and method for providing automatically completed recommendation word by interworking a plurality of languages
KR100793378B1 (en) * 2006-06-28 2008-01-11 엔에이치엔(주) Method for comparing similarity of loan word pronunciation and recommending word and system thereof
JP2008084070A (en) * 2006-09-28 2008-04-10 Toshiba Corp Structured document retrieval device and program
JP2008140074A (en) * 2006-11-30 2008-06-19 Casio Comput Co Ltd Example sentence retrieving device and example sentence retrieval processing program

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268176A (en) * 2012-06-26 2015-01-07 北京奇虎科技有限公司 Recommendation method and system based on search keyword
CN104268176B (en) * 2012-06-26 2017-10-31 北京奇虎科技有限公司 A kind of recommendation method based on search keyword
CN105027119A (en) * 2013-03-04 2015-11-04 三菱电机株式会社 Search device

Also Published As

Publication number Publication date
CN101751465B (en) 2013-05-08
KR101049358B1 (en) 2011-07-13
JP2010134922A (en) 2010-06-17
KR20100065747A (en) 2010-06-17
JP5323652B2 (en) 2013-10-23

Similar Documents

Publication Publication Date Title
CN101751465B (en) Method and system for determining similar word with input string
CN102549652B (en) Information retrieving apparatus
CN102479191B (en) Method and device for providing multi-granularity word segmentation result
US8666727B2 (en) Voice-controlled data system
US7979268B2 (en) String matching method and system and computer-readable recording medium storing the string matching method
CN102667773B (en) Search device, search method, and program
CN109145281B (en) Speech recognition method, apparatus and storage medium
CN101464896B (en) Voice fuzzy retrieval method and apparatus
US8356032B2 (en) Method, medium, and system retrieving a media file based on extracted partial keyword
CN101441649B (en) Spoken document retrieval system
US20070156404A1 (en) String matching method and system using phonetic symbols and computer-readable recording medium storing computer program for executing the string matching method
US7917352B2 (en) Language processing system
KR101945749B1 (en) Method of searching a data base, navigation device and method of generating an index structure
CN110413764B (en) Long text enterprise name recognition method based on pre-built word stock
CN101216854B (en) Computer words input method and system and its word library maintenance method and device
CN101223572A (en) System, program, and control method for speech synthesis
US20060241936A1 (en) Pronunciation specifying apparatus, pronunciation specifying method and recording medium
CN110059163B (en) Method and device for generating template, electronic equipment and computer readable medium
CN101470710A (en) Method for positioning content of multimedia file
CN109213990A (en) Feature extraction method and device and server
KR101086550B1 (en) System and method for recommendding japanese language automatically using tranformatiom of romaji
CN112765976A (en) Text similarity calculation method, device and equipment and storage medium
CN105740374A (en) Distributed memory based three-dimensional platform data fuzzy query method
CN1830022B (en) Voice response system and voice response method
CN115203206A (en) Data content searching method and device, computer equipment and readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant