WO2010001455A1 - Retrieving device and method - Google Patents

Retrieving device and method

Info

Publication number
WO2010001455A1
WO2010001455A1 PCT/JP2008/061860 JP2008061860W WO2010001455A1 WO 2010001455 A1 WO2010001455 A1 WO 2010001455A1 JP 2008061860 W JP2008061860 W JP 2008061860W WO 2010001455 A1 WO2010001455 A1 WO 2010001455A1
Authority
WO
Grant status
Application
Patent type
Prior art keywords
search
keyword
sentences
word
unit
Prior art date
Application number
PCT/JP2008/061860
Other languages
French (fr)
Japanese (ja)
Inventor
慶郎 福田
Original Assignee
富士通株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30861 Retrieval from the Internet, e.g. browsers
    • G06F17/30864 Retrieval from the Internet, e.g. browsers by querying, e.g. search engines or meta-search engines, crawling techniques, push systems

Abstract

A technology to retrieve examples of an inputted word from a predetermined region such as the Internet is provided. A retrieving device accepts at least one word inputted as a search key word and retrieves the sentences containing the search key word from a storage region. Then, it analyzes the structure of the sentences and extracts a plurality of words having a predetermined relationship with the search key word as co-occurring key words. Then, it searches the storage region for the sentences containing either one of the plurality of co-occurring key words and the search key word and counts the number of sentences per co-occurring key word. Based on the number of sentences per co-occurring key word, the retrieving device outputs the sentences containing either one of the co-occurring key words and the search key words as examples of the search key word.

Description

Search apparatus and method

The present invention relates to a technique for retrieving example sentences for an input word.

When translating a sentence into another language, or when writing a sentence that uses an unfamiliar word, people often refer to example sentences. In general, an example dictionary is used for this purpose.

Alternatively, when translating into another language, the sentence may be entered into a translation application (software) and machine-translated instead of referring to example sentences.

Prior art related to the present invention includes, for example, the techniques disclosed in the following patent documents.
JP 2004-62893; JP 2003-6193; JP 2004-54361

However, when an example dictionary is used, the number of examples is limited, and the example that is needed is not always available.
In addition, a translation application translates the sentence entered by the user word for word, so when the sentence is complicated or contains an ambiguous expression, a natural translation may not be obtained.

Accordingly, a technique is provided for retrieving examples of an input word from a predetermined area such as the Internet.

Forms of the present invention that solve the above problems include a search device, a search method, a search program, and a search server.
The search device comprises:
a keyword receiving unit that receives at least one input word as a search keyword;
a first search unit that retrieves sentences containing the search keyword from a storage area;
an analysis unit that analyzes the syntax of the sentences and extracts, as co-occurrence keywords, a plurality of words having a predetermined relationship with the search keyword;
a second search unit that searches the storage area for sentences containing the search keyword and each of the plurality of co-occurrence keywords, and obtains the number of hits for each co-occurrence keyword; and
an output unit that, based on the number of hits, outputs sentences containing the search keyword and a co-occurrence keyword as example sentences for the search keyword.

The output unit may output the sentences in order of the number of hits.

The search device may comprise a switching unit that switches the predetermined relationship.

In the search method that solves the above problems, a computer executes:
a step of receiving at least one input word as a search keyword;
a step of retrieving sentences containing the search keyword from a storage area;
a step of analyzing the syntax of the sentences and extracting, as co-occurrence keywords, a plurality of words having a predetermined relationship with the search keyword;
a step of searching the storage area for sentences containing the search keyword and each of the plurality of co-occurrence keywords, and obtaining the number of hits for each co-occurrence keyword; and
a step of outputting, based on the number of hits, sentences containing the search keyword and a co-occurrence keyword as example sentences for the search keyword.

In the search method, the sentences may be output in order of the number of hits.

The search method may further comprise a step of switching the predetermined relationship.

The search server that solves the above problems is a search server connected to a terminal via a network, and comprises:
a keyword receiving unit that receives at least one input word from the terminal as a search keyword;
a first search unit that retrieves sentences containing the search keyword from a storage area;
an analysis unit that analyzes the syntax of the sentences and extracts, as co-occurrence keywords, a plurality of words having a predetermined relationship with the search keyword;
a second search unit that searches the storage area for sentences containing the search keyword and each of the plurality of co-occurrence keywords, and obtains the number of hits for each co-occurrence keyword; and
a transmission unit that, based on the number of hits, transmits to the terminal sentences containing the search keyword and a co-occurrence keyword as example sentences for the search keyword.

The output unit may output the sentences in order of the number of hits.

The search server may comprise a switching unit that switches the predetermined relationship.

The search program that solves the above problems may be a program that causes a computer to execute the search method described above.
As a further form that solves the above problems, the search program may be recorded on a computer-readable recording medium. By having a computer read and execute the program on this recording medium, its functions can be provided.

Here, a computer-readable recording medium is a recording medium that stores information such as data and programs by electrical, magnetic, optical, mechanical, or chemical action and from which a computer can read that information. Among such recording media, those removable from the computer include, for example, a flexible disk, a magneto-optical disk, a CD-ROM, a CD-R/W, a DVD, a DAT, an 8 mm tape, and a memory card.

Recording media fixed in the computer include a hard disk and a ROM (Read Only Memory).

The present invention can provide a technique for retrieving examples of an input word from a predetermined area such as the Internet.

FIG. 1 is a schematic diagram of the example search device of Embodiment 1.
FIG. 2 is an explanatory diagram of the example search method of Embodiment 1.
FIG. 3 shows an example of an input screen.
FIG. 4 shows an example of an output screen.
FIG. 5 shows an example of an output screen.
FIG. 6 shows an example of an output screen.
FIG. 7 is a schematic diagram of the example search device of Embodiment 2.
FIG. 8 is an explanatory diagram of the example search method of Embodiment 2.

<Embodiment 1>
FIG. 1 is a schematic diagram of the example search device of Embodiment 1.
The example search device 10 is a general-purpose computer comprising a CPU (central processing unit) 11, a main memory 12, a storage unit (hard disk) 13 that stores data and software for arithmetic processing, an input/output port 14, a communication control unit (CCU) 15, and the like.

Input devices such as a keyboard, a mouse, and a CD-ROM drive, and output devices such as a display device and a printer, are connected to the input/output port 14 as appropriate.
The CCU 15 controls communication with other computers via a network.

An operating system (OS) and application software (the search program) are installed in the storage unit 13.

The CPU 11 reads the OS and application programs from the storage unit 13 as appropriate, executes them via the main memory 12, and arithmetically processes information input from the input/output port 14 and the CCU 15 and information read from the storage unit 13. Through this processing, the CPU 11 functions as the keyword receiving unit, the first search unit, the analysis unit, the second search unit, the output unit, and the switching unit.

The CPU 11, as the keyword receiving unit, receives at least one input word as a search keyword.

The CPU 11, as the first search unit, retrieves sentences containing the search keyword from the storage area.

The CPU 11, as the analysis unit, analyzes the syntax of the sentences and extracts, as co-occurrence keywords, a plurality of words having a predetermined relationship with the search keyword.

The CPU 11, as the second search unit, searches the storage area for sentences containing the search keyword and each of the plurality of co-occurrence keywords, and obtains the number of hits for each co-occurrence keyword.

The CPU 11, as the output unit, outputs, based on the number of hits, sentences containing the search keyword and a co-occurrence keyword as example sentences for the search keyword.

The CPU 11, as the switching unit, switches the relationship between the search keyword and the co-occurrence keywords that the analysis unit extracts, in accordance with input from the user.

The example search device 10 of Embodiment 1 implements the functions of the respective units in software by running the search program. It may instead be an electronic device in which the keyword receiving unit, the first search unit, the analysis unit, the second search unit, and the output unit are built from dedicated electronic circuits (hardware).

Next, the example search method that the example search device 10 configured as above performs according to the example search program will be described with reference to FIG. 2.

First, upon receiving an instruction from the user to start the example search program or to start a search, the keyword receiving unit displays the input screen 31 shown in FIG. 3 on the display device (step 1, hereinafter abbreviated as S1 and so on).

When the user enters at least one word in the input field 32 of the input screen 31 by operating the keyboard or pointing device and selects the search button 33, the keyword receiving unit receives the input word as a search keyword and stores it in the memory 12 (S2).

The first search unit performs a web search and retrieves web pages containing the search keyword (S3). For example, the first search unit connects to a search engine (search server) on the network via the CCU 15, transmits the search keyword, and receives from the search engine, as the search result, the text of the web pages or the portions that match the search keyword. Alternatively, it receives the URLs of web pages containing the search keyword as the search result and obtains (downloads) the corresponding web pages based on those URLs. The method is not limited to using a search engine; the first search unit may access a database server or web server and directly search for web pages or files that contain the search keyword.

The analysis unit extracts a predetermined number (x) of sentences containing the search keyword from the retrieved web pages or files and stores them in the memory 12 (S4). The analysis unit then reads the extracted sentences in sequence, parses them, and obtains the part of speech and dependency relations of each morpheme (S5). For example, each sentence is divided into morphemes by morphological analysis, the part of speech of each morpheme is identified from its position in the sentence and by comparison with a part-of-speech dictionary, and predicate-argument structure analysis is used to find, for instance, the predicate that the search keyword depends on when the keyword is in the nominative case. Any known technique may be used for the syntactic analysis.
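The sentence extraction of step S4 can be sketched as follows. This is a minimal illustration only, assuming the retrieved pages are already available as plain-text strings; the function name `extract_sentences`, the naive punctuation-based sentence splitter, and the sample pages are assumptions made for the sketch and do not come from the specification.

```python
import re

def extract_sentences(pages, keyword, limit):
    """Return up to `limit` sentences from `pages` that contain `keyword`
    as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    found = []
    for page in pages:
        # Naive sentence splitter: break after '.', '!' or '?' plus whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", page.strip()):
            if len(found) >= limit:
                return found
            if pattern.search(sentence):
                found.append(sentence)
    return found

# Hypothetical page texts standing in for the result of the first search (S3).
pages = [
    "Users require a password. The system is old.",
    "We require approval! Nothing else matters. Do you require help?",
]
examples = extract_sentences(pages, "require", 3)
print(examples)
```

The `limit` argument plays the role of the predetermined number x. A real implementation would need a language-aware sentence splitter, since the punctuation rule used here breaks on abbreviations and does not apply to languages written without spaces.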

Then, based on the analysis result, the analysis unit extracts a plurality of words having the predetermined relationship with the search keyword as co-occurrence keywords (S6).

Here, the predetermined relationship is, for example, verb and noun, noun and adjective, verb and adverb, preposition and noun, article and noun, adjacent words, subject and predicate, or clauses connected by a conjunction or the like.

In the present embodiment, a plurality of co-occurrence keywords are extracted for each of a plurality of relationships set in advance, such as noun and verb, noun and adjective, and noun and preposition.
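As a rough illustration of step S6, the sketch below groups candidate co-occurrence keywords by relationship. It is deliberately simplified: instead of morphological and dependency analysis it uses a tiny hand-written part-of-speech table (`POS`) and treats any word of the partner part of speech in the same sentence as related to the keyword. The table, the relation names, and the function name are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical miniature part-of-speech dictionary. A real system would use
# a morphological analyzer and a dependency parser instead.
POS = {
    "password": "noun", "require": "verb", "enter": "verb", "forget": "verb",
    "strong": "adjective", "new": "adjective", "for": "preposition",
}

# Relations for a noun search keyword: relation name -> partner part of speech.
RELATIONS = {
    "noun-verb": "verb",
    "noun-adjective": "adjective",
    "noun-preposition": "preposition",
}

def cooccurrence_keywords(sentences, keyword):
    """Map each relation name to the sorted list of words of the partner
    part of speech that share a sentence with `keyword`."""
    keyword = keyword.lower()
    result = defaultdict(set)
    for sentence in sentences:
        words = re.findall(r"[a-z]+", sentence.lower())
        if keyword not in words:
            continue  # only sentences containing the search keyword count
        for word in words:
            for relation, partner_pos in RELATIONS.items():
                if POS.get(word) == partner_pos:
                    result[relation].add(word)
    return {relation: sorted(ws) for relation, ws in result.items()}

sentences = [
    "Please enter a strong password.",
    "Sites require a new password for login.",
    "I forget things.",
]
keywords_by_relation = cooccurrence_keywords(sentences, "password")
print(keywords_by_relation)
```

Keeping the result keyed by relation mirrors the embodiment's choice to extract keywords for several preset relationships at once, so that a later switch between relationships only needs a lookup.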

Next, the second search unit performs a web search using a search condition that includes both the search keyword and a co-occurrence keyword (S7). As with the first search unit, this web search may use a search engine or may directly search web servers and data servers.

The second search unit obtains the number of texts that match the search condition (hereinafter simply called the number of hits) together with a predetermined number (y) of the matching texts. From these y texts, the second search unit extracts the sentences in which the co-occurrence keyword and the search keyword both appear (co-occur). If the number of sentences in which both keywords co-occur is below a threshold (z), further texts matching the search condition may be acquired.

The second search unit repeats the search of step 7 for every combination of the search keyword with one of the co-occurrence keywords obtained in step 6 (S8). In this example, the plurality of co-occurrence keywords is obtained in advance in step 6 and the search of step 7 is then repeated; however, when searching for co-occurrence keywords under a different predetermined relationship, the process may instead return to step 5 and start again from the syntactic analysis.
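The repeated second search of steps S7 and S8 can be sketched as a loop that counts, for each co-occurrence keyword, how many texts contain both it and the search keyword. In this sketch an in-memory list of sentences stands in for the web (an assumption; the embodiment queries a search engine or servers), and `count_hits` is a hypothetical name.

```python
import re

def count_hits(corpus, keyword, cooccurrence_keywords):
    """For each co-occurrence keyword, count the texts in `corpus` that
    contain both that keyword and the search keyword."""
    def words(text):
        return set(re.findall(r"[a-z]+", text.lower()))

    tokenized = [words(text) for text in corpus]
    hits = {}
    for co in cooccurrence_keywords:  # one search per combination (S8)
        hits[co] = sum(1 for w in tokenized if keyword in w and co in w)
    return hits

# Hypothetical corpus standing in for the storage area being searched.
corpus = [
    "Please enter your password.",
    "Enter the password again.",
    "Sites require a password.",
    "Enter the room.",
]
hits = count_hits(corpus, "password", ["enter", "require", "forget"])
print(hits)
```

With a real search engine the hit count would typically be the engine's reported number of matching pages rather than an exact count of sentences, so the figures are better read as relative frequencies than as precise totals.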

When the searches for all combinations have been completed in step 8, the output unit outputs the search results (S9). For example, as shown in FIG. 4, the co-occurrence keywords and their hit counts are displayed based on the number of hits for each co-occurrence keyword.
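One way to produce the listing of step S9 is to sort the co-occurrence keywords by hit count, most frequent first. The sketch below assumes the hit counts are held in a dictionary; the alphabetical tie-break and the sample figures are assumptions added so the output is deterministic, not something the specification states.

```python
def rank_by_hits(hits):
    """Return (co-occurrence keyword, hit count) pairs, most frequent first.
    Ties are broken alphabetically (an assumption of this sketch)."""
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical hit counts for three co-occurrence keywords.
ranking = rank_by_hits({"enter": 120, "require": 340, "forget": 120})
for word, count in ranking:
    print(f"{word}\t{count}")
```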

When the user selects a desired co-occurrence keyword by operating the keyboard or pointing device, the output unit displays sentences containing both the search keyword and the selected co-occurrence keyword as example sentences for the search keyword, as shown in FIG. 5. FIG. 5 shows an example in which "Require" is selected.

Although only one example sentence is shown in FIG. 5, a predetermined number of sentences, or all of the extracted sentences, may be listed. Example sentences based on the same co-occurrence keyword may be displayed in the order in which they were retrieved in step 7, in order of the update date of the text containing each sentence, in order of sentence length, or in descending order of how many of the extracted example sentences match exactly in the portion containing both keywords. A combination of these orderings may also be used.
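The display orderings described above, and their combination, can be expressed as sort keys applied in priority order. The following is a sketch under stated assumptions: the `Example` record, the ordering names, and the directions (newest first, shortest first) are illustrative choices, since the specification names the criteria but not their direction.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Example:
    text: str
    rank: int       # position in which the second search returned it
    updated: date   # update date of the text containing the sentence

# Each ordering maps an example to a sort key (smaller sorts first).
ORDERINGS = {
    "retrieved": lambda e: e.rank,
    "newest": lambda e: -e.updated.toordinal(),
    "shortest": lambda e: len(e.text),
}

def order_examples(examples, *criteria):
    """Sort by the named criteria in priority order (a combined ordering)."""
    return sorted(examples, key=lambda e: tuple(ORDERINGS[c](e) for c in criteria))

examples = [
    Example("Enter your password here.", 0, date(2008, 1, 5)),
    Example("Enter password.", 1, date(2008, 6, 1)),
    Example("Enter a password.", 2, date(2008, 6, 1)),
]
ordered = order_examples(examples, "newest", "shortest")
print([e.text for e in ordered])
```

Because `sorted` is stable, calling `order_examples` with no criteria leaves the examples in retrieval order.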

The display screen of FIG. 5 shows example sentences for the case where the search keyword and the co-occurrence keyword are in a noun-verb relationship. When the user performs an operation to switch the relationship, the switching unit detects the operation and causes the output unit to display example sentences for the other relationship (S10, S11).

For example, when the user selects a relationship tab 34 by operating the keyboard or pointing device, the switching unit detects the selection, identifies the selected tab, and causes the output unit to output, from among the search results for the plurality of relationships extracted in step 6, the search results for the selected relationship. FIG. 6 shows an example in which the noun-adjective relationship is selected.

In the above, search results are obtained in advance for a plurality of relationships and the switching unit switches the display to the selected relationship. Instead of obtaining results for multiple relationships in advance, the search may be repeated for a relationship each time the user selects it. For example, when the tab 34 is selected on the search result screen of FIG. 5, the process may return to step 6 and repeat the subsequent processing for the selected relationship. Alternatively, the input screen 31 shown in FIG. 3 may be provided with a field for entering the desired relationship with the search keyword; the switching unit identifies the entered relationship and notifies the analysis unit, which obtains co-occurrence keywords for that relationship in step 6.
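The two switching strategies described above, searching every relationship in advance or searching only when a relationship is selected, can both be covered by a small cache in front of the search. The class below is a hypothetical sketch; `RelationSwitcher`, `preload`, and `select` are illustrative names, and the `search` callable stands in for steps 6 onward.

```python
class RelationSwitcher:
    """Serve results for the selected relation, searching on demand and
    caching so that each relation is searched at most once."""

    def __init__(self, search):
        self._search = search   # callable: relation name -> search result
        self._cache = {}
        self.selected = None

    def preload(self, relations):
        """Obtain results for several relations in advance."""
        for relation in relations:
            if relation not in self._cache:
                self._cache[relation] = self._search(relation)

    def select(self, relation):
        """Switch to `relation`, running the search only if needed."""
        if relation not in self._cache:
            self._cache[relation] = self._search(relation)
        self.selected = relation
        return self._cache[relation]

calls = []

def fake_search(relation):
    # Stand-in for re-running the extraction and second search.
    calls.append(relation)
    return [f"results for {relation}"]

switcher = RelationSwitcher(fake_search)
switcher.preload(["noun-verb"])             # searched in advance
first = switcher.select("noun-verb")        # served from the cache
second = switcher.select("noun-adjective")  # searched on selection
third = switcher.select("noun-adjective")   # cached after the first selection
print(calls)
```

Preloading trades extra searches up front for instant tab switches, while on-demand selection avoids searches for relationships the user never views.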

As described above, according to this embodiment, example sentences are presented by web search, so examples can be obtained even for technical terms, new words, colloquialisms, and slang. Moreover, because the sentences presented by the search device (search program) are extracted from actual text, they are not unnatural in the way that the output of a translation application can be.
Further, according to this embodiment, because examples are output based on the number of hits, commonly used examples can be presented.

<Embodiment 2>
FIG. 7 is a schematic diagram of the example search server according to Embodiment 2.

The example search server 20 of Embodiment 2 differs from the example search device 10 of Embodiment 1 in that it receives the search keyword from a user terminal via a network and sends the search results to the user terminal via the network; the other configurations are the same. Therefore, in Embodiment 2, elements that are the same as in Embodiment 1 are not described again.

Through processing according to the OS and application programs, the CPU 11 of the example search server 20 functions as a server unit in addition to the keyword receiving unit, the first search unit, the analysis unit, the second search unit, the output unit, and the switching unit described above.

The CPU 11, as the server unit, establishes a connection with the user terminal 40 via a network such as the Internet, receives data from the user terminal 40, and transmits data to the user terminal 40. In the present embodiment, the server unit communicates with the user terminal 40 using TCP/IP and functions as a so-called web server that provides web pages. The method of communicating with the user terminal 40 is not limited to this and may be any known method.

FIG. 8 is an explanatory diagram of the example search method that the example search server 20 of Embodiment 2 performs according to the example search program.

First, when the user accesses the example search server 20 from the user terminal 40 by specifying the URL of the example search page, the server unit establishes a connection with the user terminal 40 and transmits HTML data representing the input screen 31 of FIG. 3, that is, a web page, to the terminal 40 (S21).

The user terminal 40 that has received the web page displays the input screen 31 on its display device. When the user types at least one word in the input field 32 and selects the search button 33, the terminal sends the input word to the example search server 20.

The server unit of the example search server 20 passes the word received from the terminal 40 to the keyword receiving unit, which receives the input word as a search keyword and stores it in the memory 12 (S22).
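The exchange in steps S21 and S22 can be sketched as a minimal WSGI application that serves the input form and picks the search keyword out of the request. This is only an illustration: the specification says the server unit acts as a web server over TCP/IP but does not prescribe WSGI, and the form markup, the query parameter name `q`, and the response body are assumptions.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical input screen: one text field and a search button.
FORM = ('<form action="/" method="get">'
        '<input name="q"><button>Search</button></form>')

def app(environ, start_response):
    """Minimal WSGI app: send the input form (S21), or accept the word
    submitted by the terminal as the search keyword (S22)."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    keyword = query.get("q", [""])[0].strip()
    if keyword:
        # A full server would hand `keyword` to the search steps here.
        body = f"<p>Search keyword: {escape(keyword)}</p>"
    else:
        body = FORM
    data = body.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(data))),
    ])
    return [data]

# Calling the app directly, as a WSGI server would for an incoming request.
statuses = []
response = app({"QUERY_STRING": "q=require"},
               lambda status, headers: statuses.append(status))
print(response[0].decode("utf-8"))
```

For a quick local trial the app could be served with the standard library's `wsgiref.simple_server.make_server`. Escaping the keyword before echoing it back avoids injecting user input into the page as markup.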

The first search unit performs a web search and retrieves web pages containing the search keyword (S23).

The analysis unit extracts a predetermined number (x) of sentences containing the search keyword from the retrieved web pages or files and stores them in the memory 12 (S24). The analysis unit then reads the extracted sentences in sequence, parses them, and obtains the part of speech and dependency relations of each morpheme (S25).

Then, based on the analysis result, the analysis unit extracts a plurality of words having the predetermined relationship with the search keyword as co-occurrence keywords (S26).

Next, the second search unit performs a web search using a search condition that includes both the search keyword and a co-occurrence keyword (S27).

The second search unit repeats the search of step 27 for every combination of the search keyword with one of the co-occurrence keywords obtained in step 26 (S28).

When the searches for all combinations have been completed in step 28, the output unit outputs the search results (S29). For example, as shown in FIG. 4, a web page listing the co-occurrence keywords and their hit counts, based on the number of hits for each co-occurrence keyword, is transmitted to the user terminal 40.

The user terminal 40 receives and displays the web page. When the user selects a desired co-occurrence keyword on the page, the terminal requests example sentences for that co-occurrence keyword from the example search server 20.

In response to the request, the output unit transmits to the terminal 40, via the server unit, a web page that presents sentences containing both the search keyword and the selected co-occurrence keyword as example sentences for the search keyword, as shown in FIG. 5.

Although only one example sentence is shown in FIG. 5, a predetermined number of sentences, or all of the extracted sentences, may be listed. Example sentences based on the same co-occurrence keyword may be displayed in the order in which they were retrieved in step 27, in order of the update date of the text containing each sentence, in order of sentence length, or in descending order of how many of the extracted example sentences match exactly in the portion containing both keywords. A combination of these orderings may also be used.

The display screen of FIG. 5 shows example sentences for the case where the search keyword and the co-occurrence keyword are in a noun-verb relationship. When the user selects another relationship on the page, the user terminal 40 requests the example search server 20 to switch to that relationship.

In response to the request to switch the relationship, the switching unit notifies the analysis unit of the selected relationship, and step 26 and the subsequent steps are repeated (S30, S31).
When the user performs an operation to switch the relationship, the switching unit detects the operation and causes the output unit to display example sentences for the other relationship.

As described above, the example search server of Embodiment 2 can provide suitable example sentences for an input word via a network.

Claims (10)

  1. A search device comprising:
    a keyword receiving unit that receives at least one input word as a search keyword;
    a first search unit that retrieves sentences containing the search keyword from a storage area;
    an analysis unit that analyzes the syntax of the sentences and extracts, as co-occurrence keywords, a plurality of words having a predetermined relationship with the search keyword;
    a second search unit that searches the storage area for sentences containing the search keyword and each of the plurality of co-occurrence keywords, and obtains the number of hits for each co-occurrence keyword; and
    an output unit that, based on the number of hits, outputs sentences containing the search keyword and a co-occurrence keyword as example sentences for the search keyword.
  2. The search device according to claim 1, wherein the output unit outputs the sentences in order of the number of hits.
  3. The search device according to claim 1 or 2, comprising a switching unit that switches the predetermined relationship.
  4. A search method in which a computer executes:
    a step of receiving at least one input word as a search keyword;
    a step of retrieving sentences containing the search keyword from a storage area;
    a step of analyzing the syntax of the sentences and extracting, as co-occurrence keywords, a plurality of words having a predetermined relationship with the search keyword;
    a step of searching the storage area for sentences containing the search keyword and each of the plurality of co-occurrence keywords, and obtaining the number of hits for each co-occurrence keyword; and
    a step of outputting, based on the number of hits, sentences containing the search keyword and a co-occurrence keyword as example sentences for the search keyword.
  5. The search method according to claim 4, wherein the sentences are output in order of the number of hits.
  6. The search method according to claim 4 or 5, further comprising a step of switching the predetermined relationship.
  7. A search program that causes a computer to execute:
    a step of receiving at least one input word as a search keyword;
    a step of retrieving sentences containing the search keyword from a storage area;
    a step of analyzing the syntax of the sentences and extracting, as co-occurrence keywords, a plurality of words having a predetermined relationship with the search keyword;
    a step of searching the storage area for sentences containing the search keyword and each of the plurality of co-occurrence keywords, and obtaining the number of hits for each co-occurrence keyword; and
    a step of outputting, based on the number of hits, sentences containing the search keyword and a co-occurrence keyword as example sentences for the search keyword.
  8. The search program according to claim 7, wherein the sentences are output in order of the number of hits.
  9. The search program according to claim 7 or 8, further causing the computer to execute a step of switching the predetermined relationship.
  10. A search server connected to a terminal via a network, comprising:
    a keyword receiving unit that receives at least one input word from the terminal as a search keyword;
    a first search unit that retrieves sentences containing the search keyword from a storage area;
    an analysis unit that analyzes the syntax of the sentences and extracts, as co-occurrence keywords, a plurality of words having a predetermined relationship with the search keyword;
    a second search unit that searches the storage area for sentences containing the search keyword and each of the plurality of co-occurrence keywords, and obtains the number of hits for each co-occurrence keyword; and
    a transmission unit that, based on the number of hits, transmits to the terminal sentences containing the search keyword and a co-occurrence keyword as example sentences for the search keyword.
PCT/JP2008/061860 2008-06-30 2008-06-30 Retrieving device and method WO2010001455A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2008/061860 WO2010001455A1 (en) 2008-06-30 2008-06-30 Retrieving device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2008/061860 WO2010001455A1 (en) 2008-06-30 2008-06-30 Retrieving device and method

Publications (1)

Publication Number Publication Date
WO2010001455A1 (en) 2010-01-07

Family

ID=41465571

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2008/061860 WO2010001455A1 (en) 2008-06-30 2008-06-30 Retrieving device and method

Country Status (1)

Country Link
WO (1) WO2010001455A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013200862A (en) * 2012-03-23 2013-10-03 Nec (China) Co Ltd Method and device for diversifying query results

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005031950A (en) * 2003-07-11 2005-02-03 Canon Inc Information retrieval device, information retrieval method, and program
JP2007213157A (en) * 2006-02-07 2007-08-23 Just Syst Corp Example sentence retrieval device and example sentence retrieval method
JP2007304895A (en) * 2006-05-12 2007-11-22 Nobuhiko Ido Example sentence creating system using search engine, and web site construction method using practice exercise related to language as content

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08777711

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct app. not ent. europ. phase

Ref document number: 08777711

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: JP