WO2006008919A1 - Information processing apparatus and program - Google Patents
Information processing apparatus and program
- Publication number
- WO2006008919A1 (PCT/JP2005/011786)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information
- search
- target
- target information
- original
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9538—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Definitions
- the present invention relates to an information processing apparatus, a program, and the like for efficiently searching for store target information described on a homepage on the WEB, for example.
- a search is performed by evaluating the relevance between home pages according to hyperlinks defined between the home pages or according to a user search request.
- a search method including a step of outputting a search result as a homepage and a link series according to the inquiry content specified by the user terminal.
- Patent Document 1 JP 2003-203089 (first page, Fig. 1 etc.)
- Patent Document 2 JP 2001-344283 (first page, Fig. 1 etc.)
- Non-patent document 1 Gurunavi website, Internet URL: http://www.gnavi.co.jp
- the information processing apparatus of the first invention comprises: a receiving unit that receives search information, which is information for specifying two or more search targets; a target information acquisition unit that acquires, from two or more pieces of original information, two or more pieces of target information, which is information relating to the evaluation of the two or more search targets specified by the search information; and an output unit that outputs the target information of the two or more search targets.
- in the information processing apparatus of the second invention, the output unit of the information processing apparatus according to the first invention comprises summarizing means for summarizing the two or more pieces of target information to obtain two or more pieces of summary information, and summary information output means for outputting the two or more pieces of summary information.
- in the information processing apparatus of the third invention, the summarizing means of the second invention comprises: word extracting means for extracting words from the target information; word appearance count calculating means for each search target, which calculates, for each word extracted by the word extracting means, the number of appearances for each search target; and summary acquisition means for acquiring summary information of each search target from the target information based on the number of appearances.
- the summarizing means preferably further comprises: total word appearance count calculating means for calculating, for each word extracted by the word extracting means, the number of appearances in all target information of all search targets; ratio calculating means for calculating the ratio between the number of appearances for each search target, calculated by the word appearance count calculating means for each search target, and the number of appearances in all target information, calculated by the total word appearance count calculating means; and word list creation means for creating, for each search target, a word list arranged in descending order of the ratio calculated by the ratio calculating means. The summary acquisition means of the information processing apparatus of the third invention then obtains the summary information of each search target from the target information based on the word list for each search target.
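The counting, ratio, and word-list steps can be illustrated with a minimal sketch. The function name, the input shape (a mapping from search target name to already-extracted words), and the tie-breaking rule are assumptions made for illustration; the specification does not prescribe them.

```python
from collections import Counter


def build_word_lists(words_by_target):
    """Return, for each search target, its words ordered by descending ratio.

    words_by_target maps a search target name to the list of words already
    extracted from its target information (extraction is out of scope here).
    """
    # Number of appearances of each word for each search target.
    per_target = {t: Counter(ws) for t, ws in words_by_target.items()}

    # Number of appearances of each word in all target information.
    total = Counter()
    for counts in per_target.values():
        total.update(counts)

    word_lists = {}
    for target, counts in per_target.items():
        # Ratio = appearances for this target / appearances overall.
        ratios = {w: c / total[w] for w, c in counts.items()}
        # Highest ratio first. Ties are broken by raw count and then
        # alphabetically; this tie-breaking is an assumption.
        word_lists[target] = sorted(
            ratios, key=lambda w: (-ratios[w], -counts[w], w)
        )
    return word_lists


# Hypothetical example data.
example = {
    "Ramen A": ["pork", "broth", "rich", "broth", "noodles"],
    "Ramen B": ["noodles", "thin", "noodles", "light"],
}
lists = build_word_lists(example)
```

Words shared by many search targets (here "noodles") get a low ratio and fall to the end of each list, so the head of a list tends to hold the words that distinguish that target.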
- the target information acquisition unit also acquires link information that is information indicating a location of information in which the target information is described
- the reception unit also receives a summary information selection instruction, which is an instruction for the summary information.
- the apparatus further includes an original information acquisition unit that, when the reception unit receives a summary information selection instruction, acquires, based on the link information, the original information in which the target information that is the source of the summary information corresponding to the instruction is described; and the output unit also comprises original information output means for outputting the original information acquired by the original information acquisition unit.
- such a configuration is suitable for users who are not satisfied with the summary information alone, because the original information can be easily acquired.
- the target information acquisition unit also acquires link information that is information indicating a location of information in which the target information is described
- the output unit includes link symbol output means for outputting link symbol information, which is information corresponding to the link information
- the accepting unit also accepts a link symbol selection instruction, which is an instruction for the link symbol information
- the apparatus further includes an original information acquisition unit that acquires, based on the link information corresponding to the instructed link symbol information, the original information, which is the information describing the target information indicated by the link information
- the output unit also comprises original information output means for outputting the original information acquired by the original information acquisition unit.
- such a configuration is preferable because a user who wants the original information without reading the summary can easily acquire it.
- the output unit further includes ranking determination means that ranks the two or more search targets based on the target information of the two or more search targets, and outputs the target information and/or summary information of the two or more search targets based on the ranking by the ranking determination means.
- the information on search targets such as stores that the user wants is ranked and output, so the information the user desires can be output in an easy-to-see form.
- the ranking determination means ranks the two or more search targets based on the number of characters of the target information of the two or more search targets, and/or whether the target information includes telephone number information, and/or the page ranking of the original information in which the target information is described.
- the search information includes search point information that is information related to a search point, and target group information that is information for specifying a search target group.
- this reflects the behavioral characteristics of urban residents, whose activities center on the station; for example, they will walk a few minutes from the station if the restaurant is good.
- the target information acquisition unit includes: search target information group storage means that stores a search target information group having one or more pieces of search target information, each having search target name information (information indicating the name of the search target), telephone number information (information indicating the telephone number of the search target), and address information (information indicating the address of the search target); search target information acquisition means that acquires part or all of the search target information from the search target information group storage means based on the search information; and target information acquisition means that acquires, based on part or all of the search target information acquired by the search target information acquisition means, target information that is information relating to the two or more search targets specified by the search information.
- the search information further includes search range information, which is information for specifying a search range from the search point indicated by the search point information, and the search target information acquisition means selects one or more pieces of search target information from the search target information group storage means based on the search point information included in the search information, the address information of the search target, and the search range information, and acquires part or all of the selected search target information.
- the search target information acquisition means obtains the longitude and latitude of the search point information included in the search information and the longitude and latitude of the address information of the search target, calculates from the two longitude and latitude pairs the distance between the search point indicated by the search point information and the location of the search target indicated by the address information, selects from the search target information group storage means, based on that distance, one or more pieces of search target information that satisfy the condition indicated by the search range information, and acquires part or all of the selected search target information.
- even with search information such as "within 10 minutes on foot from the station", search target information such as stores can be obtained fairly accurately.
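The distance-based selection can be sketched as follows. The specification only says that a distance is calculated from the two longitude and latitude pairs; the haversine formula used here is one common choice, not necessarily the one intended. The function names, the record layout, the coordinates, and the walking speed of 80 m per minute are all assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_range(search_point, targets, max_m):
    """Keep the search targets whose location lies within max_m metres."""
    lat0, lon0 = search_point
    return [t for t in targets
            if distance_m(lat0, lon0, t["lat"], t["lon"]) <= max_m]


# Hypothetical data: "within 10 minutes on foot" at an assumed 80 m/min.
station = (35.0, 139.0)
shops = [
    {"name": "near shop", "lat": 35.005, "lon": 139.0},  # roughly 556 m away
    {"name": "far shop", "lat": 35.02, "lon": 139.0},    # roughly 2.2 km away
]
nearby = within_range(station, shops, max_m=10 * 80)
```

A straight-line distance underestimates the actual walking route, so a real system might apply a smaller threshold or a route-based distance instead.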
- the original information, which is the information from which the target information is acquired, is information with hierarchical tags
- when acquiring multiple pieces of target information from a single piece of original information, the target information acquisition unit acquires information at the same hierarchy level.
- the target information can be obtained at high speed.
- when acquiring a plurality of pieces of target information from one piece of original information, the target information acquisition unit searches that original information, determines the hierarchy level that includes location information, which is one or more of telephone number information, address information, and a postal code, and acquires the information of the determined hierarchy level.
- the target information can be obtained accurately and at high speed.
- objective information such as a store can be appropriately acquired.
- FIG. 1 is a conceptual diagram of an information processing system in the present embodiment.
- the information processing system includes an information terminal 11, an information processing device 12, and an information storage device 13.
- the information terminal 11 is a so-called client terminal, for example, a terminal that outputs target information such as a restaurant.
- the target information is information about two or more search targets (restaurants, etc.) specified by the search information.
- the target information to be used is, for example, information related to the evaluation of restaurants, etc. (delicious, fashionable, bad, etc.).
- the information processing device 12 is a device that acquires target information in response to a request from the information terminal 11 and transmits the target information to the information terminal 11.
- the information processing device 12 is, for example, a server device of an application service provider equipped with a so-called search engine.
- the information storage device 13 is a device that stores target information such as a restaurant, for example.
- the information storage device 13 stores, for example, information on evaluations of restaurants and the like described on information portals, target information indicating evaluations of restaurants and the like written by individual users, and information such as diaries written by individual users.
- information that contains target information is called original information.
- the original information is, for example, a so-called WEB home page.
- FIG. 2 is a block diagram of the information processing system in the present embodiment.
- the information terminal 11 includes a user input reception unit 1101, a request transmission unit 1102, an information reception unit 1103, and an information output unit 1104.
- the information processing apparatus 12 includes a reception unit 1201, a target information acquisition unit 1202, an original information acquisition unit 1203, an output unit 1204, an original information reception unit 1205, an original information accumulation unit 1206, and an original information storage unit 1207.
- the target information acquisition unit 1202 includes a search target information group storage unit 12021, a search target information acquisition unit 12022, and a target information acquisition unit 12023.
- the output unit 1204 includes ranking determination means 12041, target information output means 12042, summarizing means 12043, summary information output means 12044, original information output means 12045, and link symbol output means 12046.
- the summarizing means 12043 includes a word extracting means 120431, a word appearance count calculating means 120432 for each search target, a total word appearance count calculating means 120433, a ratio calculating means 120434, and a word list creating means 120435.
- the information storage device 13 includes an original information storage unit 1301 and an original information transmission unit 1302.
- the user input receiving unit 1101 receives search information that is information for specifying two or more search targets from the user.
- Search targets include, for example, restaurants, English conversation schools, and travel destinations.
- the search information is information for specifying information desired by the user, which is a so-called search key.
- Search information includes, for example, search point information, which is information about the search point (a station name such as Shibuya Station, an address, an area specified by a telephone number, etc.), and target group information, which is information that specifies the group to be searched (for example, "ramen", which identifies ramen restaurants as the group to be searched, "Italian restaurant", or "Chinese cuisine", which identifies Chinese restaurants as the group to be searched).
- the search information may further include, for example, search range information (within 5 minutes on foot, within 1 km, etc.), which is information for specifying the search range from the search point indicated by the search point information. The search information may also include other search keys.
- the user input receiving unit 1101 also receives a summary information selection instruction, which is an instruction for the output summary information. Summary information is information that summarizes the target information. There are various methods for summarizing target information; an example is described later.
- the user input receiving unit 1101 also receives a link symbol selection instruction, which is an instruction for link symbol information. The link symbol information is information corresponding to link information, which is information indicating the location of the original information in which the target information is described. A specific example of link symbol information is described later.
- the user input receiving unit 1101 receives various instructions and inputs from the user.
- the search information input means may be anything such as a numeric keypad, keyboard, mouse or menu screen.
- the user input receiving unit 1101 can be realized by a device driver for input means such as a numeric keypad or a keyboard, or by control software for a menu screen.
- the request transmission unit 1102 transmits request information having the search information to the information processing device 12 based on the search information received by the user input reception unit 1101.
- the request information is information indicating a request for acquiring target information to be searched corresponding to the search information.
- the data structure of request information does not matter.
- the request information usually includes information for specifying the information processing apparatus 12.
- the information specifying the information processing apparatus 12 is, for example, a URL or URI indicating a folder in the information processing apparatus 12 in which the target information is held, or an IP address of the information processing apparatus 12.
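Since the data structure of the request information is left open, one possible encoding is a URL whose query string carries the search information. The sketch below is only an illustration: the base URL, the parameter names `point`, `group`, and `range`, and the function name are hypothetical and do not come from the specification.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_request_url(base_url, search_point, target_group, search_range=None):
    """Build request information as a URL carrying the search information."""
    params = {"point": search_point, "group": target_group}
    if search_range is not None:
        # Search range information is optional in the search information.
        params["range"] = search_range
    return f"{base_url}?{urlencode(params)}"


# Hypothetical server address identifying the information processing apparatus.
url = build_request_url(
    "http://search.example/find",
    "Shibuya Station",
    "ramen",
    "within 5 minutes on foot",
)

# The receiving side can recover the search information from the query string.
received = parse_qs(urlsplit(url).query)
```

Encoding the request as a URL fits the case in which the information terminal 11 is realized by a WEB browser, but any other structure that carries the same fields would serve equally well.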
- Request transmission unit 1102 usually includes wireless or wired communication means, but may be broadcast means instead of communication means.
- Information receiving section 1103 receives information such as target information and original information from information processing device 12 based on the transmission of request information in request transmitting section 1102.
- the information receiving unit 1103 is usually realized by a wireless or wired communication means, but can also be realized by a broadcast receiving means.
- the information output unit 1104 outputs information such as target information and original information received by the information receiving unit 1103.
- the output here is a concept that mainly includes display on a display, printing on a printer, and sound output, as well as transmission to an external device.
- the information output unit 1104 may or may not include an output device such as a display or a speaker.
- the information output unit 1104 can be realized by output device driver software, or by output device driver software and an output device. Note that the processing of the user input reception unit 1101, the request transmission unit 1102, the information reception unit 1103, the information output unit 1104, and the like in the information terminal 11 can be realized by, for example, the processing of a so-called WEB browser.
- the accepting unit 1201 accepts search information, which is information for specifying two or more search targets, and instructions to acquire other information.
- the search information is included in the request information.
- the instruction to acquire other information is, for example, information including a URL, and requests the information specified by the URL (for example, a home page).
- the reception unit 1201 normally receives request information. “Accept” here usually means reception from the information terminal 11, but it may also include accepting information entered manually by the user and reading from a recording medium.
- the reception unit 1201 is usually realized by a wireless or wired communication means, but may be realized by a means for receiving a broadcast.
- the reception unit 1201 can also be realized by a device driver for input means such as a numeric keypad and a keyboard, control software for a menu screen, and the like.
- the target information acquisition unit 1202 acquires target information that is information regarding two or more search targets specified by the search information received by the reception unit 1201. Two or more information storage devices 13 or the original information storage unit 1207 of the information processing device 12 may be used as the destination for acquiring the target information.
- the target information acquisition unit 1202 can usually be realized by an MPU, a memory, or the like.
- the processing procedure of the target information acquisition unit 1202 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the target information acquisition unit 1202 can be realized with a configuration including a wireless or wired communication unit.
- when the reception unit 1201 receives a summary information selection instruction, the original information acquisition unit 1203 acquires, based on the link information, the original information describing the target information from which the summary information corresponding to the instruction was derived.
- when the reception unit 1201 receives a link symbol selection instruction, the original information acquisition unit 1203 acquires, based on the link information corresponding to the link symbol information designated by the instruction, the original information, which is the information in which the target information indicated by that link information is described.
- the link information is, for example, a URL or URI indicating the location of the original information.
- the original information acquisition unit 1203 acquires a homepage corresponding to, for example, a URL or a URI.
- the original information acquisition unit 1203 can usually be realized by an MPU, a memory, or the like.
- the processing procedure of the original information acquisition unit 1203 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the original information acquisition unit 1203 can be realized with a configuration that includes wireless or wired communication means.
- the output unit 1204 outputs target information of two or more search targets.
- the output unit 1204 may output two or more pieces of summary information, which is information obtained by summarizing the target information of the two or more search targets.
- the output unit 1204 may rank the target information and/or summary information of the two or more search targets, as described below, and output them.
- the output unit 1204 may also output the original information acquired by the original information acquisition unit 1203.
- the output unit 1204 may also output link symbol information that is information corresponding to the link information.
- the term “output” here usually means transmission to the information terminal 11, but is a concept that also includes display on a display, printing on a printer, sound output, and the like.
- the output unit 1204 is usually realized by software constituting information to be transmitted and wireless or wired communication means, but may be broadcast means instead of the communication means. Further, the output unit 1204 can be realized by driver software of an output device or driver software of an output device and an output device. The output unit 1204 may or may not include an output device.
- the original information receiving unit 1205 receives original information from the information storage device 13.
- Original information is information including target information.
- the original information is, for example, information with layered tags such as HTML, compact HTML (hereinafter referred to as “C-HTML”) or XML.
- the original information is, for example, a so-called home page.
- the original information receiving unit 1205 automatically acquires the original information from a large number of information storage devices 13 at a predetermined time.
- the original information receiving unit 1205 is usually realized by a wireless or wired communication means, but may be realized by a means for receiving a broadcast.
- the original information accumulation unit 1206 accumulates the original information received by the original information reception unit 1205 in the original information storage unit 1207.
- the original information accumulation unit 1206 can usually be realized by an MPU, a memory, or the like.
- the processing procedure of the original information accumulation unit 1206 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the original information storage unit 1207 stores original information.
- the original information storage unit 1207 is preferably a non-volatile recording medium, but can also be realized by a volatile recording medium.
- the search target information group storage means 12021 stores a search target information group having one or more pieces of search target information, each having search target name information (information indicating the name of the search target), telephone number information (information indicating the telephone number of the search target), and address information (information indicating the address of the search target).
- the search target information may also have postal code information, which is information indicating a postal code.
- the search target information group is, for example, so-called yellow page information.
- the search target information group storage unit 12021 is preferably a non-volatile recording medium, but can also be realized by a volatile recording medium.
- the search target information acquisition unit 12022 acquires part or all of the search target information from the search target information group storage unit 12021 based on the search information.
- Search target information acquisition means 12022 acquires, for example, a part or all of the search target information having information on the type of business that the search information has (for example, “English conversation school”, “Ramen shop”, etc.).
- the search target information acquisition means 12022 can usually be realized by an MPU, a memory, or the like.
- the processing procedure of the search target information acquisition means 12022 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the target information acquisition means 12023 acquires target information, which is information on the two or more search targets specified by the search information, based on part or all of the search target information acquired by the search target information acquisition means 12022.
- when acquiring a plurality of pieces of target information from one piece of original information, the target information acquisition means 12023 searches that original information, determines a hierarchy level having a predetermined relationship with the hierarchy level that includes location information, which is one or more of telephone number information, address information, and a postal code, and acquires the information of the determined hierarchy level.
- the target information acquisition means 12023 acquires, for example, the information of a predetermined block that includes one or more of the telephone number information, address information, and postal code information acquired by the search target information acquisition means 12022.
- the predetermined block of information may be one paragraph of information, one page of information, or information enclosed by a predetermined tag (for example, “<tr>”).
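As a sketch of acquiring the block that contains a known telephone number, the following uses Python's standard `html.parser` to collect the text of each `<tr>` element and keep the rows that contain the number. The class and function names, the sample page, and the telephone numbers are hypothetical; nested tables and other block types (paragraphs, whole pages) are not handled.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collects the text of each <tr> element (nested tables not handled)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current = None  # text fragments of the <tr> being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current = []

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag == "tr" and self._current is not None:
            # Join the fragments and normalise whitespace.
            self.rows.append(" ".join(" ".join(self._current).split()))
            self._current = None


def blocks_containing(html, phone_number):
    """Return the text of every <tr> block that contains phone_number."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return [row for row in parser.rows if phone_number in row]


# Hypothetical original information with two shops at the same tag level.
page = """<table>
<tr><td>Ramen A</td><td>03-0000-0001</td><td>Rich broth, worth the queue.</td></tr>
<tr><td>Ramen B</td><td>03-0000-0002</td><td>Light and quick.</td></tr>
</table>"""
found = blocks_containing(page, "03-0000-0001")
```

Because every shop on such a page sits at the same hierarchy level, the level found for one telephone number can be reused to pull the blocks for the remaining shops.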
- the target information acquisition means 12023 can usually be realized by an MPU, a memory, or the like.
- the processing procedure of the target information acquisition means 12023 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the ranking determination means 12041 ranks two or more search targets based on the target information of the two or more search targets. For example, the ranking determination means 12041 may rank the search targets based on the total number of characters of all their target information, or based on the number of pieces of target information (the number of articles described). The ranking determination means 12041 may also rank them based on whether telephone number information is included in the target information of the search targets, or based on the page ranking of the original information in which the target information is described. In the latter case it is assumed, for example, that the page ranking of the original information (for example, a WEB page) is held in advance. There are various known methods for determining the page ranking of a WEB page.
- the known various methods include a method of ranking according to the number of links to other home pages.
- the page ranking of the WEB page may be determined by a well-known method. Any other algorithm that ranks two or more search targets can be used.
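The ranking criteria named above can be combined in many ways. The sketch below orders search targets by total character count, then number of articles, then presence of a telephone number, then a pre-held page ranking value. This particular ordering of criteria, the `SearchTarget` class, and the telephone number pattern are assumptions for illustration, not the method fixed by the specification.

```python
import re
from dataclasses import dataclass

# Simplified pattern for numbers such as 03-0000-0001 (an assumption).
PHONE = re.compile(r"\d{2,4}-\d{2,4}-\d{4}")


@dataclass
class SearchTarget:
    name: str
    articles: list           # pieces of target information (strings)
    page_rank: float = 0.0   # page ranking assumed to be held in advance


def rank(targets):
    """Return the search targets ordered from highest to lowest rank."""
    def key(t):
        total_chars = sum(len(a) for a in t.articles)
        has_phone = any(PHONE.search(a) for a in t.articles)
        return (total_chars, len(t.articles), has_phone, t.page_rank)
    return sorted(targets, key=key, reverse=True)


# Hypothetical data.
targets = [
    SearchTarget("Ramen B", ["ok"]),
    SearchTarget("Ramen A", ["great broth", "tel 03-0000-0001"], page_rank=0.4),
]
ordered = [t.name for t in rank(targets)]
```

A weighted score would be an equally valid alternative to the lexicographic key used here.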
- the ranking determining unit 12041 can usually be realized by an MPU, a memory, or the like.
- the processing procedure of the ranking determining means 12041 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the target information output means 12042 outputs the target information of two or more search targets based on the ranking of the ranking determination means 12041.
- the target information output means 12042 normally configures and outputs information so that information about the search target ranked higher (such as target information, summary information, and search target name) is also presented to the user.
- the output is usually a transmission to the information terminal 11, but is a concept including display on a display, printing on a printer, sound output, and the like.
- the summarizing means 12043 summarizes two or more pieces of target information and acquires two or more pieces of summary information.
- for example, the summarizing means 12043 takes the first 50 characters of the target information and uses that character string as the summary information.
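The simplest summarization mentioned here, taking the leading 50 characters, amounts to a string slice. The function name and the `length` parameter are illustrative assumptions.

```python
def summarize(target_information, length=50):
    """Return the leading `length` characters as summary information."""
    return target_information[:length]


# Hypothetical target information.
summary = summarize("The broth is rich and the noodles are firm. " * 3)
```

Text shorter than the limit is returned unchanged, so no special case is needed.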
- the summarizing means 12043 can be usually realized by an MPU, a memory or the like.
- the processing procedure of the summarizing means 12043 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the summary information output means 12044 outputs two or more pieces of summary information acquired by the summary means 12043.
- Output is a concept that includes display on a display, printing on a printer, sound output, transmission to an external device, and the like. Here, normally, the output is transmission to the information terminal 11.
- the summary information output unit 12044 is realized by, for example, a wireless or wired communication unit.
- the original information output unit 12045 outputs the original information acquired by the original information acquisition unit 1203.
- Output is a concept that includes display on a display, printing on a printer, sound output, transmission to an external device, and the like. Here, normally, output is transmission to the information terminal 11.
- the original information output unit 12045 is realized by, for example, a wireless or wired communication unit.
- Link symbol output means 12046 outputs link symbol information which is information corresponding to link information.
- Link symbol information is, for example, an image (icon) indicating that it is a link to evaluation information about restaurants and the like.
- the concept of output is as described above.
- the link symbol output means 12046 is realized by, for example, a wireless or wired communication means.
- the word extraction means 120431 extracts words from the target information. There are various algorithms for extracting words. For example, the word extraction unit 120431 holds a dictionary and extracts words registered in the dictionary. Further, the word extraction means 120431 may extract nouns. Since this technique is a well-known language processing technique, detailed description is omitted.
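Dictionary-based word extraction can be sketched with a greedy longest-match scan, which also works for text without spaces between words. This is a simplified stand-in for the well-known language processing techniques the text refers to; the function name and the sample dictionary are assumptions.

```python
def extract_words(text, dictionary):
    """Extract dictionary words from text by greedy longest match."""
    max_len = max((len(w) for w in dictionary), default=0)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so longer entries win.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                words.append(candidate)
                i += size
                break
        else:
            # No dictionary word starts here; move on one character.
            i += 1
    return words


# Hypothetical dictionary and target information.
dictionary = {"broth", "rich", "noodles", "thin"}
words = extract_words("the broth is rich and the noodles are thin", dictionary)
```

Because matching is purely by substring, a short dictionary entry can fire inside a longer unrelated word; a morphological analyzer avoids this and can also restrict the output to nouns.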
- The word extraction means 120431 can usually be realized by an MPU, a memory, or the like.
- the processing procedure of the word extraction means 120431 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- Word appearance count calculation means 120432 for each search target calculates the number of appearances for each search target in each word extracted by the word extraction means 120431.
- the word appearance count calculating means 120432 for each search target can usually be realized by an MPU, a memory and the like.
- the processing procedure of the word appearance count calculating means 120432 for each search target is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- Word total appearance count calculation means 120433 calculates the number of appearances in all target information of all search targets for each word extracted by word extraction means 120431.
- the word total appearance count calculating means 120433 can usually be realized by an MPU, a memory and the like.
- The processing procedure of the word total appearance count calculation means 120433 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- The ratio calculation means 120434 calculates the ratio between the number of appearances for each search target calculated by the word appearance count calculation means 120432 for each search target and the number of appearances in all target information calculated by the word total appearance count calculation means 120433.
- The ratio calculating means 120434 can usually be realized by an MPU, a memory, or the like.
- the processing procedure of the ratio calculating means 120434 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- The word list creation means 120435 creates a word list for each search target by arranging the words in descending order of the ratio calculated by the ratio calculation means 120434.
- The word list creation means 120435 can usually be realized by an MPU, a memory, or the like.
- the processing procedure of the word list creation means 120435 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
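- The cooperation of the word extraction, appearance count, ratio, and word list means described above can be sketched as follows. This is a minimal sketch, not the implementation of the specification: the function name `build_word_lists`, the input layout, and the whitespace-based word extraction (instead of a dictionary or noun extraction) are assumptions made for illustration.

```python
from collections import Counter


def build_word_lists(target_info):
    """target_info maps each search target to a list of texts (its target information).

    Word extraction is simplified to lowercase whitespace splitting (assumption).
    """
    # Number of appearances of each word per search target.
    per_target = {
        name: Counter(word for text in texts for word in text.lower().split())
        for name, texts in target_info.items()
    }
    # Number of appearances of each word in all target information of all search targets.
    total = Counter()
    for counts in per_target.values():
        total.update(counts)
    # Ratio = per-target count / total count; words are arranged in descending ratio.
    word_lists = {}
    for name, counts in per_target.items():
        ratios = {word: count / total[word] for word, count in counts.items()}
        # Ties are broken alphabetically so the result is deterministic (assumption).
        word_lists[name] = sorted(ratios, key=lambda w: (-ratios[w], w))
    return word_lists


lists = build_word_lists({
    "Shop A": ["black pork ramen", "ramen"],
    "Shop B": ["miso ramen"],
})
```

In this sketch, words that appear only in the target information of one search target receive a ratio of 1.0 and come first, while a word shared by every search target, such as "ramen", sinks toward the end of each list.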
- each output means held by the output unit 1204 is normally realized physically by one means.
- The output unit 1204 normally configures one file (for example, a file described in HTML) from the information to be output, such as summary information and link symbol information, and transmits the file to the information terminal 11. When the output unit 1204 outputs one or more of target information, summary information, original information, and link symbol information, it performs processing such as combining the information to configure the file. More specifically, the output unit 1204 configures and outputs a web page that includes one or more of the target information, summary information, original information, and link symbol information of two or more search targets (for example, two or more restaurants).
- the original information storage unit 1301 stores one or more pieces of original information.
- the original information includes target information to be searched.
- the original information is a so-called page described in HTML, C HTML, XML, or the like.
- The original information storage unit 1301 is preferably a non-volatile recording medium, but can also be realized by a volatile recording medium.
- the original information transmission unit 1302 transmits the original information in the original information storage unit 1301.
- the trigger and timing at which the original information transmission unit 1302 transmits the original information is not limited.
- The original information transmission unit 1302 transmits the original information, for example, in response to a request from the information processing apparatus 12.
- the original information transmission unit 1302 is usually realized by a wireless or wired communication means, but may be realized by a broadcasting means.
- the operation of the information processing system will be described. First, the operation of the information terminal 11 will be described using the flowchart of FIG.
- Step S301 The user input receiving unit 1101 determines whether or not an input from the user has been received. If an input is accepted, the process goes to step S302. If no input is accepted, the process returns to step S301.
- Step S302 The user input receiving unit 1101 determines whether or not the input received in step S301 is search information. If it is search information, it goes to step S303, and if it is not search information, it jumps to step S307.
- Step S303 The request transmission unit 1102 configures request information based on the search information received in step S301.
- the request information is information for requesting acquisition of target information to be searched.
- Step S304 The request transmission unit 1102 transmits the information configured in Step S303, Step S308, or Step S310.
- Step S305 The information receiving unit 1103 determines whether or not the information such as the target information is received from the information processing apparatus 12. If information is received, the process proceeds to step S306, and if information is not received, the process returns to step S305.
- Step S306 The information output unit 1104 outputs the information received in step S305.
- Here, the information output unit 1104 interprets the file described in HTML received by the information receiving unit 1103, configures a page, and displays the page on the display. Return to step S301.
- Step S307 The user input receiving unit 1101 determines whether or not the input received in step S301 is a summary information selection instruction. If summary information selection instruction, step S
- the request transmission unit 1102 constitutes an information acquisition request including a summary information selection instruction.
- the constituent information includes link information of the original information in which the target information that is the basis of the summary information corresponding to the summary information selection instruction is described.
- The link information is, for example, a URL or URI indicating the location of the original information. Go to step S304.
- Step S 309 User input reception unit 1101 determines whether or not the input received in step S 301 is a link symbol selection instruction. If it is a link symbol selection instruction, go to step S310, and if it is not a link symbol selection instruction, jump to step S311.
- Request transmission section 1102 constitutes an information acquisition request including a link symbol selection instruction.
- the constituent information includes the link information of the original information corresponding to the link symbol selection instruction.
- The link information is, for example, a URL or URI indicating the location of the original information.
- Step S311 Perform processing according to the accepted input. There are various kinds of processing. Such processing is, for example, processing performed by a so-called WEB browser. Return to step S301.
- Step S401 The receiving unit 1201 determines whether or not the information has been received. If the information is accepted, the process goes to step S402. If the information is not accepted, the process returns to step S401.
- Step S402 Reception unit 1201 determines whether or not the information received in step S401 includes search information. If the search information is included, the process goes to step S403. If the search information is not included, the process jumps to step S406.
- the case where the search information is included is a case where the request information described above is accepted.
- Step S403 The target information acquisition unit 1202 acquires target information that is information regarding two or more search targets specified by the search information received in step S401. Details of the target information acquisition processing will be described with reference to the flowchart of FIG.
- Step S404 The output unit 1204 configures information to be transmitted to the information terminal 11. Details of the transmission information configuration processing will be described with reference to the flowchart of FIG.
- Step S405 The output unit 1204 outputs the information configured in step S404.
- the output here is transmission to the information terminal 11. Return to step S401.
- Step S406 The receiving unit 1201 determines whether or not the information received in step S401 is a summary information selection instruction. If it is a summary information selection instruction, the process goes to step S407, and if it is not a summary information selection instruction, the process jumps to step S408.
- Step S407 The original information acquisition unit 1203 acquires the original information based on the link information of the original information in which the target information that is the source of the summary information corresponding to the summary information selection instruction is described. Go to step S405.
- Step S408 The receiving unit 1201 determines whether or not the information received in step S401 is a link symbol selection instruction. If it is a link symbol selection instruction, the process goes to step S409, and if it is not a link symbol selection instruction, the process returns to step S401.
- Step S409 Based on the link information for the instructed link symbol information, the original information acquisition unit 1203 acquires the original information, which is the information describing the target information indicated by the link information. Go to step S405.
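- The branching of steps S401 to S409 can be sketched as a small dispatch function. This is only an illustration: the dictionary keys `search_information`, `instruction`, and `link`, and the callables `run_search` and `fetch_original`, are hypothetical names that do not appear in the specification.

```python
def handle_received(info, run_search, fetch_original):
    """Dispatch one piece of received information (sketch of steps S402 to S409)."""
    if "search_information" in info:                          # S402
        # S403, S404: acquire target information and configure the information to send.
        return run_search(info["search_information"])
    if info.get("instruction") == "summary_selection":        # S406
        return fetch_original(info["link"])                   # S407
    if info.get("instruction") == "link_symbol_selection":    # S408
        return fetch_original(info["link"])                   # S409
    return None                                               # nothing to output; back to S401


reply = handle_received(
    {"instruction": "link_symbol_selection", "link": "http://shop.example/review.html"},
    run_search=lambda search: "result page for " + search,
    fetch_original=lambda link: "original information at " + link,
)
```

Both selection instructions end in the same action, acquiring the original information through its link information, which is why the two branches call the same helper.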
- Step S501 The search target information acquiring means 12022 substitutes 1 for the counter i.
- Step S502 The search target information acquiring means 12022 determines whether or not the i-th search target information exists. If the i-th search target information exists, the process proceeds to step S503. If the i-th search target information does not exist, the process jumps to step S506.
- Step S503 The search target information acquiring means 12022 determines whether or not the i-th search target information satisfies the search information requirement. If the search information requirement is satisfied, the process goes to step S504, and if the search information requirement is not satisfied, the process jumps to step S505.
- For example, it suffices to judge whether or not the search target information matches search information such as “a ramen shop about 10 minutes' walk from JR Yamanote Line Shinjuku”.
- Search target information acquisition means 12022 acquires part or all of the search target information that satisfies the requirements of the search information.
- The acquired part or all of the search target information serves as a search key.
- Search target information acquisition means 12022 temporarily stores a search key.
- Step S505 Search target information acquiring means 12022 increments counter i by one. Return to step S502.
- Step S506 The search target information acquiring means 12022 substitutes 1 for the counter i.
- Step S507 The target information acquisition means 12023 determines whether or not the i-th search key exists. If the i-th search key exists, the process goes to step S508. If the i-th search key does not exist, the process returns to the upper function.
- Step S508 The target information acquisition means 12023 substitutes 1 for the counter j.
- Step S509 The target information acquisition means 12023 determines whether or not the j-th original information exists in the original information storage unit 1207. If the j-th original information exists, the process goes to step S510, and if the j-th original information does not exist, the process jumps to step S517.
- Step S510 The target information acquisition unit 12023 determines whether or not a search target tag corresponding to the i-th search key exists. If the search target tag exists, the process proceeds to step S511, and if the search target tag does not exist, the process jumps to step S518.
- The initial value of the search target tag corresponding to every search key is NULL (that is, no search target tag exists).
- Step S511 The target information acquisition means 12023 substitutes 1 for the counter k.
- Step S512 The target information acquisition unit 12023 determines whether or not the k th search target tag exists in the j th original information. If the kth search target tag exists, the process goes to step S513, and if the kth search target tag does not exist, the process jumps to step S516.
- The target information acquisition unit 12023 acquires the information corresponding to that search target tag. Such information is a candidate for target information.
- Step S514 The target information acquisition unit 12023 determines whether or not the information acquired in step S513 matches the condition indicated by the search key. If the condition is met, the process goes to step S515; if the condition is not met, the process jumps to step S521. Whether or not the condition indicated by the search key is met can be determined by various algorithms. For example, it may be determined that the condition is met if the information acquired in step S513 includes one or more of the two or more pieces of information (for example, store name, address information, telephone number information, and zip code information) that the search key has. Alternatively, it may be determined that the condition is met when the information acquired in step S513 includes the store name held by the search key and a part of the address information.
- Step S5 The target information acquisition unit 12023 temporarily stores the information acquired in Step S513. Such information is target information.
- the target information is stored in a pair with the search key or the search target information.
- Step S516 The target information acquisition means 12023 increments the counter j by 1. Return to step S509.
- Step S517 The target information acquisition unit 12023 increments the counter i by 1. Return to step S507.
- Step S518 The target information acquisition unit 12023 determines whether or not the j-th original information has a portion that matches the condition indicated by the search key. For example, when the search key has “store name”, “phone number information”, and “address information”, a portion of the original information that includes two or more of these pieces of information is assumed to meet the condition. Various other algorithms can be considered for determining whether or not the j-th original information matches the condition indicated by the search key.
- Step S519 The target information acquisition unit 12023 acquires a tag corresponding to a location that matches the condition indicated by the search key.
- Step S520 The target information acquisition unit 12023 registers the tag acquired in step S519 as a search tag corresponding to the jth original information.
- registration means writing to a memory or a predetermined buffer. Go to step S515.
- Step S521 The target information acquisition unit 12023 increments the counter k by 1. Return to step S512.
- In the above flowchart, the target information acquisition unit 12023 performed processing to acquire information at the same hierarchical level in the tag structure of the original information.
- the processing in this flowchart is an example of processing for acquiring information of the same hierarchical level when a plurality of pieces of target information are acquired from one original information.
- The target information acquisition unit 12023 may construct a tag structure tree of the original information and register, as a search tag, a tag in the structure tree that includes two or more pieces of location information (address information, telephone number information, postal code information, etc.).
- Here, a tag including two or more pieces of location information is registered as a search tag.
- However, a tag including one or more pieces of location information may be registered as a search tag, or a tag including three or more pieces of location information may be registered as a search tag.
- The target information acquisition means 12023 is not limited to the above processing procedure, as long as it performs processing to acquire information at the same hierarchical level when acquiring a plurality of pieces of target information from the same original information.
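- The idea of registering a tag that contains two or more pieces of the search key and then acquiring the information at the same hierarchical level can be sketched as follows. This is a minimal sketch under several assumptions: the original information is well-formed markup that `xml.etree.ElementTree` can parse, the function name `acquire_same_level` and the sample page are hypothetical, and the innermost matching element is taken as the search tag.

```python
import xml.etree.ElementTree as ET


def acquire_same_level(root, key_values, min_hits=2):
    """Return (search tag, texts of all same-level elements with that tag)."""
    # ElementTree keeps no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    match = None
    for elem in root.iter():  # document order: ancestors come before descendants
        text = "".join(elem.itertext())
        if sum(value in text for value in key_values) >= min_hits:
            match = elem      # the last hit is the innermost one when one portion matches
    if match is None:
        return None, []
    parent = parent_of.get(match)
    siblings = [match] if parent is None else [e for e in parent if e.tag == match.tag]
    # Collapse whitespace so each candidate is a single line of text.
    return match.tag, [" ".join("".join(e.itertext()).split()) for e in siblings]


page = ET.fromstring(
    "<html><body><h1>Ramen reviews</h1>"
    "<div><b>Shop A</b> Tel 03-0000-0001 <p>Rich broth.</p></div>"
    "<div><b>Shop B</b> Tel 03-0000-0002 <p>Thin noodles.</p></div>"
    "</body></html>"
)
tag, candidates = acquire_same_level(page, ["Shop A", "03-0000-0001", "Tokyo"])
```

Each returned text is only a candidate for target information; as in step S514, it would still have to be checked against the condition indicated by the search key.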
- Step S601 The ranking determining unit 12041 ranks two or more search targets based on the target information of the two or more search targets. Details of this ranking process will be described with reference to the flowchart of FIG.
- Step S602 The target information output means 12042 substitutes 1 for the counter i.
- the target information output means 12042 constitutes the header of the i-th target information.
- Step S604 The target information output means 12042 substitutes 1 for the counter j.
- Link symbol output means 12046 constitutes j-th link symbol information.
- link information indicating the location of the original information is used. This is because the original information acquisition unit 1203 uses the link information to access the original information when the link symbol information is pressed.
- Step S606 The summarizing means 12043 acquires summary information. Details of the summary information acquisition process, which is the process for acquiring summary information, will be described with reference to the flowchart of FIG.
- the target information output means 12042 constitutes a summarizing section using one or more pieces of summary information.
- the summary unit is information that constitutes information to be output.
- The summary section is composed of the summary information and the link information of the original information from which the summary information was created. This is because, when the summary information is pressed, the original information acquisition unit 1203 accesses the original information using the link information.
- Step S608 The counter j is incremented by one.
- Step S609 The target information output means 12042 determines whether or not the processing of all the link symbol information and summary information of the i-th search target has been completed (whether or not j is the last). Such a determination may be made when processing of the previously extracted summary information is completed, or may be made when processing of a predetermined number of summary information is completed. If j is the last, go to step S610, and if j is not the last, return to step S605.
- Step S610 The target information output means 12042 increments the counter i by 1.
- Step S611 The target information output means 12042 determines whether or not the i-th search target exists. If the i-th search target exists, the process returns to step S603. If the i-th search target does not exist, the process goes to step S612.
- the target information output unit 12042 constitutes information to be output.
- Such processing is, for example, recording a tag such as “</HTML>” on the last line of the HTML file. That is, it is post-processing for configuring the information to be transmitted. Return to the upper function.
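- The configuration of the information to be transmitted (a header per search target, then a link symbol and a summary section per piece of target information, then the closing post-processing) can be sketched as follows. The function name `build_page`, the input layout, the `[src]` text standing in for the link symbol icon, and the concrete HTML tags are all assumptions for illustration.

```python
from html import escape


def build_page(ranked_targets):
    """ranked_targets: list of dicts with 'name' and 'items' (each item: 'summary', 'link')."""
    lines = ["<HTML>", "<BODY>"]
    for target in ranked_targets:                              # counter i
        lines.append(f"<H2>{escape(target['name'])}</H2>")     # header of the i-th target
        for item in target["items"]:                           # counter j
            href = escape(item["link"], quote=True)
            # Link symbol: pressing it leads to the original information.
            lines.append(f'<A HREF="{href}">[src]</A>')
            # Summary section: summary text carrying the same link information.
            lines.append(f'<A HREF="{href}">{escape(item["summary"])}</A><BR>')
    lines += ["</BODY>", "</HTML>"]                            # post-processing: closing tags
    return "\n".join(lines)


page_html = build_page([
    {"name": "Shop A & Co", "items": [{"summary": "Rich broth", "link": "http://a.example/1"}]},
])
```

Both the link symbol and the summary section carry the link information of the original information, so that either selection lets the original information acquisition unit reach the same page.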
- Step S701 The ranking determining means 12041 substitutes 1 for the counter i.
- Step S702 The ranking determining unit 12041 determines whether or not the i-th search target exists. If the i-th search target exists, the process goes to step S703. If the i-th search target does not exist, the process jumps to step S713.
- Step S703 The ranking determining unit 12041 substitutes 0 for the point information of the i-th search target. Point information is information used to determine ranking.
- Step S704 The ranking determining means 12041 substitutes 1 for the counter j.
- Step S705 The ranking determining unit 12041 determines whether or not the j th target information exists in the i th search target. If the j-th target information exists, the process proceeds to step S706. If the j-th target information does not exist, the process jumps to step S714.
- the ranking determining unit 12041 obtains the data amount of the j-th target information.
- the amount of data may be the number of characters, the number of bytes (data length), the number of words, the number of sentences, etc.
- Step S707 The ranking determining unit 12041 acquires the page rank of the original information of the j-th target information. For example, it is assumed that the information processing apparatus 12 holds the page rank of the original information in advance corresponding to the original information.
- Step S708 The ranking determining unit 12041 determines whether or not the i-th search target telephone number information is included in the j-th target information. If the telephone number information is included, the process goes to step S709. If the telephone number information is not included, the process jumps to step S715.
- the ranking determining means 12041 substitutes ON for the telephone number flag.
- the telephone number flag is a flag indicating whether or not the i-th search target telephone number information is included.
- Step S710 The ranking determination means 12041 calculates points based on one or more of the data amount acquired in step S706, the page rank of the original information acquired in step S707, and the telephone number flag. An example of a specific point calculation algorithm will be described later.
- Step S711 The ranking determining unit 12041 adds the point calculated in Step S710 to the point information of the i-th search target.
- Step S712 The ranking determining means 12041 increments the counter j by 1. Go to step S705.
- Step S713 The ranking determining unit 12041 sorts the search targets using the point information as a key. Return to upper function.
- Step S714 The ranking determining unit 12041 increments the counter i by 1. Go to step S702.
- Step S715 The ranking determining means 12041 substitutes OFF for the telephone number flag. Go to step S710.
- In the above flowchart, the ranking determination means 12041 ranked two or more search targets based on the number of characters of the target information of the search targets, whether or not the target information includes telephone number information, and the page rank of the original information in which the target information is described. However, the ranking determining unit 12041 may rank two or more search targets based on one or more of the number of characters of the target information, whether or not telephone number information is included, and the page rank. Further, the ranking determination means 12041 may rank two or more search targets based on other information, such as the number of articles about each search target.
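- The ranking flow of steps S701 to S715 can be sketched as follows. The specification defers the concrete point calculation, so the linear combination and the weights below are purely hypothetical, as are the function name `rank_search_targets` and the input layout.

```python
def rank_search_targets(targets, weights=(1.0, 100.0, 50.0)):
    """targets maps a name to {'tel': str, 'target_info': [{'text': str, 'page_rank': number}]}.

    weights = (per character, per page-rank unit, telephone number bonus); all assumed values.
    """
    w_len, w_rank, w_tel = weights
    points = {}
    for name, target in targets.items():
        total = 0.0                                       # S703: point information starts at 0
        for info in target["target_info"]:                # S705: loop over target information
            amount = len(info["text"])                    # S706: data amount as character count
            page_rank = info["page_rank"]                 # S707: page rank of the original information
            tel_flag = target["tel"] in info["text"]      # S708, S709, S715: telephone number flag
            total += w_len * amount + w_rank * page_rank + w_tel * tel_flag  # S710, S711
        points[name] = total
    return sorted(points, key=points.get, reverse=True)   # S713: sort by point information


order = rank_search_targets({
    "Shop B": {"tel": "03-0000-0002", "target_info": [{"text": "ok", "page_rank": 0}]},
    "Shop A": {"tel": "03-0000-0001",
               "target_info": [{"text": "Great ramen. Tel 03-0000-0001", "page_rank": 1}]},
})
```

Because points are accumulated over every piece of target information, a search target that is written about often, at length, and on highly ranked pages rises in the ordering.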
- Step S801 Summarizing means 12043 substitutes 1 for counter i.
- Step S802 The summarizing means 12043 determines whether or not the i-th search target exists. If the i-th search target exists, the process goes to step S803. If the i-th search target does not exist, the process returns to the upper function.
- Step S803 The word extracting means 120431 substitutes 1 for the counters j and k.
- Step S804 The word extracting means 120431 extracts the kth word from the jth target information.
- Word appearance count calculating means 120432 for each search target calculates the number of appearances of the word extracted in step S804 (first appearance count) in all target information of the i-th search target.
- Step S806 The word total appearance count calculating means 120433 calculates the number of appearances (second appearance count) of the word extracted in step S804 in all the target information of all search targets.
- Step S807 The ratio calculating means 120434 calculates the ratio of the first appearance count to the second appearance count.
- Step S808 The word list creation means 120435 registers (temporarily stores) the word extracted in step S804 paired with the ratio calculated in step S807.
- Step S809 The word extracting means 120431 increments the counter k by 1.
- Step S810 The word extraction means 120431 determines whether or not the kth word exists from the jth target information. This kth word is a word that has not been processed so far. If the kth word exists, go to step S804, and if the kth word does not exist, go to step S811.
- Step S811 The counter j is incremented by one.
- Step S812 The word list creation means 120435 determines whether or not the j-th target information of the i-th search target exists. If the j-th target information exists, the process goes to step S804. If the j-th target information does not exist, the process goes to step S813.
- The word list creation means 120435 sorts the words based on the ratio information. The words at the top of the sorted list are considered to be words that are characteristic of the search target.
- Step S814 The summarizing means 12043 substitutes 1 for the counter m.
- Step S815 The summarizing means 12043 determines whether or not the i-th search target summary information is larger than a predetermined size. Note that the initial value of the summary information for each search target is NULL. If larger than the predetermined size, go to step S818, and if smaller than the predetermined size, go to step S816.
- Step S816 The summarizing means 12043 acquires the sentence containing the mth word from all the target information of the i-th search target and adds it as summary information.
- Step S817 The summarizing means 12043 increments the counter m by 1.
- Step S818 The summarizing means 12043 cuts the summary information so that it fits within the predetermined size. Cutting is a process of erasing the information beyond the predetermined size.
- Step S819 The summarizing means 12043 increments the counter i by 1. Go to step S802.
- the flowchart of FIG. 8 shows an algorithm for acquiring summary information by paying attention to a word that is a characteristic of a search target.
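- The summary acquisition part of this flow (steps S814 to S818) can be sketched as follows, given a word list already sorted so that the most characteristic words come first. The function name `acquire_summary`, the sentence splitting on `.`, `!`, and `?`, the skipping of sentences that are already in the summary, and the size of 60 characters are assumptions for illustration.

```python
import re


def acquire_summary(word_list, texts, max_chars=60):
    """Add sentences containing the top words until the summary exceeds max_chars, then cut."""
    sentences = [
        s.strip()
        for text in texts
        for s in re.split(r"(?<=[.!?])\s+", text)
        if s.strip()
    ]
    summary = ""                                   # initial value of the summary is empty
    for word in word_list:                         # m-th word, most characteristic first
        if len(summary) > max_chars:               # S815: larger than the predetermined size?
            break
        for sentence in sentences:                 # S816: sentence containing the m-th word
            if word in sentence.lower() and sentence not in summary:
                summary = (summary + " " + sentence).strip()
                break
    return summary[:max_chars]                     # S818: erase what exceeds the size


summary = acquire_summary(
    ["pork", "ramen"],
    ["The black pork is rich. The ramen is cheap.", "Open late."],
    max_chars=30,
)
```

With a small `max_chars` the result may end in the middle of a sentence, which matches the cut described for step S818.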
- Note that the summarizing means 12043 need only include: word extracting means for extracting words from the target information; word appearance count calculation means for each search target, which calculates, for each word extracted by the word extracting means, the number of appearances for each search target; and summary acquisition means for acquiring summary information for each search target from the target information based on the number of appearances. In other words, it is not always necessary to construct the word list based on the ratio of the first appearance count to the second appearance count.
- Another algorithm is, for example, the following. Sentences are taken one at a time from the “collection of sentences included in all target information of the i-th search target”, and each sentence is scored by the points of the characteristic words it contains relative to its length. For example, in the sentence “special gyoza is recommended”, “special” is the 5th characteristic word (4.6 points) and “gyoza” is the 3rd characteristic word (5.0 points), so the score is 9.6 / word count. On the other hand, the sentence “This restaurant recommends gyoza using special black pork!” also includes “special” and “gyoza”, but its word count is larger, so its score is lower. However, if “black pork” is the 1st characteristic word and its points are high enough, this sentence will be preferentially selected.
- the feature word is a word specific to the search target, and the point is information indicating the degree of peculiarity of the search target of the word.
- the number of appearances is a concept including the concept of the above points and the concept of the appearance degree of other words. That is, the summary acquisition means is configured to acquire the summary information of each search target based on the number of appearances.
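- The sentence-based alternative can be sketched as follows, assuming that the score of a sentence is the sum of the points of its feature words divided by its word count. The function name `score_sentence` and the 20.0 points given to "pork" are hypothetical; the 4.6 and 5.0 points are the values from the example above.

```python
def score_sentence(sentence, feature_points):
    """Sum of the feature-word points in the sentence divided by its word count."""
    # Simplified tokenization (assumption): strip '!' and '.', split on whitespace.
    words = sentence.lower().replace("!", " ").replace(".", " ").split()
    if not words:
        return 0.0
    return sum(feature_points.get(word, 0.0) for word in words) / len(words)


points = {"special": 4.6, "gyoza": 5.0}
short_score = score_sentence("special gyoza is recommended", points)
long_score = score_sentence(
    "This restaurant recommends gyoza using special black pork!", points
)
# Hypothetical: "pork" becomes a top feature word with high points.
boosted_score = score_sentence(
    "This restaurant recommends gyoza using special black pork!",
    {**points, "pork": 20.0},
)
```

Dividing by the word count favors short sentences that are dense in feature words, while a sufficiently characteristic word can still lift a longer sentence above them.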
- A conceptual diagram of the information processing system is shown in FIG. 1.
- the information terminal 11 is, for example, a personal computer equipped with a WEB browser.
- the information terminal 11 is a terminal that outputs target information such as a ramen shop, for example.
- the information processing device 12 is, for example, a server device equipped with a so-called search engine.
- the information storage device 13 is, for example, a server device that stores a homepage of a ramen shop.
- the home page is described in, for example, HTML.
- FIG. 9 shows an example of a homepage received by the information processing apparatus 12 from a large number of information storage apparatuses 13.
- Such a homepage corresponds to the original information described above.
- the original information receiving unit 1205 receives the home page from the information storage device 13, and the original information accumulating unit 1206 accumulates the home page received by the original information receiving unit 1205 in the original information storing unit 1207.
- The original information storage unit 1207 stores a large number of homepages.
- FIG. 10 shows a search target information group stored in the search target information group storage unit 12021.
- the search target information group includes “search target name information”, “zip code information”, “address information”, and “phone number information”.
- Search target name information is, for example, a store name.
- the search target information group is, for example, a so-called yellow page.
- FIG. 11 is an information portal management table.
- The information portal management table manages the URLs of the homepages (HP) of highly reliable information portals.
- A homepage identified by a URL managed in this information portal management table is a homepage with a high page rank.
- a home page not managed by the information portal management table is a home page with a low page rank.
- the page rank information is used by the ranking determining means 12041.
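- A minimal sketch of how a page rank could be derived from the information portal management table is shown below. Matching a homepage to a managed URL by host name is an assumption (the specification does not say how the match is made), and the function name `page_rank` and the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def page_rank(url, portal_urls):
    """Return 'high' if the page belongs to a managed information portal, else 'low'."""
    portal_hosts = {urlparse(portal).netloc for portal in portal_urls}
    return "high" if urlparse(url).netloc in portal_hosts else "low"


portal_table = ["http://portal.example/", "http://gourmet.example/"]
rank_a = page_rank("http://portal.example/shops/123.html", portal_table)
rank_b = page_rank("http://personal.example/diary.html", portal_table)
```

A numeric rank (for example 1 and 0) could be returned instead when the value is fed into a point calculation.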
- the user inputs the home page URL of the information processing apparatus 12 from the information terminal 11 and accesses the home page.
- the information terminal 11 has received and displayed a homepage for searching for a restaurant.
- the user enters search information in a field on the homepage in order to obtain information on the restaurant (in this case, ramen shop) that he / she wants to search (see FIG. 12).
- the search information includes information indicating “search point”, information indicating “range”, and “keyword”.
- the user presses the “Search” button after inputting the search information.
- Here, the user searches for a ramen shop within about 10 minutes' walking distance from JR Yamanote Line Shinjuku.
- the request transmission unit 1102 of the information terminal 11 configures request information (see FIG. 13) having the search point “JR Yamanote Line Shinjuku”, the range “about 10 minutes walk”, and the keyword “ramen”.
- the request information is transmitted to the information processing apparatus 12.
- “1” in the request information in FIG. 13 is a search flag for instructing a search.
- the request information name “1” indicates that the name is not a search key.
- the receiving unit 1201 of the information processing apparatus 12 receives request information from the information terminal 11.
- the request information includes search information that is information for specifying two or more search targets.
- The search target information acquisition means 12022 acquires part or all of the search target information from the search target information group storage unit 12021 based on the received search information (search point “JR Yamanote Line Shinjuku”, range “about 10 minutes walk”, keyword “ramen”). Specifically, the search target information acquisition means 12022 calculates the distance corresponding to about 10 minutes on foot from JR Yamanote Line Shinjuku Station. In this case, for example, the search target information acquisition means 12022 takes about 10 minutes on foot to correspond to 800 m and acquires the ramen shops within 800 m of JR Yamanote Line Shinjuku from the search target information group.
- The search target information acquisition means 12022 holds the addresses of a number of search points such as JR Yamanote Line Shinjuku, and holds map information for converting addresses to latitude and longitude. It first acquires the position information (latitude and longitude) of JR Yamanote Line Shinjuku from the address of that search point. Next, it acquires position information (latitude and longitude) from the address information of each record in the search target information group. It then calculates the distance between the position of JR Yamanote Line Shinjuku and the position of each store to be searched, and determines whether that distance is within 800 m.
- The search target information acquisition means 12022 acquires the records of stores within 800 m from the search target information group. Next, from the acquired records, it acquires the records whose “search target name information” includes the character string “ramen”. Part or all of these records constitute the search target information. The flow used to search for the search target information is not restricted.
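- The distance check and keyword filter described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: it assumes that each record has already been converted to latitude and longitude, it uses the haversine formula as one possible distance calculation, and the walking speed of 80 m per minute is an assumption derived from "about 10 minutes" corresponding to 800 m. The function names, record fields, and coordinates are hypothetical.

```python
import math

# Assumption: 10 minutes on foot corresponds to 800 m, i.e. 80 m per minute.
WALK_M_PER_MIN = 80.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def select_search_targets(records, point, minutes, keyword):
    """Keep records within walking range of `point` whose name contains `keyword`."""
    limit = minutes * WALK_M_PER_MIN
    nearby = [
        rec for rec in records
        if haversine_m(point[0], point[1], rec["lat"], rec["lon"]) <= limit
    ]
    return [rec for rec in nearby if keyword.lower() in rec["name"].lower()]


# Hypothetical, already geocoded records (coordinates are illustrative only).
SHINJUKU = (35.6896, 139.7006)
RECORDS = [
    {"name": "Kumamoto OO Ramen", "lat": 35.6910, "lon": 139.7030},
    {"name": "Sushi Near", "lat": 35.6900, "lon": 139.7010},
    {"name": "Ramen Far", "lat": 35.7295, "lon": 139.7109},
]

selected = select_search_targets(RECORDS, SHINJUKU, minutes=10, keyword="ramen")
print([rec["name"] for rec in selected])
```

Filtering by distance first and by name second mirrors the order of the two acquisition steps in the text; the order could be swapped without changing the result.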
- the search target information intermediate table is a set of records indicating search targets.
- the target information acquisition unit 12023 searches the original information storage unit 1207 based on the searched target information intermediate table of FIG. 14 acquired by the searched target information acquisition unit 12022, and acquires the target information.
- the target information is information about the store specified by each record in the searched target information intermediate table in FIG.
- The target information acquisition means 12023 first searches all the homepages in the original information storage unit 1207 for information describing “Kumamoto OO Ramen”, the first record of the search target information intermediate table. This search is performed using the following algorithm.
- the homepage is described here as data hierarchized by tags such as HTML and XML.
- the homepage is described in HTML.
- the target information acquisition unit 12023 constructs an HTML structural tree from the home page.
- the HTML structure tree is information indicating the hierarchical relationship of HTML tags as shown in FIG. 15, for example.
- FIG. 15 shows that the <tr> tag exists under the <table> tag and the <td> tag exists under the <tr> tag.
- The <table> tag is a tag that specifies the entire table.
- The <tr> tag is a tag that specifies a row (record) in the table.
- The <td> tag is a tag that designates a cell. The reason for this process is that the information of each store is very often described at the same level of the HTML tag hierarchy.
- The target information acquisition means 12023 searches the homepage and acquires each sentence group that includes one or more of the address information “Shinjuku-ku, Tokyo...” and the telephone number information “03-1122-3456” of “Kumamoto OO Ramen”. A sentence group is information surrounded by tags, and it is the target information. The target information acquisition means 12023 also acquires the tag of the hierarchy level that includes one or more of the address information “Shinjuku-ku, Tokyo...” and the telephone number information “03-1122-3456”. Here, it is assumed that the target information acquisition means 12023 detects that the address information “Shinjuku-ku, Tokyo...” is contained in the tag (1> tag).
- The target information acquisition means 12023 may instead use an algorithm that treats a tag containing two or more location descriptions (such as location information) as a tag in which target information exists. The target information acquisition means 12023 temporarily stores the acquired target information.
- The reason the target information is taken to be a sentence group that includes one or more pieces of location information (address information, telephone number information, zip code information, etc.) is that information about stores and the like often includes one or more pieces of location information.
- Next, the target information acquisition means 12023 searches the next homepage and acquires the target information that includes one or more of the address information “Shinjuku-ku, Tokyo...” and the telephone number information “03-1122-3456” of “Kumamoto OO Ramen”, together with the tag of the hierarchy level in which that target information exists.
- If the homepage contains no sentence group that includes one or more of the address information “Shinjuku-ku, Tokyo...” and the telephone number information “03-1122-3456”, the target information acquisition means 12023 proceeds to the search process for the next homepage.
- In this way, the target information acquisition means 12023 acquires, from all homepages, the target information that includes one or more pieces of information (location information) among the address information “Shinjuku-ku, Tokyo...” and the telephone number information “03-1122-3456” of “Kumamoto OO Ramen”.
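- The tag-based extraction described above can be sketched as follows. This is a minimal sketch under stated assumptions rather than the algorithm of the embodiment: it builds a simple structure tree with the standard-library `html.parser`, finds the innermost element whose text contains a location string, and then optionally moves up a number of levels (`levels_up`, a hypothetical parameter) to take the enclosing sentence group. The class and function names and the sample page are hypothetical.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """One element of the structure tree; items holds text and child nodes in order."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.items = []

    def text(self):
        parts = [i if isinstance(i, str) else i.text() for i in self.items]
        return " ".join(" ".join(parts).split())


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.items.append(node)
        if tag not in VOID_TAGS:  # void elements never receive an end tag
            self.cur = node

    def handle_endtag(self, tag):
        node = self.cur
        while node is not self.root and node.tag != tag:
            node = node.parent  # tolerate unclosed inner tags
        if node is not self.root:
            self.cur = node.parent

    def handle_data(self, data):
        if data.strip():
            self.cur.items.append(data.strip())


def find_target_info(html, location_strings, levels_up=0):
    """Return (tag, text) for elements enclosing any of the location strings."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    found = []

    def visit(node):
        hit_in_child = False
        for item in node.items:
            if isinstance(item, Node) and visit(item):
                hit_in_child = True
        if hit_in_child:
            return True
        if node is not builder.root and any(s in node.text() for s in location_strings):
            target = node
            for _ in range(levels_up):
                if target.parent is not None and target.parent is not builder.root:
                    target = target.parent
            if not any(t is target for t in found):
                found.append(target)
            return True
        return False

    visit(builder.root)
    return [(n.tag, n.text()) for n in found]


PAGE = """<table>
<tr><td>Kumamoto OO Ramen</td><td>Shinjuku-ku, Tokyo</td><td>Rich pork broth.</td></tr>
<tr><td>Other Shop</td><td>Shibuya-ku, Tokyo</td><td>Banner ad.</td></tr>
</table>"""

hits = find_target_info(PAGE, ["Shinjuku-ku, Tokyo", "03-1122-3456"], levels_up=1)
print(hits)
```

With `levels_up=1` the whole table row is returned as the sentence group, so the advertisement-like text in the other row is not picked up.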
- the above processing is also performed for the second store "Ramen ABC” and the third store “Ramen XYZ", and all target information is temporarily stored.
- When the target information of the second and subsequent stores is searched for, the search may be limited to information corresponding to the tags extracted during the target information search for the first store, “Kumamoto OO Ramen” (tags in which store information may be present), and information corresponding to other tags may be excluded from the search.
- In that case, the acquisition speed of the target information is increased.
- Alternatively, an HTML structure tree may first be constructed for each piece of original information, each tag including two or more pieces of location information may be registered as a search target tag, and only information corresponding to the registered tags may be used as target information candidates.
- the target information management table shown in FIG. 16 is obtained.
- The target information acquisition means 12023 has searched all homepages in the original information storage unit 1207 and acquired the target information of ID “1” to “n”, which is the target information of “Kumamoto OO Ramen”.
- The target information acquisition unit 12023 similarly acquires the target information of ID “n + 1” to “n + m”, which is the target information of “Ramen ABC”.
- the target information acquisition unit 12023 similarly acquires target information of ID “n + m + 1” to “n + m + p”, which is target information of “ramen XYZ”.
- The ranking determining means 12041 obtains the attribute value of the “original information URL” of the record of ID “1” in the target information management table of FIG. 16, and determines whether the URL that is the attribute value exists in the information portal management table. Suppose, for example, that the original information URL “http://www.gourmet.co.jp” of ID “1” in the target information management table of FIG. 16 exists in the information portal management table. Next, the ranking determining means 12041 determines whether the “target information” of the record of ID “1” in the target information management table includes telephone number information.
- The ranking determining means 12041 sets the attribute value “telephone number” in a ranking intermediate table, described later, to “1” if telephone number information is included and to “0” otherwise. Next, the ranking determining means 12041 counts the number of characters of the “target information” in the record of ID “1” in the target information management table of FIG. 16 and obtains “384”. The ranking determining means 12041 forms a ranking intermediate table having one or more records, each having “ID”, “search target name information”, “page rank”, “phone number”, and “number of characters”. FIG. 17 shows such a ranking intermediate table. As a result of the above processing, the record of ID “1” in FIG. 17 is configured.
- Next, the ranking determining means 12041 obtains the attribute value of the “original information URL” of the record of ID “2” in the target information management table of FIG. 16, and determines whether the URL that is the attribute value exists in the information portal management table.
- The ranking determining means 12041 sets the telephone number to “1” because the “target information” of the record of ID “2” in the target information management table includes telephone number information.
- the ranking determination unit 12041 counts the number of characters of “target information” in the record with ID “2” in the target information management table of FIG. 16 to obtain “129”.
- the ranking determining unit 12041 performs the above processing on the records after the ID “3” in the target information management table of FIG. 16 to obtain the ranking intermediate table of FIG.
- The ranking determining means 12041 calculates points for each store (search target name information) based on the ranking intermediate table. That is, the ranking determining means 12041 sets the page rank multiple to “1” when the page rank is “high” and to “0.3” when the page rank is “low”. It likewise sets a telephone number multiple when the telephone number is “1”, and sets that multiple to “0.5” when the telephone number is “0”. The ranking determining means 12041 then multiplies the “number of characters” by the page rank multiple and the telephone number multiple to calculate the points, and rounds the points to an integer.
- the ranking determination means 12041 determines the point of "Kumamoto OO Ramen” as "3
- Similarly, the ranking determining means 12041 calculates the points of “Ramen ABC” and “Ramen XYZ”. As a result, it is assumed that the ranking determining means 12041 calculates the point “2522” for “Kumamoto OO Ramen”, the point “1529” for “Ramen ABC”, and the point “4211” for “Ramen XYZ”. There are various point calculation algorithms, and points may of course be calculated using other formulas. However, it is preferable to take the page rank and the presence or absence of a telephone number into account when calculating points, because this makes possible an evaluation that reflects the reliability of the website.
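- The point calculation can be illustrated with the short sketch below. It is one possible reading, not the definitive formula: the multiples “1” and “0.3” for page rank and “0.5” for a missing telephone number come from the description, while the multiple of 1.0 when a telephone number is present and the summing of points over all records of a store are assumptions made for this example. The record values other than the character counts “384” and “129” are hypothetical.

```python
PAGE_RANK_MULTIPLE = {"high": 1.0, "low": 0.3}
# Assumption: the multiple for telephone number "1" is taken to be 1.0.
PHONE_MULTIPLE = {1: 1.0, 0: 0.5}


def rank_stores(intermediate_table):
    """Return (store name, points) pairs sorted from highest to lowest points."""
    totals = {}
    for rec in intermediate_table:
        points = (
            rec["chars"]
            * PAGE_RANK_MULTIPLE[rec["page_rank"]]
            * PHONE_MULTIPLE[rec["phone"]]
        )
        # Assumption: a store's points are the sum over all of its records.
        totals[rec["name"]] = totals.get(rec["name"], 0.0) + points
    ranked = [(name, round(points)) for name, points in totals.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical ranking intermediate table.
TABLE = [
    {"id": 1, "name": "Kumamoto OO Ramen", "page_rank": "high", "phone": 1, "chars": 384},
    {"id": 2, "name": "Kumamoto OO Ramen", "page_rank": "low", "phone": 1, "chars": 129},
    {"id": 3, "name": "Ramen ABC", "page_rank": "low", "phone": 0, "chars": 200},
]

ranking = rank_stores(TABLE)
print(ranking)
```

Keeping the multiples in lookup tables makes it easy to try other weightings, which the description explicitly allows.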
- The summarizing means 12043 reads all target information (target information from ID “1” to ID “n” in FIG. 16) of the first search target “Kumamoto OO Ramen”.
- The word extraction means 120431 extracts words (nouns) such as “salt ramen” from all the target information.
- The word appearance count calculating means 120432 for each search target calculates the appearance count (first appearance count) of the word (noun) “salt ramen” in all target information of the search target “Kumamoto OO Ramen”, obtaining, for example, “10”.
- the number of first appearances of other words is also calculated by the same process.
- the summarizing means 12043 reads all target information (target information from ID “n + 1” to “n + m” in FIG. 16) of the second search target “ramen ABC”.
- The word extraction means 120431 extracts words (nouns) such as “special shrimp ramen” from all the target information.
- the word appearance count calculating means 120432 for each search target calculates the number of appearances (first appearance count) of the word (noun) “special shrimp ramen” in all target information of the search target “ramen ABC”, for example, “ 8 ”is calculated.
- the number of first appearances of other words is also calculated by the same process.
- The word total appearance count calculating means 120433 calculates, for words such as “salt ramen” and “special shrimp ramen”, the number of appearances (second appearance count) in the target information of all search targets.
- The ratio calculating means 120434 then divides the first appearance count by the second appearance count to calculate the ratio information.
- The word list creation means 120435 sorts the words for each search target using the “ratio” as a key. The summarizing means 12043 thereby obtains the word list management table.
- the word list management table holds one or more records having “search target name information”, “word”, “first appearance count”, “second appearance count”, and “ratio”.
- The summarizing means 12043 acquires, from the target information of FIG. 16, sentences that include words with a large ratio until the acquired text reaches a predetermined size (for example, 512 bytes). At that time, the summarizing means 12043 also acquires the original information URL corresponding to the target information. The summarizing means 12043 thereby obtains the summary information management table.
- The summary information management table holds records having “search target name information”, “summary information”, and “original information URL”. The “summary information” consists of sentences, extracted from the target information up to the predetermined size, in which words with a high ratio, which indicate the characteristics of the store, appear.
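- The ratio-based summarization can be sketched as follows. This is an illustrative sketch only: noun extraction is assumed to have been done already (the word lists below are hypothetical input), the ratio is taken as the first appearance count divided by the second appearance count, and the tie-breaking order and the stop-at-first-overflow rule are assumptions of this example.

```python
from collections import Counter


def build_word_lists(words_by_target):
    """Per search target: (word, first count, second count, ratio), highest ratio first."""
    first = {target: Counter(words) for target, words in words_by_target.items()}
    second = Counter()
    for counts in first.values():
        second.update(counts)  # total appearances over all search targets
    return {
        target: sorted(
            ((w, n, second[w], n / second[w]) for w, n in counts.items()),
            key=lambda row: (-row[3], -row[1], row[0]),
        )
        for target, counts in first.items()
    }


def summarize(sentences, word_list, max_bytes=512):
    """Collect sentences containing high-ratio words until max_bytes would be exceeded."""
    picked, size = [], 0
    for word, *_ in word_list:
        for sentence in sentences:
            if word in sentence and sentence not in picked:
                length = len(sentence.encode("utf-8"))
                if size + length > max_bytes:
                    return picked
                picked.append(sentence)
                size += length
    return picked


# Hypothetical pre-extracted nouns for two search targets.
WORDS = {
    "Kumamoto OO Ramen": ["salt ramen"] * 3 + ["noodles"] * 2,
    "Ramen ABC": ["special shrimp ramen"] * 2 + ["noodles"] * 2,
}
word_lists = build_word_lists(WORDS)

SENTENCES = ["The salt ramen is light.", "Parking is nearby.", "Thin noodles."]
summary = summarize(SENTENCES, word_lists["Kumamoto OO Ramen"], max_bytes=40)
print(summary)
```

A word such as "noodles" that appears for every store gets a low ratio, so sentences about a store's distinctive dishes are collected first.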
- the target information output means 12042 configures information to be output by the following processing using the summary information obtained by the above processing.
- An example of the information that the target information output means 12042 finally configures is shown in FIG.
- the target information output means 12042 constitutes information of the headline “Ramen XYZ” ranked first.
- the information in this heading may be anything as long as it indicates that the ranking is first.
- The heading information includes the number “1” indicating the first ranking, the search target name information (store name), and a number of stars obtained by rounding up (the number of stars is derived from the point “4211”).
- the target information output means 12042 constitutes link symbol information.
- The link symbol information is information such as “Evaluation 1” and “Evaluation 2” immediately below the heading information of “Ramen XYZ”.
- Information such as “Evaluation 1” is an anchor, and holds a URL indicating the location of the original information of the first target information.
- The target information output means 12042 constructs the character string “Evaluation 1” for the summary information “Gomaramen no Uta.” of Ramen XYZ in the summary information management table, and gives the URL “http://www.gourmet.co.jp” to “Evaluation 1” as anchor information.
- The character string “Evaluation 1” is generated by combining the fixed characters “Evaluation” with the order (number) of the summary information. This completes the configuration of the information in area (1) in FIG. 20.
- the target information output means 12042 adds the summary information acquired by the summarization means 12043 below the area (1) in FIG.
- the summary information is also an anchor, and a URL indicating the location of the original information corresponding to the target information to be summarized is given to the summary information.
- The target information output means 12042 thus obtains area (2) in FIG. 20, which is the summary section described above.
- FIG. 20 shows a display image, which is actually described in HTML.
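- How the heading, the link symbols, and the anchored summaries might be assembled into HTML can be sketched as follows. The markup, the function name, and the star count are hypothetical and are not taken from FIG. 20; only the “Evaluation” plus number convention and the use of the original information URL as the anchor target follow the description.

```python
from html import escape


def render_entry(rank, name, stars, summaries):
    """Build heading, link symbols, and anchored summaries for one search target.

    summaries is a list of (summary text, original information URL) pairs.
    """
    lines = [f"<h3>{rank}. {escape(name)} {'*' * stars}</h3>"]
    # Link symbols: the fixed string "Evaluation" followed by the summary's order.
    links = " ".join(
        f'<a href="{escape(url, quote=True)}">Evaluation {i}</a>'
        for i, (_, url) in enumerate(summaries, start=1)
    )
    lines.append(f"<p>{links}</p>")
    # Each summary is itself an anchor to its original information.
    for text, url in summaries:
        lines.append(f'<p><a href="{escape(url, quote=True)}">{escape(text)}</a></p>')
    return "\n".join(lines)


page = render_entry(
    1,
    "Ramen XYZ",
    4,  # hypothetical star count
    [("Gomaramen no Uta.", "http://www.gourmet.co.jp")],
)
print(page)
```

Escaping the store name, summary text, and URL keeps extracted text from breaking the generated markup.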
- the output unit 1204 transmits the configured information (information in FIG. 20) to the information terminal 11.
- the information terminal 11 receives the information of FIG. 20 (information described in HTML), interprets and executes the information, and displays the screen of FIG.
- The information terminal 11 searches the information processing device 12 for the page corresponding to the URL “http://www.gourmet.co.jp” associated with “Evaluation 1” of Ramen XYZ, and displays the page corresponding to “http://www.gourmet.co.jp”. Since this process follows a known technique, a detailed description thereof is omitted.
- Likewise, the information terminal 11 searches the information processing apparatus 12 for the page corresponding to the URL “http://www.gourmet.co.jp” associated with the summary information “Gomaramen no Uta.”, and displays the page corresponding to “http://www.gourmet.co.jp”.
- the detailed process is also a process according to a known technique, and detailed description thereof is omitted.
- Target information such as store information can be acquired appropriately. Specifically, it is possible to search for stores that match the user's behavior (such as a 10-minute walk) from the target location (for example, JR Yamanote Line Shinjuku). In addition, stores can be searched for with high accuracy by narrowing them down using so-called town page information. Furthermore, when information about stores and the like is searched for in original information such as HTML files, dividing the information according to the tag structure and searching the divided pieces reduces the probability of extracting unnecessary information. Moreover, the necessary information can be acquired at high speed.
- By determining the hierarchy level that has a predetermined relationship with the hierarchy level including location information, which is one or more of telephone number information, address information, and postal code, and acquiring the information of the determined hierarchy level, the probability of retrieving unnecessary information can be reduced. Specifically, this process prevents the acquisition of information that the user does not want to search for, such as advertisement information on a WEB page, and yields good search results.
- The summarizing means includes a word extracting means for extracting words from the target information, a word appearance count calculating means for each search target, which calculates, for each word extracted by the word extracting means, the number of appearances for each search target, and a summary acquisition means for acquiring a summary of each search target from the target information based on the number of appearances.
- the summarization means may be processing that only cuts out the beginning of the target information, for example, 100 characters. However, it is possible to obtain summary information that clearly represents the characteristics of the search target such as a store by the summarization method that takes into account the number of occurrences of the word.
- the output unit 1204 may simply output summary information that is not an anchor.
- link symbol output means 12046 outputs link symbol information ("evaluation 1", “evaluation 2”, etc.), but link symbol output means 12046 is not essential. In other words, it is not essential to output link symbol information (such as “Evaluation 1” and “Evaluation 2”).
- The ranking determining means 12041 for ranking two or more search targets is not essential. That is, store information may be output in the order of processing without ranking.
- The ranking determining means 12041 ranks two or more search targets based on the number of characters of the target information of the two or more search targets, and/or whether the target information includes telephone number information, and/or the page ranking of the original information in which the target information is described. However, ranking may be performed by other processing. For example, the ranking determining means 12041 may rank two or more search targets based only on the number of homepages on which the two or more search targets are described.
- the search information includes search point information that is information related to the search point and target group information that is information for specifying a search target group.
- the target group information is a keyword “ramen” or the like.
- the search information may be only the search point information, or may include other information (such as store size and business hours information).
- the search target is mainly a dining place (restaurant) such as a ramen shop, but it may be anything such as an English conversation school or a bookstore, a store selling or renting services or goods.
- The original information from which the target information is acquired is information with hierarchical tags, and when acquiring multiple pieces of target information from one piece of original information, the target information acquisition unit acquires information of the same hierarchy level.
- the target information acquisition unit may search the target information by a simple search process or the like without using the tag hierarchy information. Needless to say, when the target information is searched using the hierarchical information of the tag, the search process is accelerated.
- When acquiring a plurality of pieces of target information from one piece of original information, the target information acquisition unit searches that original information, determines the hierarchy level that is the same as the hierarchy level including location information, which is one or more of telephone number information, address information, and zip code, and acquires the information of the determined hierarchy level.
- The target information acquisition unit may also search one piece of original information, determine a hierarchy level having a predetermined relationship with the hierarchy level including location information, which is one or more of telephone number information, address information, and postal code, and acquire the information of the determined hierarchy level.
- For example, the homepage having the tag structure of FIG. 21 (a) includes information desired by the user, such as a restaurant, in each row of the table as shown in FIG. 21 (b). In addition, for example, a telephone number is included as an attribute value in each row of the table, and an address is included as another attribute value.
- Similarly, the homepage having the tag structure of FIG. 22 (a) includes information desired by the user, such as a restaurant, in the table record as shown in FIG. 22 (b). That is, the above-described “hierarchy level having a predetermined relationship with the hierarchy level including the location information” may be the same hierarchy level as the “hierarchy level including the location information”, or it may be a hierarchy level above the “hierarchy level including the location information”.
- Although the original information has been described as homepages on the WEB, it may of course be other information.
- the processing in the present embodiment may be realized by software. And this software may be distributed by software download etc. Also, this software may be recorded on a recording medium such as a CD-ROM and distributed.
- the software that realizes the information processing apparatus according to the present embodiment is the following program.
- The program is a program for executing a reception step of receiving search information, which is information for specifying two or more search targets; a target information acquisition step of acquiring, from two or more pieces of original information, two or more pieces of target information, which is information regarding the two or more search targets specified by the search information; and an output step of outputting the target information of the two or more search targets.
- The output step may include a summarization step of summarizing the two or more pieces of target information to obtain two or more pieces of summary information, and a summary information output step of outputting the two or more pieces of summary information.
- The summarization step may include a word extraction step of extracting words from the target information; a word appearance count calculation step for each search target, which calculates, for each word extracted in the word extraction step, the number of appearances for each search target; and a summary acquisition step of acquiring summary information of each search target from the target information based on the number of appearances.
- The summarizing step may further include a word total appearance count calculating step of calculating, for each word extracted in the word extracting step, the number of appearances in all target information of all search targets; a ratio calculation step of calculating the ratio between the number of appearances for each search target calculated in the word appearance count calculation step for each search target and the number of appearances in all the target information calculated in the word total appearance count calculating step; and a word list creation step of creating, for each search target, a word list arranged in descending order of the ratio calculated in the ratio calculation step. In that case, the summary acquisition step may acquire the summary information based on the word list for each search target.
- In the target information acquisition step, link information, which is information indicating the location of the original information in which the target information is described, may also be acquired, and in the reception step, a summary information selection instruction, which is an instruction for the summary information, may also be received. When the summary information selection instruction is received in the reception step, an original information acquisition step may further be executed that acquires the original information based on the link information of the original information in which the target information underlying the summary information corresponding to the summary information selection instruction is described, and the output step may include an original information output step of outputting the original information acquired in the original information acquisition step.
- Link information, which is information indicating the location of the original information in which the target information is described, may also be acquired; the output step may further include a link symbol output step of outputting link symbol information, which is information corresponding to the link information; and the receiving step may also receive a link symbol selection instruction, which is an instruction for the link symbol information.
- The output step may further include a ranking determination step of ranking the two or more search targets based on the target information of the two or more search targets, and the target information and/or summary information of the two or more search targets may be output according to the ranking determined in the ranking determination step.
- In the ranking determination step, the two or more search targets may be ranked based on the number of characters of the target information of the two or more search targets, and/or whether the target information includes telephone number information, and/or the page ranking of the original information in which the target information is described.
- the search information preferably includes search point information that is information about a search point and target group information that is information for specifying a search target group.
- A search target information group may be stored that has at least one piece of search target information, which includes search target name information, which is information indicating the name of the search target, telephone number information, which is information indicating the telephone number of the search target, and address information, which is information indicating the address of the search target. In that case, the target information acquisition step may include a search target information acquisition step of acquiring part or all of the search target information based on the search information, and a step of acquiring, based on part or all of the search target information acquired in the search target information acquisition step, target information that is information on the two or more search targets specified by the search information.
- The search information may further include search range information, which is information for specifying a search range from the search point indicated by the search point information. In that case, in the search target information acquisition step, one or more pieces of search target information may be selected based on the search point information included in the search information, the address information of the search target, and the search range information, and part or all of the selected search target information may be acquired.
- In the search target information acquisition step, the longitude and latitude of the search point information included in the search information and the longitude and latitude of the address information of the search target may be acquired; the distance between the search point indicated by the search point information and the search target point indicated by the address information may be calculated from the two sets of longitude and latitude; one or more pieces of search target information may be selected by determining, based on the distance, whether the condition indicated by the search range information is met; and part or all of the selected search target information may be acquired.
- The original information from which the target information is acquired is information with layered tags, and in the target information acquisition step, it is preferable to acquire information of the same hierarchy level when acquiring a plurality of pieces of target information from one piece of original information.
- In the target information acquisition step, when a plurality of pieces of target information are acquired from one piece of original information, that original information may be searched, a hierarchy level including location information, which is one or more of telephone number information, address information, and postal code, may be determined, and the information of the determined hierarchy level may be acquired.
- a hierarchy level including two or more location information may be determined, and the information of the determined hierarchy level may be acquired.
- In the above program, the transmission step of transmitting information and the reception step of receiving information do not include processing performed by hardware, for example, processing performed by a modem or an interface card in the transmission step (processing that can only be performed by hardware).
- Each process may be realized by centralized processing by a single device (system), or may be realized by distributed processing by a plurality of devices.
- the processing method described above may be realized by one or more devices.
- the information processing apparatus 12 may not hold the search target information group storage unit 12021, and another apparatus may hold the search target information group storage unit 12021. In such a case, the information processing device 12 searches for the other device and acquires the search target information.
- the information terminal 11 and the information processing device 12 may be realized by a single device.
- the information processing apparatus 12 holds the user input receiving unit 1101, and the output unit 1204 performs processing such as displaying information on a display or outputting sound through a speaker.
- two or more communication means may be physically realized by a single medium.
- the present invention can be variously modified without being limited to the above-described embodiments, and it goes without saying that these are also included in the scope of the present invention.
- the information processing apparatus has the effect that objective information on, for example, a store can be output appropriately, and is useful as, for example, a server apparatus having a search engine on the WEB.
- FIG. 3 is a flowchart for explaining the operation of the information terminal.
- FIG. 4 is a flowchart for explaining the operation of the information processing apparatus.
- FIG. 5 is a flowchart for explaining the target information acquisition process.
- FIG. 6 is a flowchart for explaining the transmission information configuration process.
- FIG. 7 is a flowchart for explaining the ranking process.
- FIG. 8 is a flowchart for explaining the summary information acquisition process.
- FIG. 10 is a diagram showing a group of information to be searched.
- FIG. 14 is a diagram showing the search target information intermediate table.
- FIG. 20 is a diagram showing an example of information to be output.
- FIG. 21 is a diagram showing the same HTML structure tree.
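
The distance-based selection described for the search target information acquisition step above can be sketched roughly as follows. The source only says that a distance is calculated from two sets of latitude and longitude and compared with the search range; it does not name a formula. The haversine formula, the Earth radius constant, and the names `SearchTarget`, `haversine_km`, and `select_targets` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass

# Mean Earth radius in kilometres (an assumption; the source gives no constant).
EARTH_RADIUS_KM = 6371.0


@dataclass
class SearchTarget:
    """Hypothetical record for one piece of search target information."""
    name: str
    lat: float  # latitude of the target's address, in degrees
    lon: float  # longitude of the target's address, in degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def select_targets(search_lat: float, search_lon: float,
                   range_km: float, targets: list[SearchTarget]) -> list[SearchTarget]:
    """Keep only the targets whose distance from the search point is within range_km."""
    return [
        t for t in targets
        if haversine_km(search_lat, search_lon, t.lat, t.lon) <= range_km
    ]
```

Here `range_km` stands in for the search range information, and the returned list corresponds to the selected search target information. A real implementation would also need to convert address information into latitude and longitude, which this sketch leaves out.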
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2006524541A JP4035623B2 (ja) | 2004-07-16 | 2005-06-28 | 情報処理装置およびプログラム |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004-209780 | 2004-07-16 | ||
JP2004209780 | 2004-07-16 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006008919A1 (ja) | 2006-01-26 |
Family
ID=35785042
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2005/011786 WO2006008919A1 (ja) | 2004-07-16 | 2005-06-28 | 情報処理装置およびプログラム |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP4035623B2 (ja) |
WO (1) | WO2006008919A1 (ja) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008040869A (ja) * | 2006-08-08 | 2008-02-21 | Pioneer Electronic Corp | 地点情報評価装置、地点情報評価プログラム |
WO2008142791A1 (ja) * | 2007-05-24 | 2008-11-27 | Fujitsu Limited | 差分算出プログラム、差分算出装置および差分算出方法 |
CN104715000A (zh) * | 2013-12-17 | 2015-06-17 | 国际商业机器公司 | 用于支持评价分析的装置和方法 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000207458A (ja) * | 1999-01-08 | 2000-07-28 | Recruit Co Ltd | 商品情報サ―ビスシステム |
JP2001357035A (ja) * | 2000-06-13 | 2001-12-26 | Open Door:Kk | コンテンツ評価・検索システム |
JP2003157254A (ja) * | 2001-11-20 | 2003-05-30 | Just Syst Corp | 情報処理装置、情報処理方法、及び情報処理プログラム |
JP2003167990A (ja) * | 2001-11-30 | 2003-06-13 | Fujitsu Ltd | 商品情報収集システム及び方法 |
JP2003271670A (ja) * | 2002-03-19 | 2003-09-26 | Mitsubishi Electric Corp | 情報収集装置、情報収集方法及びプログラム |
JP2004185572A (ja) * | 2002-12-06 | 2004-07-02 | Nippon Telegr & Teleph Corp <Ntt> | 口コミ情報解析方法及び装置 |
-
2005
- 2005-06-28 JP JP2006524541A patent/JP4035623B2/ja active Active
- 2005-06-28 WO PCT/JP2005/011786 patent/WO2006008919A1/ja active Application Filing
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008040869A (ja) * | 2006-08-08 | 2008-02-21 | Pioneer Electronic Corp | 地点情報評価装置、地点情報評価プログラム |
WO2008142791A1 (ja) * | 2007-05-24 | 2008-11-27 | Fujitsu Limited | 差分算出プログラム、差分算出装置および差分算出方法 |
JPWO2008142791A1 (ja) * | 2007-05-24 | 2010-08-05 | 富士通株式会社 | 差分算出プログラム、差分算出装置および差分算出方法 |
JP4957796B2 (ja) * | 2007-05-24 | 2012-06-20 | 富士通株式会社 | 差分算出プログラム、差分算出装置および差分算出方法 |
CN104715000A (zh) * | 2013-12-17 | 2015-06-17 | 国际商业机器公司 | 用于支持评价分析的装置和方法 |
US10185915B2 (en) | 2013-12-17 | 2019-01-22 | International Business Machines Corporation | Analysis of evaluations from internet media |
Also Published As
Publication number | Publication date |
---|---|
JP4035623B2 (ja) | 2008-01-23 |
JPWO2006008919A1 (ja) | 2008-05-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP4909334B2 (ja) | サービス提案装置及びその方法、サービス提案システム、ユーザのお気に入りベースに基づくサービス提案装置及びその方法 | |
US8001135B2 (en) | Search support apparatus, computer program product, and search support system | |
US20050004903A1 (en) | Regional information retrieving method and regional information retrieval apparatus | |
JP5769327B2 (ja) | データベース構築装置、商標侵害検知装置、データベース構築方法、およびプログラム | |
JP5221664B2 (ja) | 情報マップ管理システムおよび情報マップ管理方法 | |
KR101122737B1 (ko) | 지식노드 연결구조를 생성하기 위한 검색 데이터베이스 구축 장치 및 방법 | |
US20200043074A1 (en) | Apparatus and method of recommending items based on areas | |
JP2007233862A (ja) | サービス検索システム及びサービス検索方法 | |
KR101671374B1 (ko) | 키워드 추천 장치와 방법 및 키워드 지식베이스 구축 방법 | |
US20130304370A1 (en) | Method and apparatus to provide location information | |
JP2010181966A (ja) | レコメンド情報評価装置およびレコメンド情報評価方法 | |
JP4035623B2 (ja) | 情報処理装置およびプログラム | |
JP5185891B2 (ja) | コンテンツ提供装置、コンテンツ提供方法およびコンテンツ提供プログラム | |
JP6639040B2 (ja) | 情報検索装置及びプログラム | |
JP3984263B2 (ja) | 地図情報システム連動サーチエンジンサーバーシステム。 | |
JP4505389B2 (ja) | 広告コンテンツ送信システム、広告コンテンツ送信方法 | |
JP5144185B2 (ja) | 情報検索システム及び情報検索方法 | |
JP4708288B2 (ja) | サービス連携サーバ、方法、システム、プログラム、及び、記録媒体 | |
JP2007048328A (ja) | 情報処理装置、情報処理方法およびプログラム | |
JP2001236368A (ja) | 情報通信端末、サーバ装置およびそれらを接続した情報通信システム | |
JP5084859B2 (ja) | 情報処理装置、データ抽出方法、及びプログラム | |
JP2009122738A (ja) | 情報処理装置、情報処理方法、およびプログラム | |
KR20200125412A (ko) | 여행 속성 언어 개인화 방법 및 장치 | |
JP2005099964A (ja) | 検索分類システム、検索分類サーバ、プログラムおよび記録媒体 | |
JP2007102635A (ja) | Blogコミュニティ推薦方法及びシステム及びプログラム |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase | Ref document number: 2006524541 Country of ref document: JP |
AK | Designated states | Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase |