WO2011013490A1

WO2011013490A1 - Information processing device, information processing method, program and web system

Info

Publication number: WO2011013490A1
Application number: PCT/JP2010/061535
Authority: WO
Inventors: Yuriko Sugizaki; Yoshinori Tahara; Ryoji Kurosawa; Shunsuke Ishikawa
Original assignee: International Business Machines Corporation
Priority date: 2009-07-28
Filing date: 2010-07-07
Publication date: 2011-02-03
Also published as: US20120284299A1; JP5705114B2; US8725762B2; JPWO2011013490A1

Abstract

In order to prevent information leakage that may be caused by acquiring information through a network, an information processing device (310) includes: a request acquisition unit (314) that acquires an original request including a search value specifying the information to be acquired; a specificity evaluation unit (324) that determines whether the information to be acquired by the request about to be issued is statistically specific relative to requests issued in the past; a diffusing request generation unit (316) that generates dummy requests, containing dummy values that make the access log resistant to data mining, until the information to be acquired is determined not to be statistically specific; a search request issuing unit (318) that issues a diffusing request as a search request to a database through networks (140, 220); and a search result extraction unit (320) that extracts the information acquired by the diffusing request from the responses to the search requests.

Description

Information processing apparatus, information processing method, program, and web system

The present invention relates to network technology, and more particularly, to technology for preventing information leakage that may occur from information acquisition via a network.

In recent years, advances in network infrastructure and computer technology have allowed personal computers, workstations, and server computers to be interconnected via networks and to share information. To share information, a client computer such as a personal computer (hereinafter simply referred to as a client) issues a request for information to a web server that stores information and responds to requests. When the web server sends the information matching the request, the client acquires the requested information.

A request sent from the client to the server includes values, such as numeric data sets and keywords, that specify the information, and the server refers to these values to issue a query to a database or the like and extract information from it. In other words, with conventional information retrieval, the information that the client is currently interested in is disclosed to the server.

If the web server can be trusted, conventional information retrieval poses relatively few problems. However, even when the web server is reliable, a search entity such as an individual or a company must tell the server what information it is currently interested in, in order to obtain that information.

In recent years, with the advancement of browsing technology, so-called mashup systems, exemplified by Web 2.0, have become popular; they improve access to information by consolidating information managed by multiple web servers into a single information processing device. A mashup system includes a client, a mashup server, and a plurality of information servers.

The mashup server may be installed in a company or the like, or it may be a web server that an ISP (Internet Service Provider) installs on the Internet exclusively to perform mashup processing. Each information server is provided by an ISP or the like; it searches the database it manages for information corresponding to a client request and sends the retrieved information to the mashup server. The mashup server arranges the acquired information appropriately and displays the information acquired from the plurality of information servers on the client via a browser program or the like.

In the mashup system, multiple information servers receive the requests that clients issue for the information they want. A request is constructed, for example, as an SQL (Structured Query Language) statement containing a search word and a conditional expression for finding the information to be acquired. Each information server extracts the search conditions from the received request and searches the database it manages to obtain the corresponding information. The problem is that none of the information servers connected via the network is necessarily trustworthy. Even if a trustworthy information server is used, the content of the requests it receives can be analyzed by data mining when the server's logs are analyzed, so that the client's search purpose becomes implicitly known.

As network technology spreads and the value of the information accumulated on networks increases, the gradual information leakage that occurs in information search protocols has become a problem. In this specification, the term “gradual information leakage” means that the network accesses of a group are statistically analyzed by an information server and, as a result, the intention of the group, that is, its search intention, leaks out little by little.

For example, Japanese Patent Application Laid-Open No. 11-259512 (Patent Document 1) discloses a data search system that protects the search conditions and the location of the searcher as confidential information. In Patent Document 1, the data search device conceals or masks part or all of the search conditions as confidential information before the search is performed, and then narrows down the search results itself. More specifically, the search conditions are hidden or blurred by deleting, from the input search conditions, those that should be masked in advance; by replacing input search conditions with synonyms or broader concepts; by adding extra search conditions; or by dividing the search conditions.

In the information retrieval protocol described in Patent Document 1, processing such as deleting search terms, generalizing them to broader concepts, and adding search terms is applied on the data search device side to the input search conditions. As a result, an additional program for processing the search conditions, a synonym dictionary, and the like are required, and the search conditions must be preprocessed before the search is executed. Further, the data search device of Patent Document 1 requires a data editor that temporarily stores the information hit under the broader, masked search conditions and then searches it again for the information matching the original search conditions. The data search device therefore has to acquire and process more information than it originally needs, and must in effect contain a secondary database of its own. Because it wastes hardware and program resources, the data search device of Patent Document 1 cannot search efficiently when, as today, the amount of information stored on the network is enormous.

Japanese Patent Laid-Open No. 2002-312377 (Patent Document 2) discloses a search device that prevents leakage of information such as user privacy by changing an input first search condition into a second search condition covering a wider range, executing the information search on a search server to obtain a first search result, searching the first search result again according to the first search condition, and thereby generating a search result corresponding to the first search condition.

The search device described in Patent Document 2 likewise expands the search condition, obtains the expanded search result, and searches that result again to generate the result that should originally have been acquired. The search device itself must therefore function as a secondary database: it must secure storage for the results of the expanded search condition and must itself have a certain degree of search capability. In terms of hardware and software resources, it cannot be said to prevent information leakage efficiently.

Patent Document 1: JP 11-259512 A
Patent Document 2: JP 2002-312377 A

As described above, to prevent information leakage caused by searches, the prior art generates a modified search expression that includes the original search condition, issues a search request to the database, acquires the expanded search result, and then has the search device search that result again with the original search condition to regenerate the original search result.

However, now that the amount of information accessible via networks has become enormous, having a client or a gateway server that executes searches acquire expanded search results incurs overheads that cannot be ignored, such as wasted resources, re-search processing, and data editing processing. It also requires implementing what amounts to a small database system in software, so the information search system is implemented redundantly and its hardware and software resources must be modified.

In addition, in the conventional methods for preventing information leakage, the original search condition is generalized to a broader concept or extended to cover a wider range, but the original search condition must ultimately be contained in the extended search condition. Consequently, the extended search conditions generated in Patent Document 1 and Patent Document 2 do not prevent leakage of the search intention of the client, and therefore do not prevent gradual information leakage.

An object of the present invention is to provide an information processing apparatus, an information processing method, a program, and a web system that prevent the gradual information leakage that may occur when information is acquired via a network.

The present invention has been made by focusing on the fact that gradual information leakage occurs in conventional information retrieval. In the present invention, the past request log is statistically analyzed when information is retrieved. If a request is specific with respect to the past request history, a plurality of dummy requests containing randomly selected dummy values is generated, and a spread request including these dummy requests is generated.

When the search target entered by the search subject is specific with respect to the past request log, spread requests accumulate data in the access log of the web server executing the search so that its characteristics cannot be analyzed by data mining or similar techniques. This improves the data mining tolerance of the requests issued by the search subject.

The present invention can be applied not only to search targets that are specified by a numerical data set, such as map data, and have a continuous attribute that can be obtained by calculation from homogeneous information, but also to targets that have discrete attributes, such as a company name, stock price, product name, gender, age, or an arbitrary character string.

According to the present invention, there is provided an information processing apparatus that acquires information via a network, the information processing apparatus including:
A request acquisition unit that acquires an original request including a search value that specifies the information to be acquired from the database;
A specificity evaluation unit that determines, with reference to a request log that registers the history of search values, whether the information to be acquired by the request currently to be issued is specific with respect to the requests previously issued by the information processing apparatus;
A spread request generation unit that, when the specificity evaluation unit determines that the search value is specific, generates a spread request including dummy requests generated from dummy values, each giving a search value that requests information different from the information to be acquired, so as to dilute the specificity of that search value in the access log of the database for requests issued by the information processing apparatus;
A search request issuing unit that issues the spread request as a search request to the database via the network; and
A search result extraction unit that extracts the information acquired by the spread request from a response to the search request.

The dummy values of the present invention are stored in a dummy generation information storage unit, and they impart data mining tolerance by reducing the specificity of the search value in the request log. When the information to be acquired has a continuous attribute, the search request issuing unit of the present invention can issue, as the search request, a spread request that includes only the dummy requests. When the information to be acquired has discrete attributes, the search request issuing unit can issue, as the search request, a spread request that includes both the original request and the dummy requests.

The specificity evaluation unit of the present invention can search the request log for the search value included in the original request and, when it predicts from the rate of increase in the number of occurrences of that search value in the current determination period that the number of occurrences will exceed the average number of occurrences of search values by a threshold or more, cause the spread request generation unit to start generating the dummy requests.
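The rate-based prediction described above can be sketched as follows. This is a minimal illustration, not the patented procedure: it assumes (hypothetically) that the request log for the current time chunk is a flat list of search values, that `elapsed_fraction` is the share of the chunk already elapsed, and that the count at the end of the chunk is forecast by linear extrapolation of the current rate.

```python
from collections import Counter


def predicts_specific(chunk_log, value, elapsed_fraction, threshold):
    """Return True if `value` is forecast to become specific in this time chunk.

    Hypothetical inputs:
      chunk_log        -- search values logged so far in the current time chunk
      value            -- search value of the request about to be issued
      elapsed_fraction -- portion of the time chunk already elapsed, in (0, 1]
      threshold        -- allowed excess over the average occurrence count
    """
    if not 0 < elapsed_fraction <= 1:
        raise ValueError("elapsed_fraction must be in (0, 1]")
    counts = Counter(chunk_log)
    counts[value] += 1  # count the request that is about to be issued
    # Linear extrapolation: occurrences so far / elapsed share of the chunk.
    predicted = counts[value] / elapsed_fraction
    # Average forecast occurrence count over all distinct search values seen.
    average = sum(counts.values()) / len(counts) / elapsed_fraction
    return predicted - average >= threshold


# Example: "A" has already appeared twice in the first half of the chunk.
print(predicts_specific(["A", "B", "C", "A"], "A", 0.5, 2))  # True
print(predicts_specific(["A", "B", "C", "A"], "B", 0.5, 2))  # False
```

A `True` result marks the point at which, in this sketch, the spread request generation unit would begin producing dummy requests.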

The original request of the present invention can include a plurality of search values for acquiring different pieces of information. In that case, the specificity evaluation unit determines the specificity of each search value, the dummy requests are generated for each search value, and the spread request is issued to the database to be searched for that information. The information processing apparatus of the present invention can receive the responses from the databases, generate display areas for displaying the response corresponding to each search value included in the original request, and display the responses.

The information processing apparatus of the present invention can be a mashup server implemented in the Web 2.0 paradigm.

According to the present invention, there is further provided an information processing method and program executed by the information processing apparatus, and a web system including the information processing apparatus.

FIG. 1 shows an embodiment of the web system 100 of the present invention.
FIG. 2 shows a web system 200 according to a second embodiment.
FIG. 3 shows the functional blocks of an information processing system 300 that generates the spread requests of this embodiment.
FIG. 4 is a flowchart of the information processing method of this embodiment.
FIG. 5 shows a request log 500 for a specific search target included in an original request.
FIG. 6 shows an embodiment of the process of determining the characteristics of a search target in the time chunk currently storing the request log 500, from the rate of increase of original requests including that search target in a specific time chunk.
FIG. 7 shows an embodiment of the access log 700 recorded by the web server 150 after spread requests are issued according to this embodiment for the request log 500 shown in FIGS. 5 and 6.
FIG. 8 is a detailed flowchart of the processing from acquisition of an original request to issuance of requests when information associated with specific numerical data is searched for in this embodiment.
FIG. 9 shows pseudo code for steps S803 to S805 of the processing described with reference to FIG. 8.
FIG. 10 is a flowchart of a second embodiment of the information search method of this embodiment.
FIG. 11 shows an embodiment of pseudo code for executing the processing shown in FIG. 10.
FIG. 12 shows how the content of the spread requests generated in this embodiment is diffused when map data is searched for.
FIG. 13 shows an embodiment of a search screen 1300 displayed by the web system of this embodiment.
FIG. 14 shows an embodiment in which the search target has a discrete attribute.
FIG. 15 shows an embodiment of the access log 1500 generated as a result of issuing spread requests by the information processing method of this embodiment.

Hereinafter, the present invention will be described with reference to embodiments, but the present invention is not limited to the embodiments described below. FIG. 1 illustrates an embodiment of a web system 100 of the present invention. The web system 100 includes clients 110 to 114, a mashup server 130, and web servers 150 to 154. The clients 110 to 114 and the mashup server 130 are interconnected via a network 120 such as a LAN, a WAN, or the Internet. The mashup server 130 is interconnected with the web servers 150 to 154 via a network 140. The network 140 is not particularly limited, but a wide area network such as the Internet can be used.

The mashup server 130 and the web servers 150 to 154 can employ substantially the same hardware configuration, with a CISC-architecture microprocessor such as a PENTIUM (registered trademark) or a PENTIUM (registered trademark)-compatible chip, or a RISC-architecture microprocessor such as a POWERPC (registered trademark), implemented in single-core or multi-core form. Each server is controlled by an operating system such as WINDOWS (registered trademark) 200X, UNIX (registered trademark), or LINUX (registered trademark), and processes the search requests sent from the clients 110 to 114 by executing server programs such as CGI, Servlet, APACHE, and IIS (Internet Information Server), implemented in programming languages such as C, C++, JAVA (registered trademark), JAVABEANS (registered trademark), PERL, and RUBY.

In a specific implementation, the mashup server 130 can be implemented as part of the functions of a corporate gateway server or the like. In another embodiment, the mashup server 130 may be installed at an ISP (Internet Service Provider) that provides services based on a paradigm such as Web 2.0. The web servers 150 to 154 manage the databases 160 to 164, respectively, and can provide information in response to requests via the network 140. In the embodiment described below, the web server 150 is implemented as a company information service server, the web server 152 as a stock price information service server, and the web server 154 as a map information service server; each processes the individual requests from the mashup server 130 and sends the results back to the mashup server 130.

Each of the clients 110 to 114 acquires information using a plurality of application services. For example, the client 110 acquires information corresponding to the original request issued by the client 110 via the mashup server 130. The mashup server 130 stores information from the plurality of web servers 150 to 154 in association with the client 110, and presents the information to the client 110 as composite information.

For example, when the client 110 wants to acquire specific company information, stock price information, and map information at the same time, the mashup server 130 generates, based on the original request sent from the client 110, spread requests to be sent to the web servers 150 to 154 that provide the respective application services, and sends them to the web servers 150 to 154. From the information acquired in response to the spread requests, the mashup server 130 obtains the results corresponding to the original request, combines them as composite information, for example in a web page, and sends them to the client 110.

The term “spread request” (also called a “diffusion request”) as used in this embodiment means a request that is generated for each search target attribute, according to the types of search targets included in the original request issued by the client, and sent to the web servers 150 to 154. A spread request is generated as a single request or as a set of requests containing dummy values chosen so that it is difficult for the web server to analyze the characteristics of the original request by statistical data mining of its access log.

The clients 110 to 114 can be implemented using personal computers, workstations, or the like, and their microprocessors (MPUs) may be any known single-core or multi-core processors. The clients 110 to 114 may be controlled by any known operating system such as WINDOWS (registered trademark), UNIX (registered trademark), LINUX (registered trademark), or MAC OS, and can run browser software such as Internet Explorer (registered trademark), Mozilla (registered trademark), Opera (registered trademark), or Firefox (registered trademark) to access the mashup server 130 and the web servers 150 to 154.

Between the clients 110 to 114 and the mashup server 130, and between the mashup server 130 and the web servers 150 to 154, data is transmitted and received using a transfer protocol such as HTTP or HTTPS over a transport protocol such as TCP/IP. In addition, the mashup server 130 implements JDBC (Java (registered trademark) Database Connectivity), ODBC (Open Database Connectivity), or the like to access the databases of the web servers 150 to 154, and can connect to the web servers 150 to 154 using an application-level protocol defined by JDBC.

In the embodiment shown in FIG. 1, a request issued by the client 110 is first intercepted by the mashup server 130, which performs statistical processing with reference to the past request log. When the statistical processing shows that the search value specifying the information to be acquired reflects a specific search intention in the light of the request history, the mashup server 130 generates a spread request and issues it to the web servers 150 to 154 that manage the information to be searched. Each of the web servers 150 to 154 receives the spread request, searches the database 160 to 164 it manages, extracts the information corresponding to the request, and returns it to the mashup server 130 as a response. From the responses received from the web servers 150 to 154, the mashup server 130 forms a web page having display areas for displaying the responses simultaneously on the desktop screen and assigns each response to its display area, so that the client 110 that issued the request can browse the results.
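The final step, in which the mashup server 130 assigns each response to its own display area, could look roughly like this. The sketch assumes, hypothetically, that the already-filtered responses arrive as a mapping from a service name to a list of text results; the HTML layout and the `area` class name are illustrative only.

```python
from html import escape


def compose_page(responses):
    """Build one HTML page with a separate display area per service.

    responses -- hypothetical mapping: service name -> list of result strings,
                 already filtered down to what the original request asked for.
    """
    areas = []
    for service, items in responses.items():
        # Escape every value so response content cannot inject markup.
        rows = "".join(f"<li>{escape(item)}</li>" for item in items)
        areas.append(
            f'<div class="area" id="{escape(service)}">'
            f"<h2>{escape(service)}</h2><ul>{rows}</ul></div>"
        )
    return "<html><body>" + "".join(areas) + "</body></html>"


page = compose_page({"stock": ["XYZ 123.4"], "map": ["35.68, 139.76"]})
```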

FIG. 2 shows a web system 200 according to the second embodiment. In the web system 200, each of the clients 210 to 214 implements a mashup application as an extension of a web browser, for example a plug-in or add-in program, so the web system 200 does not use a dedicated server such as the mashup server 130. In the embodiment shown in FIG. 2, the functions of the mashup server 130 of FIG. 1 are implemented in the clients 210 to 214: each client generates, from the original request, the spread requests for the web servers 230 to 234, issues them to the web servers 230 to 234, filters the search results corresponding to each spread request, and displays them as composite information in the browser program.

On the other hand, the web servers 230 to 234 have the same configuration as that of the embodiment shown in FIG. 1, and return the searched information to the client 210 or the like in response to the spread request from the client 210 or the like.

In this embodiment, when a spread request is generated as a single request, the search values that specify the search targets of the dummy requests are combined with the OR operator. When a spread request is generated as a request set, a set of requests is created that contains multiple dummy requests whose dummy values are chosen so that the access log becomes statistically uniform, for example so that the occurrence frequency of search targets resembles white noise. In either case, the dummy requests are selected with reference to the request log so as to average out the request content over the time scale of each search target, so that the client-side search intent cannot be extracted by data mining on the web server. Depending on the attribute of the information to be searched, the spread request may or may not include the original request.
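Both ways of packaging a spread request can be sketched as below. This is an illustration under assumptions: the table and column names are hypothetical and must come from trusted configuration (only the values are bound as SQL parameters), and the request-set variant uses one simple heuristic, picking the least-logged comparable values, as a possible way to push the access log toward a uniform distribution.

```python
import random


def single_spread_query(table, column, values):
    """One request whose WHERE clause ORs together every search value.

    `table` and `column` are trusted identifiers; the values are passed as
    parameters so they are never spliced into the SQL text.
    """
    where = " OR ".join(f"{column} = ?" for _ in values)
    return f"SELECT * FROM {table} WHERE {where}", list(values)


def spread_request_set(original, candidates, log_counts, size,
                       include_original=True, rng=None):
    """A set of separate requests that favours rarely logged values.

    Choosing the values with the lowest counts in the request log flattens
    the occurrence frequencies, so no single value stands out.
    """
    rng = rng or random.Random()
    pool = [c for c in candidates if c != original]
    # Sort by logged frequency; a random tie-breaker avoids a fixed order.
    pool.sort(key=lambda c: (log_counts.get(c, 0), rng.random()))
    requests = pool[:size] + ([original] if include_original else [])
    rng.shuffle(requests)  # do not reveal the original by its position
    return requests


sql, params = single_spread_query("quotes", "symbol", ["A", "C", "D"])
# sql == "SELECT * FROM quotes WHERE symbol = ? OR symbol = ? OR symbol = ?"
```

The `?` placeholder style matches, for example, Python's `sqlite3` module; other drivers may use a different parameter style.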

FIG. 3 shows the functional blocks of the information processing system 300 that generates the spread requests of this embodiment. The information processing system 300 corresponds to the mashup server 130 in the embodiment of FIG. 1 and to the clients 210 to 214 in the embodiment of FIG. 2. Although the functional blocks are implemented as a server application in one embodiment and as a client application in the other, in both cases each functional block of the information processing system 300 is realized by the microprocessor reading a program that causes the apparatus to function as the corresponding means into RAM, its execution space, and executing it.

As shown in FIG. 3, the information processing system 300 includes an information processing apparatus 310 and an input/output device 330 including a display device, a keyboard, a mouse, and the like. The information processing apparatus 310 sends spread requests to the networks 140 and 220 via the network adapter 312 and obtains the responses from the web servers corresponding to the spread requests. The information processing apparatus 310 further includes a request acquisition unit 314, a diffusion request generation unit 316, and a dummy generation information storage unit 322, as well as a request log 328 that stores, in time series, the requests sent from the information processing apparatus 310 to the web servers.

In the embodiment in which the information processing apparatus 310 is implemented as the mashup server 130, the request acquisition unit 314 acquires original requests from the clients 110 to 114 via the network 120. When the information processing apparatus 310 accesses the web servers 230 to 234 without going through the mashup server 130, the request acquisition unit 314 acquires an original request including the search conditions entered by the operator via the input/output device 330. The diffusion request generation unit 316 refers to the request log storage unit 328 and determines the specificity of the original request acquired by the request acquisition unit 314 with respect to the past request log.

The diffusion request generation unit 316 acquires, according to the determination result, the dummy values used to generate the diffusion request from the dummy generation information storage unit 322, and keeps generating dummy values and including them in the diffusion request until the specificity evaluation unit 324 determines that the search target is no longer specific. The specificity of the original request can be determined, for example, by applying a threshold, set by the mashup server 130 or the clients 210 to 214, to the number of times the search target appears within a specific time scale in the requests they have issued and manage. It can also be determined by more advanced statistical processing, depending on the processing capability of the information processing apparatus 310.
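The loop in which dummy values are added until the search target is no longer specific might be sketched as follows. Assumptions not taken from the specification: specificity is measured as the excess of the target's count over the average count across a fixed universe of comparable values (for example, those registered in the dummy generation information storage unit 322), dummies are drawn uniformly at random from that universe, and a hypothetical `max_dummies` cap bounds the loop. With a fixed universe of N values, each dummy raises the average by 1/N while the target's count stays the same, so the loop terminates.

```python
import random
from collections import Counter


def is_specific(counts, value, universe, threshold):
    """Hypothetical test: target count exceeds the universe average by >= threshold."""
    average = sum(counts[u] for u in universe) / len(universe)
    return counts[value] - average >= threshold


def build_dummies(value, request_log, universe, threshold,
                  max_dummies=100, rng=None):
    """Return the dummy values to send alongside (or instead of) `value`."""
    rng = rng or random.Random()
    universe = list(dict.fromkeys([*universe, value]))  # ensure value is included
    counts = Counter(request_log)
    counts[value] += 1  # the request about to be issued will be logged too
    candidates = [u for u in universe if u != value]
    dummies = []
    while (candidates and len(dummies) < max_dummies
           and is_specific(counts, value, universe, threshold)):
        dummy = rng.choice(candidates)
        dummies.append(dummy)
        counts[dummy] += 1  # each dummy also lands in the server's access log
    return dummies
```

In this sketch the threshold and the universe are plain arguments; in the described apparatus they would be derived from the request log 328 and the dummy generation information storage unit 322.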

In the present embodiment, the spread request generated by the spread request generation unit 316 is created by different processing depending on the attribute of the data to be processed by the web servers 150, 152, and 154. In this embodiment, the spread request is generated to make it difficult to statistically analyze, against a threshold, the time-series behavior of a specific target in the access log managed by each of the web servers 150 to 154. The information to be searched is not particularly limited, but in this embodiment it is classified into information having a continuous attribute and information having a discrete attribute.

Information having the continuous attribute is defined as information whose characterizing values, such as position coordinates, longitude, latitude, altitude, time, or period, can be obtained from homogeneous information other than the data to be searched by a preset operation such as extrapolation, interpolation, or translation. More specifically, examples of information having a continuous attribute include position coordinates and latitude/longitude data.

Information having the discrete attribute, on the other hand, is defined as information whose data may fluctuate independently of other homogeneous information, so that the data to be searched must be accessed directly to acquire it. More specifically, examples of information having discrete attributes include company stock price information, business performance information, M&A (Mergers and Acquisitions) information, and other information related to the activities of companies and groups.

The dummy generation information storage unit 322 can be implemented as a database or a table and can register, for example, company names, addresses, and latitude/longitude information in association with the attributes of the information requested. In another embodiment, when the information to be acquired is discrete, such as the stock price, performance, products, or topics of a specific company, information usable as dummy values can be registered for each category in order to reduce the specificity of the original request for each attribute of the information.

The specificity evaluation unit 324 receives the original request and analyzes the request log. When the received original request deviates from the average of the access information in the request log, the specificity evaluation unit 324 refers to the dummy generation information and causes the diffusion request generation unit 316 to generate a diffusion request including dummy requests containing dummy values, until the original request is determined, from the request log, to be non-specific.

The information processing apparatus 310 also includes a search request issuing unit 318 and a search result extraction unit 320. The search request issuing unit 318 sets the original request and the dummy requests generated from the dummy values in SQL queries and issues them to the web servers via the networks 140 and 220. The spread request generation unit 316 decides whether to pass the value specified in the original request to the search request issuing unit 318 depending on whether the information to be acquired has a continuous or a discrete attribute.

More specifically, when the original request requests information having a continuous attribute, described for example as a numerical data set or a vector, the information specified by the original request can be obtained by extrapolation, interpolation, or relative differences even if the value specified in the original request is never set in the search request. Therefore, for information having a continuous attribute, the dummy requests are generated so that they request not the target information itself but information from which the client can reach the target information through further requests to the web server 154.
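For a continuous attribute such as a map position, one way to build dummy requests that never carry the target value is to scatter request centres on a ring around the target and keep, locally, the offset from each centre back to the target. This is a sketch under stated assumptions: planar coordinates in degrees, no handling of the date line or the poles, and ring radii chosen by the caller. The specification does not prescribe this particular distribution.

```python
import math
import random


def continuous_dummies(target, count, min_r, max_r, rng=None):
    """Return (dummy_centre, offset_to_target) pairs around `target`.

    target  -- (lat, lon) the client actually wants; never sent to the server
    min_r   -- must be > 0 so no dummy coincides with the target
    The offset lets the client reach the target later by a relative move.
    Planar approximation only: no date-line or pole handling.
    """
    if not 0 < min_r <= max_r:
        raise ValueError("require 0 < min_r <= max_r")
    rng = rng or random.Random()
    lat0, lon0 = target
    pairs = []
    for _ in range(count):
        r = rng.uniform(min_r, max_r)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        dlat, dlon = r * math.sin(theta), r * math.cos(theta)
        pairs.append(((lat0 + dlat, lon0 + dlon), (-dlat, -dlon)))
    return pairs


pairs = continuous_dummies((35.68, 139.76), 8, 0.05, 0.2, rng=random.Random(42))
```

Only the dummy centres would be placed in the search request; the offsets stay on the client side.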

On the other hand, for information with discrete attributes, such as stock price information, company names, organization names, or search character strings, the search purpose cannot be achieved unless the target information is searched directly. The diffusion request generation unit 316 therefore generates dummy requests that ask for information of the same kind as the information to be searched, using the value described in the original request, but different from it. These dummy requests are passed to the search request issuing unit 318 together with the original request to form the diffusion request, so the search result extraction unit 320 receives the response to the original request together with the responses to the dummy requests.

The search result extraction unit 320 filters the search results sent from the web server as necessary and displays them on the display device of the input/output device 330 via the input/output interface/browser 326. When the operator of the information processing apparatus 310 obtains a map or similar data as a search result, the operator can adjust the display area or scale with a mouse or the like; additional relative movement requests are then issued and the search results are updated sequentially so that the operator reaches the information sought by the original request.

FIG. 4 is a flowchart of the information processing method of this embodiment. The process starts at step S400, and an original request is acquired in step S401. The information processing apparatus 310 acquires the original request from the network 120 or the input/output device 330, according to the embodiment of FIG. 1 or the embodiment of FIG. For example, suppose an original request is generated to acquire composite information with {C_i, S_i, G_i} as search targets, that is, company information, stock price information, and map information. The information processing apparatus 310 separates the search requests contained in the original request and, in step S402, calls the specificity evaluation unit 324, which first determines for each search request whether it is diffused over time. The processing executed by the specificity evaluation unit 324 is described in more detail later.

When the determination using the request log shows that the request to be sent to the web server is not diffused in content and time (no), a diffusion request is generated in step S403 by referring to the dummy generation information, and the process returns to step S402 to determine again whether the content of the request is diffused.

If step S402 determines, by comparison with the request log, that the content of the request is diffused, that is, not specific (yes), the request is transmitted in step S404.

Step S405 determines whether a response has been received from the web server; if not (no), the check is repeated until it has. When reception of the response is complete (yes), in step S406 the information processing apparatus 310 merges the responses corresponding to the original request and displays them in the browser. Step S406 can include filtering the data to be browsed according to the attributes of the received data. When browsing on the client display device is finished, the process ends in step S407 and waits for the next original request.
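The loop of steps S401 to S406 can be sketched as follows. This is a minimal sketch under stated assumptions, not the embodiment's implementation: the callables `is_specific`, `make_dummy`, and `send`, the `max_dummies` cap, and the shuffled issue order are hypothetical, and the sketch follows the discrete-attribute case in which the original values are themselves sent.

```python
import random
from typing import Callable, Dict, List


def handle_original_request(
    original_values: List[str],
    is_specific: Callable[[List[str]], bool],
    make_dummy: Callable[[List[str]], str],
    send: Callable[[str], str],
    max_dummies: int = 100,
) -> Dict[str, str]:
    """Hypothetical sketch of steps S401 to S406 of FIG. 4."""
    request_set = list(original_values)  # S401: values from the original request
    # S402/S403: add dummy values until the set is judged non-specific.
    while is_specific(request_set):
        if len(request_set) - len(original_values) >= max_dummies:
            raise RuntimeError("specificity could not be diluted")
        request_set.append(make_dummy(request_set))
    # S404: issue every request in random order so the true value's position is hidden.
    order = list(request_set)
    random.shuffle(order)
    # S405: send() is assumed to block until the response has arrived.
    responses = {value: send(value) for value in order}
    # S406: keep only the responses that answer the original request.
    wanted = set(original_values)
    return {value: resp for value, resp in responses.items() if value in wanted}
```

For continuous data such as map coordinates, the original value would not be placed in `request_set` at all; the caller would instead derive the answer from the dummy responses.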

An exemplary embodiment of the processing executed by the specificity evaluation unit 324 is described below. FIG. 5 shows, for illustration, a request log 500 for a particular search target included in the original request. In FIG. 5, the vertical axis is SN_i, the accumulated number of requests for the same search target in the i-th time chunk (i = 0, ..., p, where p is a non-negative integer), and the horizontal axis is elapsed time. The length of a time chunk can be chosen as appropriate (minutes, hours, days, weeks, months, and so on) to suit the purpose of diluting the specificity of the original request. The request log 500 can be generated and stored for each unit of request issuance, where the issuing unit can be a single client, a business unit, or a whole company.

Whether original requests ask for the same search target can be judged, for company information, by a text match on the company name and, for stock price information, by a text match on the company name or stock code. A search target specified by numerical data, such as map information, can be judged by whether it falls within a longitude/latitude range set around a specific latitude/longitude; for map searches, this range can be changed depending on whether the area is urban or non-urban. The identity of the search target may also be judged by whether there is a common landmark within a specific range of the designated position.
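The two identity tests above can be sketched as follows. This is a minimal sketch: the case-insensitive comparison and the half-widths `urban_range_deg` and `rural_range_deg` are hypothetical choices, not values given in the embodiment.

```python
from typing import Tuple


def same_company(a: str, b: str) -> bool:
    """Text match on a company name or stock code, ignoring case and outer spaces."""
    return a.strip().casefold() == b.strip().casefold()


def same_location(
    p: Tuple[float, float],
    q: Tuple[float, float],
    urban: bool,
    urban_range_deg: float = 0.005,  # hypothetical half-width for urban areas
    rural_range_deg: float = 0.05,   # hypothetical half-width elsewhere
) -> bool:
    """True if the (lat, lon) point q lies inside the box set around p."""
    r = urban_range_deg if urban else rural_range_deg
    return abs(p[0] - q[0]) <= r and abs(p[1] - q[1]) <= r
```

A narrower box in urban areas reflects the idea that nearby but distinct targets are denser there.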

For each search target, the information processing apparatus 310 registers the number of original requests issued for that target in each time chunk, with time chunks given at an appropriate processing interval from the start of recording of the request log 500. In the time chunk currently being accumulated in the request log 500, it counts, per original request, the occurrences of original requests that include the search target of interest.
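The per-chunk counts SN_i can be built from a request log as sketched below. The log format, a sequence of `(timestamp, target)` pairs with targets already normalized by an identity test, is an assumption made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def chunk_counts(
    log: Iterable[Tuple[float, str]],
    start: float,
    chunk_len: float,
    n_chunks: int,
) -> Dict[str, List[int]]:
    """Return SN_0 .. SN_{n_chunks-1} for every search target seen in the log."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0] * n_chunks)
    for timestamp, target in log:
        i = int((timestamp - start) // chunk_len)  # index of the time chunk TC_i
        if 0 <= i < n_chunks:                      # ignore entries outside the window
            counts[target][i] += 1
    return dict(counts)
```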

When the currently accumulating time chunk is completed, the specificity evaluation unit 324 checks the rate of increase of requests for the search target of interest and determines, for the time chunk TC_p just evaluated, whether the search target is specific in the request log 500. As shown in FIG. 5, in the first time chunk after recording of the request log starts, every original request issued in that chunk is determined to be specific, and a diffusion request is generated.

As time passes, however, the history of search requests accumulates in the request log 500, so whether the current search target is specific must be determined with the past history taken into account. In that case, let N_av be the average number of requests for the specific search target over the time chunks TC_i up to the current time chunk TC_p, and let SN_p be the number of requests predicted to be acquired for the current time chunk TC_p. The search target can then be judged specific, for example, when the specificity index SN_p given by the following equation (1) exceeds the average N_av by more than the probable error.

In equation (1), σ_error is the probable error of the number of requests for the search target over the time chunks averaged in N_av, and α is a positive real coefficient applied to the probable error, with α ≥ 1. N_av is given by the following equation (2) and is updated each time the time chunk currently being recorded is completed.

In equation (1), the index p identifying the time chunk increases as the request log is recorded, but the first time chunk, in which the specificity evaluation starts, is a singular point with no history. To handle it, when p = 0 processing starts on the assumption that the search value is always specific. By contrast, even when no request for the search target was made before the current time chunk TC_p, so that the SN_p requests in TC_p are the first ever issued, no special handling is applied as long as p > 0; the determination follows equation (1). In practice, it is rare for an identical search target never to have appeared in the past. Therefore, if the part of the request log that carries no specific search intention for a search target can be approximated as white noise around N_av across time chunks, equations (1) and (2) can be modified so that the specificity SP is defined using the probability density function and variance of a binomial distribution. Furthermore, when relationships among search targets are assumed, the request log 500 can be modeled as a multidimensional normal distribution, and the correlated specificity of the search targets may be determined using that distribution and its variance-covariance matrix.
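A minimal sketch of the specificity test follows. It assumes that equation (1) takes the one-sided form SN_p − N_av > α·σ_error, that equation (2) is the mean of the completed chunk counts SN_0, ..., SN_{p−1}, and that σ_error is estimated as the population standard deviation of those counts; all three are assumptions, as is the binomial white-noise variant in which each request hits a given target with probability 1/K.

```python
import math
import statistics
from typing import Sequence


def is_specific(sn_p: float, past_counts: Sequence[int], alpha: float = 1.0) -> bool:
    """One reading of equation (1): SN_p - N_av > alpha * sigma_error."""
    if alpha < 1.0:
        raise ValueError("alpha must satisfy alpha >= 1")
    if not past_counts:
        return True  # p = 0: no history yet, so the value is always treated as specific
    n_av = statistics.fmean(past_counts)  # assumed form of equation (2)
    # Assumption: sigma_error is the population standard deviation of past SN_i.
    # A classical probable error would be about 0.6745 times this value.
    sigma_error = statistics.pstdev(past_counts)
    return sn_p - n_av > alpha * sigma_error


def binomial_sigma(total_requests: int, n_targets: int) -> float:
    """White-noise variant: each request hits a given target with probability 1/n_targets."""
    q = 1.0 / n_targets
    return math.sqrt(total_requests * q * (1.0 - q))
```

With past counts [10, 12, 11, 9, 13] and α = 2, the mean is 11 and the threshold is about 13.8, so 20 requests in the current chunk would be judged specific and 12 would not. A target first requested in a later chunk (past counts all zero) is still evaluated by the same inequality, matching the rule that no special handling applies when p > 0.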

The specificity evaluation unit 324 of this embodiment also treats the last time chunk in FIG. 5 as the chunk whose request log is currently being recorded; at the current point, the search values designating the specific search target have accumulated to SN_current. From the rate of increase so far, the unit predicts that the number of occurrences will grow to SN_predict by the end of the time chunk and, based on that prediction, applies equation (1) to determine whether the target is specific. This prediction-based determination is described in more detail later.

FIG. 6 shows an embodiment of a process that determines the specificity of a search target in the time chunk currently being accumulated in the request log 500, from the rate of increase of original requests containing that target within the chunk. In FIG. 6, the vertical axis shows the cumulative count for each search target in time chunk TC_m (m = 0, 1, 2, ..., N) included in the original requests, and the horizontal axis shows the time course of the request log 600. The request log 600 is inspected separately for the search targets C_i, S_i, N_i, and O_i. The cumulative count in each time chunk is drawn as a bar, and black hatched bars are time chunks that have already been recorded.

In FIG. 6, time chunks determined to be specific in the request log 600 for a given search target are marked with a black triangle on the bar. A search target marked with a black triangle had not been requested before and was first detected in the marked time chunk. White bars indicate the time chunk currently being accumulated.

Even if a search request is found to be specific when its time chunk ends, the original request has by then already been issued, so the web server 150 or a similar server can analyze its access log and infer the search intention of the issuer of the original request.

In this embodiment, therefore, the specificity evaluation unit 324 intercepts the original request before it is sent to the web servers 150 to 154 and examines the content of each individual search request. This allows the rate of increase of each search target within a time chunk to be determined from the original requests acquired by the information processing apparatus 310. That is, the specificity evaluation unit 324 accumulates the search values designating a specific search target in the time chunk, calculates their rate of increase relative to the total number of original requests, extrapolates it to the end of the time chunk by an appropriate method such as linear, polynomial, or exponential extrapolation, and integrates the extrapolated result over the current time chunk to predict the number of occurrences. In FIG. 6, the search target O_i, which is still accumulating in the last time chunk, is marked with a white triangle to indicate that it will be determined to be specific when the chunk completes. When the number estimated by this prediction satisfies equation (1), the diffusion request generation unit 316 is instructed to generate a diffusion request, and dilution of the specificity level begins.

This is illustrated for the search target O_i in FIG. 6: the count SN_current accumulated so far in the current time chunk is not judged specific, but if the cumulative rate of increase continues to the end of the chunk the count is expected to reach SN_predict, and SN_predict would be judged specific. Because the specificity level is predicted and evaluated within the time chunk still being recorded, the specificity can be diluted during that chunk, which prevents gradual information leakage.
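The prediction step can be sketched with the simplest of the extrapolation methods named above, a constant-rate (linear) extrapolation of the count accumulated so far. The one-sided form of equation (1) and the parameter names are assumptions for illustration; polynomial or exponential extrapolation would replace `predict_chunk_total`.

```python
def predict_chunk_total(sn_current: int, elapsed: float, chunk_len: float) -> float:
    """Linearly extrapolate SN_current to the end of the time chunk (SN_predict)."""
    if not 0 < elapsed <= chunk_len:
        raise ValueError("elapsed must lie in (0, chunk_len]")
    rate = sn_current / elapsed  # requests per unit time observed so far
    return sn_current + rate * (chunk_len - elapsed)


def predicted_specific(
    sn_current: int,
    elapsed: float,
    chunk_len: float,
    n_av: float,
    sigma_error: float,
    alpha: float = 1.0,
) -> bool:
    """Apply an assumed one-sided equation (1) to the predicted end-of-chunk count."""
    sn_predict = predict_chunk_total(sn_current, elapsed, chunk_len)
    return sn_predict - n_av > alpha * sigma_error
```

For example, 30 requests in the first 15 minutes of a 60-minute chunk extrapolate to 120. With N_av = 40 and σ_error = 10, the current count of 30 is not specific, but the prediction is, which is the O_i situation in FIG. 6.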

The diffusion request generation unit 316 of this embodiment determines the specificity of each search target in the original request individually and generates a diffusion request when it finds a target to be specific. The diffusion request is produced by modifying the original request so that the information to be searched no longer appears specific in the request log 500. In exemplary embodiments, the modification can be made as follows.

○ When a request is issued with numerical data such as map information, one or more dummy requests are generated whose dummy values are numerical data randomly offset beyond the range within which search targets are judged identical. In a further preferred embodiment, several dummy values are chosen and set in the diffusion request so that their frequency spectrum, obtained by a Fourier transform over latitude and longitude, is flat. In practice, a spectrum that is uniform to some degree is considered enough to make data mining difficult, so the spectrum need not be perfectly flat.

○ For attribute information whose value must be acquired directly, such as company information or stock price information, other company names, stock names, and stock codes classified in the same industry as the company or stock judged specific are extracted at random from the dummy generation information storage unit 322, and an appropriate number of dummy requests is issued so that the number of requests within the category approximates white noise.

A dummy request can be configured with the same set of search values as the original request, but since the information processing apparatus 310 accesses each database individually, a dummy request can also be generated as a search request containing a single search value. The generated dummy requests, together with the search value that is the true purpose of the search, are sent in random order to the corresponding databases 160 to 164, 240 to 244, and so on.
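The discrete-attribute modification and the random issue order can be sketched as follows. `DUMMY_TABLE` is a hypothetical stand-in for the dummy generation information storage unit 322, and every company name in it except "FGH" (taken from FIG. 14) is invented for illustration.

```python
import random
from typing import Dict, List

# Hypothetical stand-in for storage unit 322: company name -> industry category.
DUMMY_TABLE: Dict[str, str] = {
    "FGH": "electronics", "ABC": "electronics", "JKL": "electronics",
    "MNO": "electronics", "PQR": "retail", "STU": "retail",
}


def build_diffusion_request(target: str, n_dummies: int, rng: random.Random) -> List[str]:
    """Single-value search requests: same-industry dummies plus the true target."""
    category = DUMMY_TABLE[target]
    # Other companies in the same industry are the candidate dummy values.
    candidates = sorted(
        name for name, cat in DUMMY_TABLE.items() if cat == category and name != target
    )
    dummies = rng.sample(candidates, min(n_dummies, len(candidates)))
    request_set = dummies + [target]
    rng.shuffle(request_set)  # hide the true value's position in the issue order
    return request_set
```

Each element of the returned list would be sent as its own single-value search request.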

FIG. 7 shows an embodiment of an access log 700 recorded, for example, by the web server 150 after diffusion requests according to this embodiment have been issued for the request log 500 shown in FIGS. The information processing apparatus 310 counts, for each specific time chunk, the requests sent to the company information providing server 150 for each search target and, while accumulating these counts, determines the statistical specificity of the search targets in the original request. When a search target is judged specific, the information processing apparatus 310 issues dummy requests to dilute the specificity of that target in the access log that the web server 150 records for the specific issuer. FIG. 7 shows the access specificity diluted, that is, turned into white noise so that the specific search value no longer shows a prominent tendency, as shown in FIG.

Even when the original request includes search targets that should serve as dummy values for the true request TR, the information processing apparatus 310 generates dummy requests so that the access log accumulated by the web server 150 for that specific information processing apparatus 310 approaches white noise. The true request is the search value in the original request that reflects the searcher's specific intention. When the information processing apparatus 310 determines that the true request TR_1 is specific in time chunk TC_1, it extracts from the dummy generation information storage unit 322 a search target that dilutes the specificity and sets it as a dummy request DR (Dummy Request).

FIG. 8 is a detailed flowchart of the processing from acquisition of an original request to issuance of requests when searching for information associated with specific numerical data in this embodiment. FIG. 8 applies, for example, to map data characterized by a numerical data set such as position coordinates, or to any continuous information specified by numerical data. In the embodiment of FIG. 8, the input value can be given as numbers such as longitude and latitude, or as a company name, place name, or the like. When the original request is given as a company name, place name, address, or the like, the processing can be performed after replacing it with the latitude/longitude data stored in the dummy generation information storage unit 322.

The process of FIG. 8 details steps S402 and S403 of FIG. 4 and starts after the original request is acquired in step S401. In step S800, values cx and cy satisfying formula (3) are generated with the function rnd(). In formula (3), (x, y) is the numerical data specified by the original request, and w and h specify the range of the numerical data. cx and cy are numerical data generated from random numbers and correspond to longitude and latitude values. gx and gy are the barycenter (average coordinates) of the coordinates including the past history.

In step S801, the time scale index ti is initialized to zero. The time scale index selects a time scale, such as minutes, hours, days, or months, over which the original request is checked for non-specificity. Specifically, t0 refers to request logs issued within one minute, t1 within one hour, and t2 within one day; t3 and t4 can refer to request logs by month or by quarter.

In step S802, ti is compared with the number of elements in the array ta. If ti is smaller than the number of elements (yes), in step S803 the past history for the period ta[ti] is retrieved, and new center coordinates gx and gy are calculated as the average of the history coordinate group together with cx and cy. In step S804, the distance L between (x, y) and (gx, gy) is calculated. When search values are given as a coordinate group, L and the number of requests generated for it give a measure of the specificity of the original request, and each is processed statistically, for example as a probable error. The distance L may be a Euclidean distance, a Manhattan distance, or any suitable topological distance between feature values defined on feature coordinate axes.

In step S805, if the distance L is at or below a threshold at which the original request is regarded as non-specific (yes), the time scale index ti is incremented by 1 in step S806, and the process returns to step S802 to check specificity on the next time scale. If L exceeds the threshold (no), the process returns to step S800 to generate additional values cx′ and cy′, and the calculation is repeated until L falls to or below the threshold.

When step S802 finds that all configured time scale indices have been evaluated (no), the request is not specific on any configured time scale, so control passes to step S404 and the set {(cx, cy)} is issued as numerical data in a diffusion request. Because the information here is numerically continuous, the original value (x, y) is not included in the diffusion request.
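Steps S800 to S806 can be sketched as below. Formula (3) is not spelled out here, so the sketch assumes it draws (cx, cy) uniformly from the w-by-h window centred on (x, y) while rejecting points within a hypothetical `same_radius` (the range in which targets count as identical); the Euclidean distance, `history_by_scale`, and `max_points` are likewise assumptions. Returning to S800 passes through S801 again, so every time scale is rechecked after each new point.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def random_point(x: float, y: float, w: float, h: float,
                 same_radius: float, rng: random.Random) -> Point:
    # Assumed formula (3): uniform in the window, excluding the "same target" disc.
    while True:
        cx = x + (rng.random() - 0.5) * w
        cy = y + (rng.random() - 0.5) * h
        if math.hypot(cx - x, cy - y) > same_radius:
            return (cx, cy)


def diffuse_coordinates(x: float, y: float, w: float, h: float,
                        history_by_scale: Sequence[Sequence[Point]],
                        threshold: float, same_radius: float,
                        rng: random.Random, max_points: int = 10_000) -> List[Point]:
    if same_radius >= min(w, h) / 2:
        raise ValueError("same_radius leaves no room for dummy points")
    generated: List[Point] = []
    while len(generated) < max_points:
        generated.append(random_point(x, y, w, h, same_radius, rng))  # S800
        ok = True
        for history in history_by_scale:                      # S801/S802/S806
            gx, gy = centroid(list(history) + generated)      # S803
            dist = math.hypot(x - gx, y - gy)                 # S804: distance L
            if dist > threshold:                              # S805 "no"
                ok = False
                break
        if ok:
            return generated  # to S404; (x, y) itself is never included
    raise RuntimeError("specificity could not be diluted within max_points")
```

Because the assumed draw is symmetric around (x, y), adding points pulls each barycenter toward (x, y), so the loop normally ends well before `max_points`.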

When the process of FIG. 8 is executed, the information processing apparatus 310 can obtain the information sought by the original request without sending the numerical data of the original request to the web server. The process of FIG. 8 is effective when the information can be specified by numerical data and has topologically continuous attributes; map search is one preferred application. In another preferred embodiment, when the extracted set of (cx, cy) contains a point that rounds to the position coordinates of a specific landmark other than (x, y), that landmark can be placed within the area given by (w, h) and the retrieved information displayed.

FIG. 9 shows pseudo-code for steps S803 to S805 of the process described in FIG. 8. Pseudo-code block 900 corresponds to steps S804 and S805, and pseudo-code block 910 corresponds to step S803 of FIG. 8. In block 910, the element count obtained from the list is the number of request log elements included in the time scale specified by ti.

FIG. 10 is a flowchart of a second embodiment of the information search method of this embodiment and corresponds to steps S402 to S405 of FIG. 4. The embodiment of FIG. 10 is suited to search target information with discrete attributes. The process of FIG. 10 starts at step S1000 after the original request is acquired in step S401. The symbols used in FIG. 10 are defined in Table 1.

In step S1000, the index of the attribute matching kn is looked up in the array k[] and stored in the variable ti. In step S1001, the number of access candidates to be extracted as dummy requests is initialized to zero, the access count ac[ti] for index ti is incremented by 1, and ti is stored at index 0 of the access candidate index array c[]; the number of access candidates is then incremented by 1. In step S1002, the difference between the average access count and the access count of the target, $d = \mathrm{Avg}(ac[0], \ldots, ac[n-1]) - ac[ti]$, is calculated.

Step S1003 determines whether |d| is at or below a threshold. If not (no), in step S1004 an index value designating an attribute name to be extracted as a dummy request is chosen: an integer dc satisfying 0 ≤ dc ≤ N−1, dc ≠ ti, and not already extracted is generated with the rnd() function or a similar function. The probable error of the request log described for equation (1) can be used as the threshold; when a different criterion is used to determine specificity, a correspondingly appropriate threshold can be set.

In step S1005, the entry of the access count array ac[] at index dc is updated, dc is stored in the access candidate index array as c[ci], and the access candidate counter ci is incremented by 1. The process then returns to step S1002 and repeats until the determination in step S1003 is affirmative.

If |d| is at or below the threshold in step S1003 (yes), the process branches to step S1006, where the elements of array c[] are shuffled randomly and the extraction history is cleared; the loop index i is then initialized to 0 in step S1007. While i is less than ci in step S1008 (yes), k[c[i]] is set as the search character string of a request forming the diffusion request, and the access is executed in step S1009. Step S1010 determines whether c[i] equals ti. If c[i] == ti (yes), the loop counter is incremented by 1 in step S1012 and the process returns to step S1008; dummy accesses continue until step S1008 returns a negative result.

If c[i] == ti does not hold in step S1010 (no), the access result is stored in step S1011, the process branches to step S1012, and the loop repeats until step S1008 again returns a negative result. When step S1008 is negative (no), control passes to step S406 and the process of FIG. 10 ends. In FIG. 10, the access for the target attribute name is described as being performed by a process other than that of FIG. 10; alternatively, by omitting step S1010 and storing all access results, the process of FIG. 10 can itself complete the accesses for all access candidates.
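Steps S1000 to S1012 can be sketched as follows, using the variant just described that omits step S1010 and stores every access result. The `send` callable and the `rng` parameter are hypothetical. The flowchart does not say what happens when every other attribute has been drawn while |d| still exceeds the threshold (which can occur when ac[ti] lies below the average); this sketch simply stops there.

```python
import random
from typing import Callable, Dict, List


def level_access_counts(k: List[str], ac: List[int], kn: str,
                        threshold: float, rng: random.Random) -> List[int]:
    """Steps S1000 to S1005: pick candidate indices c[] and update ac[] in place."""
    ti = k.index(kn)            # S1000: index of the attribute matching kn
    ac[ti] += 1                 # S1001: count the true access
    c = [ti]
    n = len(ac)
    while True:
        d = sum(ac) / n - ac[ti]       # S1002: average access count minus ac[ti]
        if abs(d) <= threshold:        # S1003 "yes"
            return c
        remaining = [j for j in range(n) if j != ti and j not in c]
        if not remaining:              # case not covered by the flowchart
            return c
        dc = rng.choice(remaining)     # S1004
        ac[dc] += 1                    # S1005
        c.append(dc)


def issue_diffusion_request(k: List[str], c: List[int],
                            send: Callable[[str], str],
                            rng: random.Random) -> Dict[str, str]:
    order = list(c)
    rng.shuffle(order)                 # S1006: random issue order
    results: Dict[str, str] = {}
    for i in order:                    # S1007 to S1012, storing every result
        results[k[i]] = send(k[i])     # S1009 and S1011
    return results
```

With five attributes all at 3 accesses and a threshold of 0.5, the true access raises ac[ti] to 4 and two dummies are needed before |d| drops to 0.4.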

In the determination of step S1003 as well, the access history can be diffused over several time scales by computing the average value for each time scale index, as described for step S803 of FIG. 8.

FIG. 11 shows an embodiment of pseudo-code that executes the processing shown in FIG. 10. Block 1100 corresponds to step S1003 of FIG. 10, and block 1110 corresponds to block 1020 of FIG. 10. In the pseudo-code of FIG. 11, the access for the attribute name to be searched is executed after block 1110 ends. However, as described with reference to FIG., executing it inside the loop can further improve resistance to data mining.

FIG. 12 shows how the contents of the diffusion request generated in this embodiment are spread when searching map data. The vertical and horizontal axes in FIG. 12 correspond to the dimensions w and h of the display area, respectively. FIGS. 12(a) to 12(c) show the spread for different initial random number conditions. As FIG. 12 shows, the diffusion request contains position coordinates well separated from the target coordinates (0, 0) given in the original request, and the dummy request points forming the diffusion request are distributed sufficiently randomly, indicating that the data mining resistance of the request can be improved.

FIG. 13 shows an embodiment of a search screen 1300 displayed by the web system of this embodiment, in which the information search of this embodiment is applied to continuous information whose request content is set as numerical data. In the embodiment shown in FIG. 13, the landmark 1312 is the search target. The operator of the information processing apparatus 310 inputs the position coordinates of the search target 1312, a company name, or the like. The information processing apparatus 310 then uses the process shown in FIG. 8 to determine, from analysis of the request log history, whether access to the search target 1312 is specific.

In the embodiment described, the information processing apparatus 310 determines that access to the search target 1312 is specific (|d| > threshold) and displays the map image 1310 by filtering for the response to the request whose position coordinates, among those extracted as access candidates, are registered as the landmark 1314. The diffusion request issued by the information processing apparatus 310 is a request set generated by the process of FIG. 8, which improves the data mining resistance of requests related to the search target 1312.

A user who has obtained the map image 1310 scrolls the map with a mouse or the like toward a landmark 1316 such as a park, bringing the search target 1312 near the center to display the map data 1320. Because map data is registered continuously on a two-dimensional plane, a scroll does not contain a value that specifies the search target, as a request would; it only moves relative to the default map data. The data specifying the search target 1312 is therefore never sent to the web server and does not weaken the data mining resistance.

FIG. 14 shows an embodiment of this embodiment in which the search target has discrete attributes. In FIG. 14, information acquired from a plurality of web servers by the mashup server 130 is mashed up and provided as one desktop screen 1400. The operator searches with the intention of obtaining the stock price information of a specific company, "FGH", in Los Angeles, and inputs search character strings such as FGH, stock price, map, and topics. The mashup server 130, however, does not send the operator's original request as is. Instead, it refers to the company information and the request log stored in the dummy generation information storage unit 322, acquires other companies as access candidates in addition to the search target company "FGH", and issues search requests for information such as their stock prices to the web servers as a diffusion request.

In the embodiment described with reference to FIG. 14, all search results acquired by the above process are obtained without filtering and displayed in the display frame 1410 as a search result list, in the order of the randomly sent requests. As the display frame 1410 shows, a search result is also obtained for the search target company "FGH", but because it is obtained together with search requests for other companies, the relative weight of the name "FGH" in the access log is reduced. Moreover, since this embodiment sends the web server diffusion requests produced by statistically processing the request log, resistance to data mining of the access log on the web server that receives the requests is improved.

The display frames 1420, 1430, and 1440 show the map position, stock price movement, and topics for the search result the operator has currently selected in the search result list of display frame 1410. When the operator selects another search result in display frame 1410, the contents of display frames 1420, 1430, and 1440 change accordingly, so that independent information from a plurality of web servers can be presented efficiently.

FIG. 15 shows access logs 1500 generated as a result of issuing diffusion requests, to illustrate how the information processing method of this embodiment improves data mining resistance and prevents gradual information leakage from the access log. In FIG. 15, the access log 1510 is an example from a specific web server when the diffusion request of this embodiment is not used, and the access log 1520 is an example from a specific web server when it is applied. The vertical axis shows the total number of accesses for each search character string within a specific period.

For convenience of explanation, it is assumed that company names A, B, C, and D are the company names added to the diffusion request. It is further assumed that the search target company is company C.

In the access log 1510, the search target company name entered by the operator is transmitted to the web server as-is. Requests containing the search target company name, company C, therefore stand out in the access log. As a result, the web server side can trace shifts in the search targets of a specific enterprise or individual by data mining the access log as a time series. For example, suppose the number of accesses concerning company C increases after a specific date and time. The searcher then reveals to the site operating the web server that he or she has been interested in company C since that date and time. In this way, important information such as a planned TOB (takeover bid) or merger may gradually leak.
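The kind of time-series mining described above can be illustrated with a short sketch. The patent does not specify how a server operator would analyze the log. The `(day, search_string)` log format, the function name `flag_interest_shift`, and the simple before/after mean comparison are therefore all assumptions chosen for illustration.

```python
from collections import Counter
from statistics import mean


def flag_interest_shift(log, name, split_day, threshold):
    """Flag `name` if its mean daily access count from `split_day` onward
    exceeds its mean daily count before `split_day` by `threshold` or more.

    `log` is a list of (day, search_string) pairs, a simplified,
    hypothetical access-log format.
    """
    per_day = Counter(day for day, s in log if s == name)
    days = sorted({day for day, _ in log})
    before = [per_day[d] for d in days if d < split_day]
    after = [per_day[d] for d in days if d >= split_day]
    if not before or not after:
        return False
    return mean(after) - mean(before) >= threshold


# Without diffusion requests: company C jumps from 1 to 4 accesses per day.
log = []
for day in range(1, 7):
    log += [(day, name) for name in ("A", "B", "D")]
    log += [(day, "C")] * (1 if day < 4 else 4)

print(flag_interest_shift(log, "C", split_day=4, threshold=2))  # True
print(flag_interest_shift(log, "A", split_day=4, threshold=2))  # False
```

A similar threshold-on-average test appears in the claims as the trigger used by the specificity evaluation unit. That unit applies it to the client's own request log before the server ever sees the requests.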

The access log 1520 shows the access log generated on the web server side when the diffusion request of this embodiment is used. In the present embodiment, the request log is statistically processed to generate a diffusion request, and the resulting request set is issued to the web server. As a result, the access frequency for each company is leveled within the range of the threshold value |d| that defines specificity, and resistance to data mining of the access log is improved.

FIG. 15 uses company names as the example. This embodiment can also be applied to various other kinds of information, such as geographic or regional names, product names, ages, sexes, group names, and specific character strings on an SNS. It can thus prevent gradual leakage of search intentions related to market research, future trends, corporate activity, and network activity.
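Leveling of this kind can be sketched as a loop that keeps adding dummy requests until the target is no longer specific. The patent states only that the request log is statistically processed against the threshold |d|, so this sketch is one plausible reading, not the claimed method. It rests on three assumptions:

- Specificity means that the target's count in the request log exceeds the mean count over the target and dummy names by more than `d`.
- The dummy picked each round is the least-requested one.
- The function name `plan_dummy_requests` is hypothetical.

```python
from collections import Counter


def plan_dummy_requests(request_log, target, dummy_values, d):
    """Return dummy search values to send with `target` so that the target
    is no longer 'specific', i.e. its count in the request log exceeds the
    mean count over target + dummy names by at most d (assumed definition)."""
    if d < 0:
        raise ValueError("threshold d must be non-negative")
    names = [target] + [v for v in dummy_values if v != target]
    counts = Counter({name: 0 for name in names})
    counts.update(v for v in request_log if v in counts)
    counts[target] += 1  # the original request about to be issued
    dummies = []
    while counts[target] - sum(counts[n] for n in names) / len(names) > d:
        # Raise the least-requested dummy name to level the distribution.
        pick = min(names[1:], key=lambda n: counts[n])
        counts[pick] += 1
        dummies.append(pick)
    return dummies


log = ["C"] * 6 + ["A", "B", "D"]
dummies = plan_dummy_requests(log, "C", ["A", "B", "D"], d=1)
print(len(dummies))  # 14
```

The loop always terminates for `d >= 0`. The target's count stays fixed, and each added dummy raises the mean by `1 / len(names)`, so the target's excess over the mean shrinks every round.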

To make the invention easy to understand, it has been described in terms of functional means and the processing each functional means performs. However, the present invention is not limited to the specific functional means described above performing the specific processing. The functions for executing the above-described processing may be assigned to any functional means, taking into account efficiency factors such as processing efficiency and ease of implementation and programming.

The above-described functions of the present invention can be realized by a device-executable program. The program may be written in an object-oriented programming language such as C++, Java (registered trademark), JavaBeans (registered trademark), Java Applet (registered trademark), JavaScript (registered trademark), Perl, or Ruby, or in a search-specific language such as SQL. It can be stored on a device-readable recording medium for distribution, or transmitted for distribution.

DESCRIPTION OF SYMBOLS: 100 ... Web system, 110-114 ... Client, 120 ... Network, 130 ... Mashup server, 140 ... Network, 150-154 ... Web server, 160-164 ... Database, 200 ... Web system, 210-214 ... Client, 220 ... Network, 230-234 ... Web server, 240-244 ... Database, 300 ... Information processing system, 310 ... Information processing device, 312 ... Network adapter, 314 ... Request acquisition unit, 316 ... Diffusion request generation unit, 318 ... Search request issuing unit, 320 ... Search result extraction unit, 322 ... Dummy generation information storage unit, 324 ... Specificity evaluation unit, 326 ... Input/output interface/browser, 328 ... Request log storage unit, 330 ... Input/output device

Claims

An information processing apparatus that acquires information via a network, wherein the information processing apparatus includes:
A request acquisition unit that acquires an original request including a search value that specifies information to be acquired from the database;
A specificity evaluation unit that determines, with respect to a request log registering the history of the search value, whether or not the information to be acquired by a request to be issued currently is specific in relation to requests issued in the past by the information processing apparatus;
A spread request generation unit that, if the specificity evaluation unit determines that the search value is specific, generates a spread request including a dummy request, so as to dilute the specificity, in the access log of the database, of requests concerning the search value issued by the information processing apparatus; the dummy request is generated from a dummy value that gives a search value requesting information different from the information to be acquired;
A search request issuing unit that issues the spread request as a search request to the database via the network;
and a search result extraction unit that extracts the information acquired by the spread request from a response to the search request.
The information processing apparatus according to claim 1, wherein the dummy value is stored in a dummy generation information storage unit, and data mining tolerance is imparted by reducing the specificity of the search value in the request log.
The information processing apparatus according to claim 2, wherein the search request issuing unit issues the spread request including only the dummy request as the search request when the information to be acquired has a continuous attribute.
The information processing apparatus according to claim 2, wherein the search request issuing unit issues the spread request including the original request and the dummy request as the search request when the information to be acquired has discrete attributes.
The information processing apparatus according to claim 4, wherein the specificity evaluation unit searches the request log for the search value included in the original request. The specificity evaluation unit starts generation of the dummy request by the spread request generation unit when it predicts, from the currently determined rate of increase of the search value, that the number of occurrences of the search value will rise above the average number of occurrences by a threshold value or more.
The information processing apparatus according to claim 5, wherein the original request includes a plurality of the search values for acquiring different information, the specificity evaluation unit determines the specificity for each of the plurality of search values, the dummy request is generated for each of the search values, and the spread request is issued to the database in which the information is to be searched.
The information processing apparatus according to claim 6, wherein the information processing apparatus receives responses from the database, generates display areas for displaying the response corresponding to each of the search values included in the original request, and displays the responses.
The information processing apparatus according to claim 7, wherein the information processing apparatus is a mashup server implemented in a Web 2.0 paradigm.
An information processing method for acquiring information via a network, the information processing method comprising:
Obtaining an original request containing a search value specifying information to be retrieved from the database;
Determining, with respect to a request log registering the history of the search value, whether or not the information to be acquired by a request to be issued currently is specific in relation to requests issued in the past by the information processing apparatus;
When the search value is determined to be specific in the determining step, generating a diffusion request including a dummy request, so as to dilute the specificity, in the access log of the database, of requests concerning the search value issued by the information processing apparatus; the dummy request is generated from a dummy value that gives a search value requesting information different from the information to be acquired;
Issuing the spread request as a search request to the database via the network;
and extracting the information acquired by the spread request from a response to the search request.
The information processing method according to claim 9, wherein the step of generating the spread request includes a step of obtaining, from the dummy generation information storage unit, a dummy value for reducing the specificity of the search value in the request log, and setting the dummy value in the dummy request.
The information processing method, wherein the step of issuing the spread request includes a step of issuing the spread request including only the dummy request as the search request when the information to be acquired has a continuous attribute.
The information processing method according to claim 11, wherein the step of issuing the spread request includes a step of issuing the spread request including the original request and the dummy request as the search request when the information to be acquired has discrete attributes.
The information processing method according to claim 12, wherein the step of determining whether or not the information is specific includes searching the request log for the search value included in the original request. The method further comprises a step of starting generation of the dummy request upon predicting, from the currently determined rate of increase of the search value, that the number of occurrences of the search value will rise above the average number of occurrences by a threshold value or more.
The information processing method according to claim 13, wherein the original request includes a plurality of search values for obtaining different information, and the step of determining whether or not the information is specific includes determining the specificity for each of the plurality of search values;
the step of generating the spread request includes generating the dummy request for each search value; and
the step of issuing the spread request includes issuing the spread request to the database in which the information is to be searched.
The information processing method according to claim 14, wherein the information processing device is a mashup server implemented in a Web 2.0 paradigm.
A program executable by an information processing apparatus to carry out an information processing method for acquiring information via a network, the program causing the information processing apparatus to function as:
A request acquisition unit that acquires an original request including a search value that specifies information to be acquired from the database;
A specificity evaluation unit that determines, with respect to a request log registering the history of the search value, whether or not the information to be acquired by a request to be issued currently is specific in relation to requests issued in the past by the information processing apparatus;
A spread request generation unit that, if the specificity evaluation unit determines that the search value is specific, generates a spread request including a dummy request, so as to dilute the specificity, in the access log of the database, of requests concerning the search value issued by the information processing apparatus; the dummy request is generated from a dummy value that gives a search value requesting information different from the information to be acquired;
A search request issuing unit that issues the spread request as a search request to the database via the network;
A search result extraction unit that extracts the information acquired by the spread request from a response to the search request.
The program according to claim 16, wherein the dummy value is stored in a dummy generation information storage unit, and data mining tolerance is imparted by reducing the specificity of the search value in the request log.
The program according to claim 17, wherein the specificity evaluation unit searches the request log for the search value included in the original request. The specificity evaluation unit starts generation of the dummy request by the spread request generation unit when it predicts, from the currently determined rate of increase of the search value, that the number of occurrences of the search value will rise above the average number of occurrences by a threshold value or more.
A web system for transferring information over a network, the web system comprising:
An information processing device that acquires an original request including a search value designating information to be acquired, and issues a search request for the information to be acquired to at least one web server connected to the network;
A web server that receives the search request including a plurality of search values from the information processing device, searches a database, and returns information specified in the search request as a response to the information processing device;
The information processing apparatus includes:
A request acquisition unit that acquires an original request including a search value that specifies information to be acquired from the database;
A specificity evaluation unit that determines, with respect to a request log registering the history of the search value, whether or not the information to be acquired by a request to be issued currently is specific in relation to requests issued in the past by the information processing apparatus;
A spread request generation unit that, if the specificity evaluation unit determines that the search value is specific, generates a spread request including a dummy request, so as to dilute the specificity, in the access log of the database, of requests concerning the search value issued by the information processing apparatus; the dummy request is generated from a dummy value that gives a search value requesting information different from the information to be acquired;
A search request issuing unit that issues the spread request as a search request to the database via the network.
The web system according to claim 19, wherein:

- the diffusion request generation unit generates the diffusion request by acquiring, from the dummy generation information storage unit, the dummy value unrelated to the information to be acquired, and adding it;
- the diffusion request includes only the dummy request when the information to be acquired is continuous, and includes the original request and the dummy request when the information to be acquired is discrete;
- the specificity evaluation unit searches the request log for the search value included in the original request and starts generation of the dummy request by the diffusion request generation unit when it predicts, from the currently determined rate of increase of the search value, that the number of occurrences of the search value will rise above the average number of occurrences by a threshold value or more;
- the web system is constructed in the Web 2.0 paradigm; and
- the information processing apparatus is a mashup server.