WO2011013490A1 - Information processing apparatus, information processing method, program, and web system - Google Patents
- Publication number
- WO2011013490A1 (international application PCT/JP2010/061535)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- request
- search
- information
- value
- information processing
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6263—Protecting personal data, e.g. for financial or medical purposes during internet communication, e.g. revealing personal data from cookies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/564—Enhancement of application control based on intercepted application data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
- H04L67/5682—Policies or rules for updating, deleting or replacing the stored data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2101—Auditing as a secondary aspect
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2123—Dummy operation
Definitions
- The present invention relates to network technology and, more particularly, to technology for preventing information leakage that may occur when information is acquired via a network.
- A client computer such as a personal computer (hereinafter simply referred to as a client) issues a request for information to a web server that stores information and responds to requests. When the web server sends information matching the request to the client, the client acquires the requested information.
- Requests sent from the client to the server include values that specify information, such as numeric data sets and keywords, and the server issues a query to a database or the like by referring to those values.
- The conventional information retrieval method therefore means that the information the client is currently interested in is disclosed to the server.
- If the web server can be trusted, conventional information retrieval poses relatively few problems. However, even if the web server is reliable, a search entity such as an individual or a company that performs a search must notify the server of what information it is currently interested in in order to obtain that information.
- A mashup system includes a client, a mashup server, and a plurality of information servers.
- The mashup server may be installed in a company or the like, or it may be a web server installed on the Internet by an ISP (Internet Service Provider) to execute mashup processing exclusively.
- Each information server is provided by an ISP or the like, searches the database that it manages for information corresponding to a client request, and sends the retrieved information to the mashup server.
- The mashup server appropriately arranges the acquired information and displays the information acquired from the plurality of information servers on the client via a browser program or the like.
- Multiple information servers thus receive the requests issued by a client, according to the information the client requests.
- The request is configured as, for example, an SQL (Structured Query Language) statement including a search word and a conditional expression for searching for the information to be acquired.
- Each information server obtains the search conditions from the received request and acquires the information corresponding to the request by searching the database that it manages.
- The problem here is that not every information server connected via the network is necessarily reliable. Even when a reliable information server is used, the request contents sent to it may be analyzed by data mining during analysis of the server's logs, so that the search purpose on the client side becomes implicitly known.
- The term “slow information leakage” means that the network accesses made by a group on the network are statistically analyzed by the information server and, as a result, the intention of the group, that is, its search intention, leaks slowly.
- Patent Document 1 discloses a data search system that protects search conditions and the location of a searcher as confidential information.
- In this system, a search is performed with part or all of the search conditions concealed or shielded as confidential information in the data search device, and the data search device then narrows down the search result to complete the data search.
- The shielding is done by deleting the search conditions designated in advance as to be shielded from the input search conditions, by replacing an input search condition with a similar word or a superordinate concept, by adding extra search conditions, or by dividing the search conditions.
- Processing such as deletion, superordinate conceptualization, and addition of search terms is performed on the data retrieval apparatus side on the search conditions input to the data processing apparatus.
- An additional program for processing the search conditions, a similar-word dictionary, and the like are therefore required, and the search conditions must be preprocessed before the search process can be executed.
- In addition, the data search device of Patent Document 1 temporarily stores the information hit under the shielded search conditions, which are broader than the original search conditions, and requires a data editing unit that searches the stored information again for information matching the original search conditions.
- The data search device is thus required to acquire and process a larger amount of information than would originally be acquired, and the device itself must substantially include a secondary database.
- Because the data search device of Patent Document 1 involves waste of hardware resources and program resources, it cannot make searching efficient when the amount of information stored on the network is enormous, as it is at present.
- Patent Document 2 (Japanese Patent Laid-Open No. 2002-312377) discloses a search device for preventing leakage of information such as user privacy. The input first search condition is changed to a second search condition that searches a wider range, and an information search is executed on the search server to acquire a first search result. The acquired first search result is then searched again according to the first search condition to generate a search result corresponding to the first search condition.
- The search device described in Patent Document 2 also expands the search condition so that the search device acquires an extended search result, and searches that result again to generate the search result that should originally be acquired.
- The search device itself must therefore function as a secondary database.
- The search device must secure storage space for the expansion of the search condition and must itself include a certain degree of search capability, so information leakage is not prevented effectively in view of the waste of hardware resources and software resources.
- As described above, to prevent information leakage due to searching, the prior art generates a modified search expression that includes the original search condition, issues a search request to the database, and acquires an extended search result. The search device then searches the extended search result again using the original search condition to generate the original search result.
- In the prior art, the original search condition is expanded into a superordinate concept, or the extended search condition is generated so as to widen the range; either way, the original search condition must be included in the extended search condition.
- The extended search conditions generated in Patent Document 1 and Patent Document 2 therefore do not prevent leakage of the search intention on the client side; that is, they do not prevent gradual information leakage.
- An object of the present invention is to provide an information processing apparatus, an information processing method, a program, and a web system that prevent the gradual information leakage that may occur when information is acquired via a network.
- The present invention has been made by paying attention to the fact that gradual information leakage occurs in conventional information retrieval.
- In the present invention, the past request log is statistically analyzed when information is to be retrieved. If the request is specific, that is, if it stands out from the past request history, a plurality of dummy requests including randomly selected dummy values are generated, and a spread request including the plurality of dummy requests is generated.
- When the search target input by the search subject is specific in light of the past request log, the diffusion request causes data to accumulate in the access log of the web server that executes the search processing in such a way that the request cannot be characterized by data mining or the like. This improves the data mining tolerance of the requests issued by the search subject.
- The present invention can be applied to a search target that is specified by a numerical data set, such as map data, and has a continuous attribute that can be obtained by calculation from homogeneous information. It can also be applied to a search target having discrete attributes, such as a company name, stock price, product name, gender, age, or arbitrary character string.
- The present invention provides an information processing apparatus that acquires information via a network. The information processing apparatus includes:
- a request acquisition unit that acquires an original request including a search value that specifies information to be acquired from a database;
- a specificity evaluation unit that refers to a request log registering the history of search values and determines whether the information to be acquired by the request to be currently issued is specific relative to the requests issued in the past by the information processing apparatus;
- a spreading request generation unit that, if the specificity evaluation unit determines that the search value is specific, generates a spreading request including a dummy request generated from a dummy value, the dummy value giving a search value that requests information different from the information to be acquired, so as to dilute the specificity of the access log related to the search values issued to the database by the information processing apparatus;
- a search request issuing unit that issues the spreading request as a search request to the database via the network; and
- a search result extraction unit that extracts the information acquired by the spreading request from the response to the search request.
- The dummy value of the present invention is stored in a dummy generation information storage unit, and data mining tolerance can be imparted by reducing the specificity of the search value in the request log.
- The search request issuing unit of the present invention can issue a spread request including only the dummy request as the search request when the information to be acquired has a continuous attribute.
- The search request issuing unit of the present invention can issue a spread request including both the original request and the dummy request as the search request when the information to be acquired has a discrete attribute.
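- The attribute-dependent composition of the spread request described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_spread_values`, the `"continuous"`/`"discrete"` labels, the dummy pool, and the number of dummies are all assumptions introduced here.

```python
import random


def build_spread_values(original, attribute, dummy_pool, n_dummies=3, rng=None):
    """Return the search values to place in one spread request.

    attribute is "continuous" (e.g. map coordinates) or "discrete"
    (e.g. a company name). All names here are hypothetical.
    """
    rng = rng or random.Random()
    candidates = [v for v in dummy_pool if v != original]
    dummies = rng.sample(candidates, min(n_dummies, len(candidates)))
    if attribute == "continuous":
        # Only dummy values are sent; the original value never reaches the server.
        return dummies
    if attribute == "discrete":
        # The original value is sent, mixed among the dummies in random order.
        values = dummies + [original]
        rng.shuffle(values)
        return values
    raise ValueError(f"unknown attribute: {attribute!r}")
```

- Shuffling in the discrete case is a design choice of this sketch, so that the position of a value in the request does not reveal which one is the original.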
- The specificity evaluation unit of the present invention searches the request log for the search value included in the original request. When it predicts, from the current rate of increase in the number of occurrences of that search value, that the number of occurrences will exceed the average number of occurrences of search values by a threshold value or more, it can cause the spreading request generation unit to start generating the dummy request.
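- One way to read the prediction step is sketched below. The text does not state how the prediction is made, so the one-step linear extrapolation, the function name `predicts_specific`, and the log layout (search value mapped to per-chunk occurrence counts) are assumptions of this sketch.

```python
from statistics import mean


def predicts_specific(log, value, threshold):
    """Return True if `value` is predicted to stand out in the next time chunk.

    log maps each search value to its occurrence counts per time chunk,
    oldest first (a hypothetical layout chosen for this sketch).
    """
    counts = log.get(value, [])
    if len(counts) < 2:
        return False  # not enough history to estimate a rate of increase
    rate = counts[-1] - counts[-2]        # increase over the latest chunk
    predicted = counts[-1] + rate         # linear extrapolation, one chunk ahead
    average = mean(c[-1] for c in log.values() if c)
    return predicted - average >= threshold
```

- With such a check, dummy generation can start before the search value actually stands out in the log, rather than after.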
- The original request of the present invention can include a plurality of search values for acquiring different information. The specificity evaluation unit determines the specificity of each of the plurality of search values, and for each search value the dummy request can be generated and the spreading request issued to the database in which the information is to be searched.
- The information processing apparatus of the present invention can receive a response from the database, generate a display area for displaying the response corresponding to each of the search values included in the original request, and display the response.
- The information processing apparatus of the present invention can be a mashup server implemented in the Web 2.0 paradigm.
- The present invention also provides an information processing method and a program executed by the information processing apparatus, and a web system including the information processing apparatus.
- FIG. 1 shows an embodiment of the web system 100 of the present invention. FIG. 2 shows the web system 200 of the second embodiment.
- FIG. 6 is a detailed flowchart of processing from acquisition of an original request to issuance of a request when searching for information associated with specific numerical data in the present embodiment.
- FIG. 11 is an embodiment of pseudo code for executing the processing shown in FIG. 10.
- A figure showing the diffusion state of the content of the diffusion request generated in the embodiment when searching for map data.
- A figure showing an embodiment of the access log 1500 that is produced.
- FIG. 1 illustrates an embodiment of a web system 100 of the present invention.
- Web system 100 includes clients 110-114, mashup server 130, and web servers 150-154.
- The clients 110 to 114 and the mashup server 130 are interconnected via a network 120 such as a LAN, a WAN, or the Internet.
- The mashup server 130 is connected to the web servers 150 to 154 via the network 140.
- The network 140 is not particularly limited, but a wide area network such as the Internet can be used.
- The mashup server 130 and the web servers 150 to 154 can employ almost the same hardware configuration and can be implemented, in single-core or multi-core form, with a CISC architecture microprocessor such as a PENTIUM (registered trademark) or PENTIUM (registered trademark) compatible chip, or with a RISC architecture microprocessor such as POWERPC (registered trademark).
- Each server is controlled by an operating system such as WINDOWS (registered trademark) 200X, UNIX (registered trademark), or LINUX (registered trademark), and processes the search requests sent from the clients 110 to 114 by executing server programs such as CGI, Servlet, APACHE, and IIS (Internet Information Server) implemented using programming languages such as C, C++, JAVA (registered trademark), JAVABEANS (registered trademark), PERL, and RUBY.
- The mashup server 130 can be implemented as a partial function of a gateway server or the like of a company.
- The mashup server 130 may also be installed at an ISP (Internet Service Provider) that provides a service based on a paradigm such as Web 2.0.
- The web servers 150 to 154 manage the databases 160 to 164, respectively, and can provide information corresponding to requests via the network 140.
- The web server 150 is implemented as a company information service providing server, the web server 152 as a stock price information service providing server, and the web server 154 as a map information service providing server. Each processes the individual requests from the mashup server 130 and sends the results to the mashup server 130.
- Each of the clients 110 to 114 acquires information using a plurality of application services.
- The client 110 acquires the information corresponding to the original request that it issued via the mashup server 130.
- The mashup server 130 stores the information from the plurality of web servers 150 to 154 in association with the client 110 and presents the information to the client 110 as composite information.
- The mashup server 130, for example, determines each application service based on the original request sent from the client 110, generates a spread request to be sent to the web servers 150 to 154 that provide those services, and sends the spread request to each of the web servers 150 to 154. It then obtains the result corresponding to the original request from the information acquired in response to the spread request, combines the results into composite information, for example as a web page, and sends it to the client 110.
- The term “diffusion request” as used in the present embodiment means a request that is generated for each search target attribute, according to the type of search target included in the original request issued by the client, and sent to the web servers 150 to 154.
- A spread request is generated, using dummy values, as a single request or a set of requests so that it is difficult for the web server to analyze the characteristics of the original request by statistical data mining of its access log.
- The clients 110 to 114 can be implemented using a personal computer, a workstation, or the like, and the microprocessor (MPU) may be any known single-core or multi-core processor.
- The clients 110 to 114 may be controlled by any known operating system such as WINDOWS (registered trademark), UNIX (registered trademark), LINUX (registered trademark), or MAC OS.
- To access the mashup server 130 and the web servers 150 to 154, the clients 110 to 114 can implement browser software such as Internet Explorer (registered trademark), Mozilla (registered trademark), Opera (registered trademark), or FireFox (registered trademark).
- Data is transmitted and received using a file transfer protocol such as HTTP or HTTPS over a transaction protocol such as TCP/IP.
- To access the databases of the web servers 150 to 154, the mashup server 130 implements JDBC (Java (registered trademark) Database Connectivity), ODBC (Open Database Connectivity), or the like, and can connect to the web servers 150 to 154 using an application-level protocol defined by JDBC.
- The request issued by the client 110 is first intercepted by the mashup server 130, which then performs statistical processing with reference to the past request log. If, as a result of the statistical processing, the mashup server 130 determines from the request history that the search value designating the information to be acquired reflects a specific search intention, it generates a spread request and issues it to the web servers 150 to 154 that manage the search target information. Each of the web servers 150 to 154 receives the spread request, searches the database 160 to 164 that it manages, extracts the information corresponding to the request, and returns it to the mashup server 130 as a response.
- From the responses received from the web servers 150 to 154, the mashup server 130 forms a web page having display areas for displaying the responses simultaneously on the desktop screen, assigns each response to a display area, and displays the page so that the client 110 that issued the request can browse it.
- FIG. 2 shows a web system 200 according to a second embodiment.
- In the web system 200 shown in FIG. 2, each of the plurality of clients 210 to 214 implements a mashup application as an extension of a web browser, for example, a plug-in program or an add-in program. The web system 200 therefore does not use a dedicated server such as the mashup server 130.
- The function of the mashup server 130 in FIG. 1 is implemented as a function of the clients 210 to 214, which generate the spreading request from the original request and transmit it to each of the web servers 230 to 234.
- The web servers 230 to 234 have the same configuration as in the embodiment shown in FIG. 1 and return the retrieved information to the client 210 or the like in response to the spread request from the client 210 or the like.
- A spread request is generated as a set in which the search values specifying the search targets of the dummy requests are combined with the OR operator. The dummy requests are selected by referring to the request log so that the request contents are averaged over the time scale for each search target and the client-side search intention cannot be extracted by data mining on the web server.
- The spread request may include the original request, or may not include the original request at all, depending on the attribute of the information to be searched.
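- Combining search values with the OR operator can be illustrated with a parameterized SQL query. The sketch below uses Python's standard `sqlite3` module with an in-memory database purely as a stand-in for a web server's database; the table, column, and company names are hypothetical, and the helper `build_or_query` is not taken from the source.

```python
import sqlite3


def build_or_query(table, column, values):
    """Return (sql, params) selecting rows whose column equals any of values.

    table and column are interpolated directly, so they must come from
    trusted code; only the search values are bound as parameters.
    """
    if not values:
        raise ValueError("at least one search value is required")
    condition = " OR ".join(f"{column} = ?" for _ in values)
    return f"SELECT * FROM {table} WHERE {condition}", list(values)


# Stand-in for a company information database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO companies VALUES (?, ?)",
    [("Acme", 10.0), ("Globex", 20.0), ("Initech", 30.0), ("Umbrella", 40.0)],
)

# One original value ("Acme") sent together with two dummy values.
sql, params = build_or_query("companies", "name", ["Globex", "Acme", "Umbrella"])
rows = conn.execute(sql, params).fetchall()
```

- From the server's point of view the three names arrive in a single query, so its access log records them together rather than the original value alone.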
- FIG. 3 shows functional blocks of the information processing system 300 that generates the spreading request of this embodiment.
- The information processing system 300 shown in FIG. 3 corresponds to the mashup server 130 in the embodiment of FIG. 1 and to the clients 210 to 214 in the embodiment shown in FIG. 2.
- Each functional block is implemented as a server application or a client application. Each functional block of the information processing system 300 is realized when the microprocessor reads a program that causes the information processing apparatus to function as each functional means into RAM, which provides the execution space, and executes the program.
- The information processing system 300 includes an information processing apparatus 310 and an input/output device 330 including a display device, a keyboard, a mouse, and the like.
- The information processing apparatus 310 sends a spread request to the network 140 or 220 via the network adapter 312 and obtains the response from the web server corresponding to the spread request.
- The information processing apparatus 310 further includes a request acquisition unit 314, a diffusion request generation unit 316, and a dummy generation information storage unit 322. The information processing apparatus 310 also includes a request log 328 that stores, in time series, the requests sent from the information processing apparatus 310 to the web servers.
- The request acquisition unit 314 acquires original requests from the clients 110 to 114 via the network 120.
- When the information processing apparatus 310 accesses the web servers 230 to 234 without using the mashup server 130, the original request including a search condition input by the operator is acquired via the input/output device 330.
- The diffusion request generation unit 316 refers to the request log storage unit 328 and determines how specific the original request acquired by the request acquisition unit 314 is relative to the past request log.
- The diffusion request generation unit 316 acquires, from the dummy generation information storage unit 322, the dummy values used to generate the diffusion request according to the determination result. It continues to generate dummy values and include them in the spread request until the specificity evaluation unit 324 determines that the search target is no longer specific.
- The peculiarity of the original request can be determined, for example, using a threshold that is set for the number of appearances of the search target within a specific time scale among the requests whose issuance is managed by the mashup server 130 or the clients 210 to 214. It can also be determined by more advanced statistical processing, depending on the processing capability of the information processing apparatus 310.
- The spread request generated by the spread request generation unit 316 is created by different processing depending on the attribute of the data to be processed by the web servers 150, 152, and 154.
- The spread request is generated to make it difficult to statistically analyze the time-series threshold behavior of a specific target in the access log managed by each of the web servers 150 to 154.
- The target information to be searched is not particularly limited, but in this embodiment it is classified into information having a continuous attribute and information having a discrete attribute.
- Information having the continuous attribute described above is information for which the values characterizing the information to be searched, for example, position coordinates, longitude, latitude, altitude, time, or period, can be obtained from homogeneous information other than the data to be searched by a preset operation such as extrapolation, interpolation, or movement. More specifically, examples of information having continuous attributes include position coordinates and latitude/longitude data.
- Information having the discrete attribute described above is information whose data may fluctuate independently of other homogeneous information, so that the data to be searched must be accessed directly in order to acquire it.
- Examples of information having discrete attributes include company stock price information, business performance information, M&A (Mergers and Acquisitions) information, and other information related to company activities and group activities.
- The dummy generation information storage unit 322 can be implemented as a database or a table and can register, for example, company names, addresses, latitude/longitude information, and the like in association with the attributes of the information requested by the request.
- In this way, information that can be used as dummy values for each category can be registered in order to reduce the specificity of the original request for each attribute of the information.
- The peculiarity evaluation unit 324 receives the original request and analyzes the request log. When the received original request deviates from the average value of the access information in the request log, it refers to the dummy generation information and causes the diffusion request generation unit 316 to generate a diffusion request including dummy requests with dummy values until the original request is determined, from the request log, to be non-specific.
- The information processing apparatus 310 also includes a search request issuing unit 318 and a search result extracting unit 320.
- The search request issuing unit 318 sets the original request and the dummy requests generated with the dummy values in an SQL query and issues them to the web server via the network 140 or 220.
- The spread request generation unit 316 decides whether to pass the value specified in the original request to the search request issuing unit 318 according to whether the information to be acquired has a continuous or a discrete attribute.
- When the information to be acquired has a continuous attribute, the value specified in the original request is not set in the search request.
- In this case, the dummy request sent to the web server 154 specifies not the target information itself but information from which the target information can be reached by a further request from the client.
- When the information to be acquired has a discrete attribute, the diffusion request generation unit 316 generates dummy requests for information that is homogeneous with, but different from, the information searched for with the value in the original request, and passes the dummy requests to the search request issuing unit 318 together with the original request to generate the diffusion request. The search result extraction unit 320 therefore receives the response to the original request together with the responses to the dummy requests.
- The search result extraction unit 320 filters the search result sent from the web server as necessary and displays it on the display device of the input/output device 330 via the input/output interface/browser 326.
- When the operator of the information processing apparatus 310 acquires a map or the like as a search result, the operator adjusts the display area or scale with a mouse or the like and additionally issues relative movement requests or the like, so that the search results are updated sequentially until the information targeted by the original request can be accessed.
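- For the case in which the response contains both the answer to the original request and the answers to dummy requests, the filtering step can be sketched as below. The row format (dictionaries with a key field) and the function name `extract_original_results` are assumptions made for illustration.

```python
def extract_original_results(rows, key, original_values):
    """Keep only the rows whose `key` field matches a value from the original request.

    rows is a list of dictionaries, one per record in the web server's
    response; rows that answer dummy requests are dropped.
    """
    wanted = set(original_values)
    return [row for row in rows if row[key] in wanted]


# Hypothetical response to a spread request containing one original value.
response = [
    {"name": "Globex", "price": 20.0},
    {"name": "Acme", "price": 10.0},
    {"name": "Umbrella", "price": 40.0},
]
shown = extract_original_results(response, "name", ["Acme"])
```

- The filtering happens on the requesting side, so the web server never learns which of the returned rows were actually used.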
- FIG. 4 is a flowchart of the information processing method of this embodiment.
- The process of FIG. 4 starts at step S400, and an original request is acquired in step S401.
- The original request is acquired by the information processing apparatus 310 from the network 120 or from the input/output device 330, depending on the embodiment.
- In the embodiment described here, the original request is generated to acquire composite information, with {C_i, S_i, G_i} as the search targets for acquiring company information, stock price information, and map information.
- the information processing apparatus 310 separates the search request included in the original request, calls the specificity evaluation unit 324 in step S402, and first, for each search request included in the original request, is diffused in relation to the time course. Determine whether or not The process executed by the specificity evaluation unit 324 will be described later in more detail.
- step S403 When it is determined that the request to be sent to the web server is not spread in terms of contents and time based on the determination using the request log (no), in step S403, the diffusion request is referred to by referring to the dummy generation information. Then, the process returns to step S402 again to determine whether or not the content of the request is spread.
- step S402 If it is determined in step S402 that the content of the request is diffused by comparison with the request log, that is, it is not specific (yes), the request is transmitted in step S404.
- step S405 it is determined whether or not a response from the web server has been received. If the response has not been received (no), the process is repeated until the response has been received. On the other hand, when the reception of the response from the web server is completed in step S405 (yes), in step S406, the information processing apparatus 310 merges the responses corresponding to the original request and displays them in the browser. Note that the process of step S406 can include a process of filtering data to be browsed according to the attribute of the received data. When the browsing on the client display device is completed, the process ends in step S407 and waits for the subsequent input of the original request.
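- The control flow of steps S401 to S406 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `process_original_request`, the injected callables `is_diffused`, `add_dummies`, and `send`, and the `max_rounds` safety limit are all assumptions introduced here.

```python
from typing import Callable, Dict, List


def process_original_request(
    original: List[str],
    is_diffused: Callable[[List[str]], bool],
    add_dummies: Callable[[List[str]], List[str]],
    send: Callable[[List[str]], Dict[str, str]],
    max_rounds: int = 10,
) -> Dict[str, str]:
    """Widen the request until it is judged diffused, send it, and keep the wanted responses."""
    request = list(original)              # S401: the acquired original request
    for _ in range(max_rounds):
        if is_diffused(request):          # S402: check against the request log
            break
        request = add_dummies(request)    # S403: add dummy search values (hypothetical helper)
    else:
        raise RuntimeError("request was not judged diffused within max_rounds checks")
    responses = send(request)             # S404/S405: issue the request and wait for responses
    # S406: merge the responses that correspond to the original search values
    return {value: responses[value] for value in original if value in responses}
```

- Passing the three operations in as callables keeps the loop independent of how specificity is judged and how dummy values are chosen, which the description treats as separate units (324 and 316).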
- FIG. 5 shows a request log 500 for a particular search target included in the original request for exemplary purposes.
- The time chunk can be set as appropriate, for example in minutes, hours, days, weeks, or months, for the purpose of diluting the specificity of the original request.
- The request log 500 can be generated and stored for each unit of request issuance; the issuer unit can be a client, a business unit, or a company.
- A search target specified by numerical data, such as map information, can be judged identical by matching within a numerical range of latitude/longitude set around a specific latitude/longitude. When searching for map information, the latitude/longitude range can be changed depending on whether the area is urban or non-urban. The identity of a search target may also be determined by whether a common landmark exists within a specific range of the designated latitude/longitude.
- For each search target, the information processing apparatus 310 registers, as N, the number of times the search target was issued in an original request in each time chunk, where time chunks are given at an appropriate processing interval from the start of recording of the request log 500. In the time chunk in which the request log 500 is currently being accumulated, the number of occurrences of original requests including the search target of interest is counted in units of original requests.
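- A request log that counts search targets per time chunk could be kept as in the following sketch. The class name `RequestLog`, its methods, and the use of timestamps in seconds are assumptions made for illustration; the description does not specify a data structure.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, List


class RequestLog:
    """Counts, per time chunk, how many original requests contained each search target."""

    def __init__(self, chunk_seconds: float) -> None:
        self.chunk_seconds = chunk_seconds
        self.chunks: DefaultDict[int, Counter] = defaultdict(Counter)

    def record(self, timestamp: float, search_targets: Iterable[str]) -> None:
        chunk = int(timestamp // self.chunk_seconds)
        # Counting is per original request, so a repeated target counts once.
        self.chunks[chunk].update(set(search_targets))

    def counts(self, target: str) -> List[int]:
        """Counts N for the target in every chunk from the first recorded one to the latest."""
        if not self.chunks:
            return []
        first, last = min(self.chunks), max(self.chunks)
        return [self.chunks.get(c, Counter())[target] for c in range(first, last + 1)]
```

- For example, with one-hour chunks, two requests containing "C" in the first hour and none afterwards give `counts("C")` a leading 2 followed by zeros for the later chunks.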
- When the currently accumulating time chunk is completed, the specificity evaluation unit 324 checks the rate of increase per request of the search target of interest and determines whether the search target is specific in the request log 500 with respect to the time chunk TC_p currently being judged. As shown in FIG. 5, in the time chunk immediately after the start of request log recording, any original request issued in the time chunk is determined to be specific, and a diffusion request is generated.
- As the history of search requests accumulates over the time recorded in the request log 500, it becomes necessary to determine whether the search target currently being judged is specific when the past history is included.
- Using the average value N_av of the number of requests per time chunk TC_i for the specific search target over the time chunks up to the current time chunk TC_p, and SN_p as the number of requests predicted for the current time chunk TC_p, the search target can be judged specific when, for example, the specificity index SN_p given by equation (1) exceeds the average value N_av by more than the probability error.
- In equation (1), the probability error is taken with respect to the number of requests for the search target over the time chunks used for N_av, and it is multiplied by a positive real coefficient.
- N_av is given by formula (2) and is updated sequentially each time the time chunk currently being recorded is completed.
- The value p identifying the time chunk increases as the request log is recorded, but the first time chunk, which starts the specificity evaluation process, is itself a singular point.
- For the first time chunk, processing therefore starts on the assumption that every search value is specific.
- When a request for a search target is issued for the first time in the current time chunk TC_p, it likewise always becomes specific; no special processing is performed for this case, and the determination follows equation (1).
- SP can be defined using a probability density function and variance given in a binomial distribution.
- Alternatively, assuming that the search targets in the request log 500 follow a multidimensional normal distribution, the correlated specificity of search targets may be determined using the multidimensional normal distribution and a variance-covariance matrix.
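- The threshold test described for equation (1) can be illustrated as below. Because the equation itself is not reproduced in this text, the sketch assumes the form "predicted count minus average exceeds a coefficient times the spread of past counts"; the function name `is_specific`, the parameter `alpha`, and the use of the population standard deviation as the probability error are all assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_specific(past_counts: Sequence[int], predicted_count: float, alpha: float = 1.0) -> bool:
    """Return True when the predicted count for the current chunk stands out from the history."""
    if not past_counts:
        # No completed time chunk yet: the first chunk is treated as always specific.
        return True
    n_av = mean(past_counts)       # average over the completed time chunks
    sigma = pstdev(past_counts)    # assumed stand-in for the probability error
    return predicted_count - n_av > alpha * sigma
```

- Under this assumed form, a search target that never appeared before (all past counts zero) is judged specific as soon as one request is predicted, which matches the statement that no special handling is needed for first-time requests.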
- In the evaluation by the specificity evaluation unit 324 of the present embodiment, the last time chunk in FIG. 5 is the time chunk in which the request log is currently being recorded, and the search values designating a specific search target have accumulated up to SN_current at the current point in time. From this rate of increase, the number of occurrences is predicted to grow to SN_predict by the end of the time chunk, and based on that prediction the search target is determined to be specific using equation (1).
- The prediction-based determination of this embodiment is described in more detail later.
- FIG. 6 shows an embodiment of a process that determines the specificity of a search target in the time chunk in which the request log is currently accumulating, from the rate of increase of original requests including that search target within the time chunk. In FIG. 6, the vertical axis indicates the cumulative number, in time chunk TC_m (m = 0, 2, 3, ..., N), of the search target included in original requests, and the horizontal axis indicates the passage of time of the request log 600. The request log 600 is inspected individually for the search targets C_i, S_i, N_i, and O_i. The cumulative number in each time chunk is indicated by a bar, and a black hatched bar is a time chunk that has already been recorded.
- Time chunks that are determined to be specific on the request log 600 for a specific search target are indicated by black triangles on the bar.
- A search target marked with a black triangle had not been requested in the past and was first detected in the time chunk indicated by the black triangle.
- The time chunk indicated by the white bar is the time chunk that is currently accumulating.
- The web server 150 or the like can analyze its access log and thereby determine the search intention of the issuer of the original request.
- The specificity evaluation unit 324 intercepts the original request and determines the content of the individual search requests. This allows the information processing apparatus 310 to determine, from the original requests it acquires, the rate of increase of a search target within a specific time chunk. That is, the specificity evaluation unit 324 accumulates the search values specifying a specific search target in the time chunk, calculates the rate of increase with respect to the total number of original requests, extrapolates to the end of the time chunk by an appropriate method such as linear, polynomial, or exponential extrapolation, and integrates the extrapolated result over the currently accumulating time chunk to predict the number of occurrences.
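- The simplest of the extrapolation options, linear extrapolation, might look like the following sketch. It assumes a constant arrival rate measured against elapsed time within the chunk, which is a simplification of the rate "with respect to the total number of original requests"; the function name and parameters are hypothetical.

```python
def predict_chunk_total(count_so_far: int, elapsed: float, chunk_length: float) -> float:
    """Linearly extrapolate the occurrences observed so far to the end of the time chunk."""
    if not 0 < elapsed <= chunk_length:
        raise ValueError("elapsed must be greater than 0 and at most chunk_length")
    # Constant-rate assumption: the total scales with the fraction of the chunk still to come.
    return count_so_far * chunk_length / elapsed
```

- For instance, six occurrences in the first 15 minutes of a 60-minute chunk extrapolate to 24 occurrences, and that predicted value would then be compared against the history of completed chunks.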
- The search target O_i, which is accumulating in the last time chunk, is marked with a white triangle to indicate that it will be determined to be specific when the time chunk is completed.
- The diffusion request generation unit 316 is then instructed to generate the diffusion request, and the process of diluting the specificity level is started.
- The diffusion request generation unit 316 of the present embodiment individually determines the specificity of each search target included in the original request, and generates a diffusion request when it determines that a search target is specific.
- The diffusion request is generated by modifying the original request so that the information to be searched is not specific as viewed from the request log 500.
- In the exemplary embodiment, the original request can be modified as follows.
- When a request is issued with numerical data such as map information, one or more dummy requests are generated that include dummy values, that is, numerical data randomly modified beyond the numerical range within which search targets are judged to be the same.
- A plurality of dummy values are selected and set in the diffusion request so that the frequency spectrum becomes uniform when a Fourier transform is taken with respect to latitude and longitude.
- For attribute information whose corresponding value must be acquired directly, such as company information or stock price information, company names, stock names, and stock codes classified into the same industry as, or a different industry from, the search target company or stock determined to be specific are extracted at random from the dummy generation information storage unit 322, and an appropriate number of dummy values is acquired so that the number of requests generated within the category becomes white noise.
- The dummy request can be configured as the same set of search values as the original request; however, because the information processing apparatus 310 accesses each database individually, the dummy request can also be generated as a search request including a single search value. The generated dummy requests are ordered randomly together with the search value that is the true purpose of the search, and are sent to the corresponding databases 160 to 164, 240 to 244, and the like.
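- For discrete values such as company names, drawing dummy values and mixing them with the true search value could be sketched as follows. The function name `build_diffusion_request` and the idea of passing the contents of the dummy generation information storage unit as a plain `dummy_pool` sequence are assumptions; how many dummies are needed to approach white noise is left to the caller.

```python
import random
from typing import List, Optional, Sequence


def build_diffusion_request(
    true_value: str,
    dummy_pool: Sequence[str],
    dummy_count: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return the true search value mixed with randomly drawn dummy values in random order."""
    rng = rng or random.Random()
    # Dummy candidates are every pooled value except the true one.
    candidates = sorted(set(dummy_pool) - {true_value})
    if dummy_count > len(candidates):
        raise ValueError("not enough dummy values in the pool")
    values = rng.sample(candidates, dummy_count) + [true_value]
    rng.shuffle(values)  # the true value gets no fixed position in the request set
    return values
```

- Each returned value would then be sent as its own single-value search request, so the receiving server sees the true value as one entry among several.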
- FIG. 7 shows an embodiment of an access log 700 recorded, for example, by the web server 150 after issuing a spread request according to the present embodiment to the request log 500 shown in FIGS.
- The information processing apparatus 310 counts, for each time chunk, the number of requests sent to the company information providing server 150 for each specific search target and, while accumulating these counts, determines the statistical specificity of the search targets included in the original request.
- The information processing apparatus 310 issues dummy requests to dilute the specificity of the search target, so that the diluted accesses are what is recorded in the access log kept for the specific issuer on the web server 150 side.
- FIG. 7 shows that the access specificity is diluted, that is, white noise is generated so that the specific search value does not show a prominent tendency as shown in FIG.
- Even when a search target that should be a dummy value for the true request TR is included in an original request, the information processing apparatus 310 generates dummy requests so that the access log accumulated by the web server 150 for the specific information processing apparatus 310 approaches white noise.
- Here, a true request is a search value included in the original request, that is, a request reflecting the searcher's specific intention.
- When the information processing apparatus 310 determines that the true request TR_1 is specific in the time chunk TC_1, it extracts from the dummy generation information storage unit 322 a search target that dilutes the specificity and sets it in a dummy request DR (Dummy Request).
- FIG. 8 is a detailed flowchart of processing from acquisition of an original request to issuance of a request when searching for information associated with specific numerical data in this embodiment.
- The process of FIG. 8 can be applied to map data characterized by a numerical data set such as position coordinates, or to any information specified by numerical data, as long as the information is continuous.
- The input value can be entered as, for example, longitude and latitude values, or as a company name, a place name, or the like.
- When a name is entered, the process can be executed by replacing it with the latitude/longitude data stored in the dummy generation information storage unit 322.
- The process of FIG. 8 details steps S402 and S403 of FIG. 4 and starts at step S800 after the original request is acquired in step S401.
- cx and cy that satisfy the above are generated using the function rnd().
- (x, y) is the numerical data specified by the original request.
- w and h are values specifying the range of the numerical data.
- cx and cy are numerical data generated from random numbers and correspond to longitude and latitude values.
- gx and gy are the coordinates of the centroid (average coordinates) of the coordinates including the past history.
- In step S801, the time scale index ti is initialized to zero.
- The time scale index ti selects a time scale, defined in minutes, hours, days, months, and so on, over which the original request is required not to be specific. Specifically, t0 refers to the request log issued during a time scale of one minute, t1 to one hour, and t2 to one day. As t3 and t4, request logs in units of months or quarters can also be referenced.
- In step S802, it is determined whether ti is smaller than the number of elements in the array ta. If ti is smaller than the number of elements of ta (yes), the past history for the time ta[ti] is determined in step S803.
- The coordinates of the new centroid (gx, gy) are calculated as the average of the coordinate values of the history coordinate group together with (cx, cy).
- In step S804, the distance L between (x, y) and (gx, gy) is calculated.
- The distance L used in the present embodiment may be a Euclidean distance, a Manhattan distance, or an appropriate topological distance defined between feature values defined by feature coordinate axes.
- In step S805, when the distance L is determined to be equal to or less than a threshold value at which the original request is regarded as non-specific (yes), the time scale index ti is incremented by 1 in step S806, and the process returns to step S802 to determine the specificity on another time scale. If the distance L exceeds the threshold value in step S805 (no), the process returns to step S800 to generate additional cx′ and cy′, and the calculation is repeated until the distance L is less than or equal to the threshold value.
- When the calculation for all set time scale indexes is completed in step S802 (no), no period of the set time scales is specific, so control passes to step S404, and {(cx, cy)} is set as numerical data and issued as a diffusion request. In this embodiment, because the information is numerically continuous, the value (x, y) of the original request is not included in the diffusion request.
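- Steps S800 to S806 can be sketched for a single dummy point as follows. The condition that cx and cy must satisfy is not reproduced in this text, so the sketch assumes they are drawn uniformly from the w-by-h window centred on (x, y); the function names, the `histories` argument (one list of past coordinates per time scale), the Euclidean distance, and the `max_tries` limit are likewise assumptions.

```python
import math
import random
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def centroid_distance(target: Point, history: Sequence[Point], candidate: Point) -> float:
    """Distance L between the target and the centroid of history plus candidate (S803, S804)."""
    points = [*history, candidate]
    gx = sum(p[0] for p in points) / len(points)
    gy = sum(p[1] for p in points) / len(points)
    return math.hypot(target[0] - gx, target[1] - gy)


def generate_dummy_point(
    target: Point,
    size: Tuple[float, float],
    histories: Sequence[Sequence[Point]],
    threshold: float,
    rng: Optional[random.Random] = None,
    max_tries: int = 10000,
) -> Point:
    """Draw (cx, cy) until the centroid test passes on every time scale."""
    rng = rng or random.Random()
    (x, y), (w, h) = target, size
    for _ in range(max_tries):
        # S800: candidate drawn from the window centred on (x, y) (assumed range)
        candidate = (x + (rng.random() - 0.5) * w, y + (rng.random() - 0.5) * h)
        # S802 to S806: the distance L must not exceed the threshold for any time scale
        if all(centroid_distance(target, hist, candidate) <= threshold for hist in histories):
            return candidate
    raise RuntimeError("no acceptable dummy point found")
```

- Only the accepted (cx, cy) values would be placed in the diffusion request; the original (x, y) never leaves the apparatus in this continuous case.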
- The information processing apparatus 310 can thus acquire the information sought by the original request without sending the numerical data specified in the original request to the web server.
- The process of FIG. 8 can be effectively applied when the information can be specified by numerical data and has topologically continuous attributes.
- Applications of the process of FIG. 8 include map search and the like.
- When the extracted (cx, cy) set contains data that rounds to the position coordinates of a specific landmark other than (x, y), that landmark can be placed within the area given by (w, h) and the retrieved information can be displayed.
- FIG. 9 shows a pseudo code of the processes of S803 to S805 among the processes described in FIG.
- the pseudo code block 900 corresponds to the processes of steps S804 and S805, and the pseudo code block 910 corresponds to the process of step S803 of FIG.
- variable list.
- Here, the number of elements () is the number of request log elements included in the time scale specified by ti.
- FIG. 10 is a flowchart of the second embodiment of the information search method of this embodiment, and corresponds to steps S402 to S405 of FIG.
- The embodiment of FIG. 10 can be suitably applied when the search target information has discrete attributes.
- The processing in FIG. 10 starts from step S1000 after the original request is obtained in step S401.
- The symbols used in FIG. 10 are defined as shown in Table 1 below.
- In step S1000, the index value of the attribute matching kn is obtained from the array k[] and set to the variable ti.
- d
- step S1003 it is determined whether or not
- As the threshold value, the probability error of the request log described for equation (1) can be used. When a different criterion is used to determine the specificity, a correspondingly appropriate threshold value can be set.
- In step S1005, the value corresponding to the index value dc in the access count array ac[] is updated, the index value dc is set to the value of the access candidate index array c[ci], and the counter ci of the number of access candidates is incremented by 1. The process then returns to step S1002, and the above processing is repeated until a positive result is returned in the determination of step S1003.
- step S1003 if it is determined in step S1003 that
- Access to the target attribute name is described as being performed outside the process of FIG. 10; however, by omitting step S1010 and storing all the access execution results, the process of FIG. 10 can itself complete the processing for all access candidates.
- As described for step S803 of FIG. 8, by using a time scale index when calculating the average value, the access history can be diffused over the chosen time scales.
- FIG. 11 shows an embodiment of pseudo code for executing the processing shown in FIG.
- Block 1100 corresponds to step S1003 of FIG. 10, and block 1110 corresponds to the processing of block 1020 of FIG. 10.
- The access execution for the attribute name to be searched is described as an embodiment executed after block 1110 ends.
- the data mining tolerance can be further improved.
- FIG. 12 shows the diffusion state of the content of the diffusion request generated in the embodiment when searching for map data.
- The vertical and horizontal axes in FIG. 12 correspond to the vertical and horizontal widths w and h of the display area, respectively.
- FIGS. 12 (a) to 12 (c) show changes in the diffusion state when the initial conditions for generating random numbers are different.
- The diffusion request includes position coordinates sufficiently separated from the target coordinates (0, 0) given as the original request, and the points of the dummy requests constituting the diffusion request show a sufficiently random distribution, indicating that the data mining tolerance of the request can be improved.
- FIG. 13 shows an embodiment of a search screen 1300 displayed by the web system of this embodiment.
- FIG. 13 shows a search in which the information search of the present embodiment is applied to information whose request contents are set as numerical data and which has continuity.
- The landmark 1312 is the search target.
- The operator of the information processing apparatus 310 inputs the position coordinates of the search target 1312, the company name, or the like. When these are input, the information processing apparatus 310 uses the processing shown in FIG. 8 to determine, from analysis of the request log history, whether or not access to the search target 1312 is specific.
- the information processing apparatus 310 determines that access to the search target 1312 is specific (
- The user who has acquired the map image 1310 scrolls the map data using a mouse or the like, moves the map data toward a landmark 1316 such as a park, and brings the search target 1312 near the center to display the map data 1320.
- Because map data is registered continuously on the two-dimensional plane, scrolling the map data does not involve a value that specifies the search target, as a request does; it is expressed as movement relative to the default map data. The data specifying the search target 1312 is therefore not transmitted to the web server and does not affect the data mining tolerance.
- FIG. 14 shows an embodiment when the search target has discrete attributes in this embodiment.
- Information acquired from a plurality of web servers by the mashup server 130 is mashed up and provided as one desktop screen 1400.
- The operator performs the search with the intention of acquiring stock price information of a specific company “FGH” in Los Angeles.
- The operator inputs a search character string such as FGH, stock price, map, and topics, but the mashup server 130 does not send the original request input by the operator as it is.
- The mashup server 130 refers to the company information and the request log stored in the dummy generation information storage unit 322, acquires as access candidates other companies for which information such as stock prices is to be obtained in addition to the search target company “FGH”, and issues a search request to the web server as a diffusion request.
- All the search results acquired by the above-described processing are acquired without being filtered, and are displayed in the display frame 1410 as a search result list in the order corresponding to the randomly sent requests.
- A search result is also obtained for the search target company “FGH”.
- At the same time, the relative weight of the search target company name “FGH” in the access log is reduced.
- In this way, a diffusion request obtained by statistically processing the request log is sent to the web server, which improves robustness against data mining that analyzes the access log on the web server that accepted the request.
- The display frame 1420, the display frame 1430, and the display frame 1440 display the map position, stock price fluctuation, and topics corresponding to the search result currently selected by the operator in the search result list of the display frame 1410.
- The display content in each of the display frames 1420, 1430, and 1440 changes in coordination with the selection of another search result, so that independent information from a plurality of web servers can be presented efficiently.
- FIG. 15 shows access logs 1500 generated as a result of issuing diffusion requests, in order to explain how the information processing method of this embodiment improves data mining tolerance and prevents gradual information leakage from the access log.
- The access log 1510 is an example on a particular web server when the diffusion request of this embodiment is not used, and the access log 1520 is an example on a particular web server when the diffusion request of this embodiment is applied. The vertical axis represents the total number of accesses for each search character string in the access logs within a specific period.
- Company names A, B, C, and D are company names that are added to the diffusion request. It is assumed that the search target company name is Company C.
- In the access log 1510, the search target company name input by the operator is transmitted as it is to the web server, so requests including the search target company name, Company C, are recorded prominently in the access log. On the web server side, it is therefore possible to trace the transition of the search targets of a specific enterprise or individual by performing data mining on the access log in time series. For example, when the number of accesses to Company C increases after a specific date and time, the site that operates the web server learns that the searcher became interested in Company C after that date and time. As a result, important information such as a TOB (TakeOver Bid) or a merger may gradually leak.
- The access log 1520 is the access log generated on the web server side when the diffusion request of this embodiment is used.
- The request log is statistically processed to generate a diffusion request, and a request set is issued to the web server.
- that defines the specific specificity
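- The difference between the access logs 1510 and 1520 can be made concrete with a small calculation. The log contents below are invented for illustration only (they are not data from FIG. 15), and the function name `access_share` is hypothetical.

```python
from collections import Counter
from typing import Sequence


def access_share(access_log: Sequence[str], target: str) -> float:
    """Fraction of logged accesses that name the target search string."""
    if not access_log:
        return 0.0
    return Counter(access_log)[target] / len(access_log)


# Hypothetical log without diffusion: eight searches for company C, two unrelated searches.
plain_log = ["C"] * 8 + ["A", "B"]
# Hypothetical log with diffusion: every search for C is accompanied by dummies A, B, and D.
diffused_log = ["A", "B", "C", "D"] * 8 + ["A", "B"]
```

- In this made-up example, Company C accounts for 8 of 10 accesses in the plain log but only 8 of 34 in the diffused log, so it no longer stands out to someone mining the server-side log.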
- FIG. 15 exemplifies the company name.
- The method is equally applicable to various information such as a specific character string in a geographic or region name, product name, age, sex, group name, SNS, and the like, and can prevent gradual leakage of information about search intentions related to market research, future trends, corporate activity, and network activity, respectively.
- The above-described functions of the present invention can be realized by a device-executable program written in an object-oriented programming language such as C++, Java (registered trademark), JavaBeans (registered trademark), Java Applet (registered trademark), JavaScript (registered trademark), Perl, or Ruby, or in a search-only language such as SQL, and the program can be stored on a device-readable recording medium for distribution or distributed by transmission.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Networks & Wireless Communication (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Information Transfer Between Computers (AREA)
Abstract
Description
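An information processing apparatus is provided that includes:
a request acquisition unit that acquires an original request including a search value specifying information to be acquired from a database;
a specificity evaluation unit that determines whether the information to be acquired by a request to be issued now is specific, relative to requests previously issued by the information processing apparatus, with respect to a request log in which a history of the search values is registered;
a diffusion request generation unit that, when the specificity evaluation unit determines that the search value is specific, generates a diffusion request including a dummy request generated from a dummy value that gives a search value requesting information different from the information to be acquired, so as to dilute the specificity of the access log related to the search value for the database issued by the information processing apparatus;
a search request issuing unit that issues the diffusion request as a search request addressed to the database via the network; and
a search result extraction unit that extracts the information acquired by the diffusion request from the response to the search request.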
Claims (20)
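- An information processing apparatus that acquires information via a network, the information processing apparatus comprising:
a request acquisition unit that acquires an original request including a search value specifying information to be acquired from a database;
a specificity evaluation unit that determines whether the information to be acquired by a request to be issued now is specific, relative to requests previously issued by the information processing apparatus, with respect to a request log in which a history of the search values is registered;
a diffusion request generation unit that, when the specificity evaluation unit determines that the search value is specific, generates a diffusion request including a dummy request generated from a dummy value that gives a search value requesting information different from the information to be acquired, so as to dilute the specificity of an access log related to the search value for the database issued by the information processing apparatus;
a search request issuing unit that issues the diffusion request as a search request addressed to the database via the network; and
a search result extraction unit that extracts the information acquired by the diffusion request from a response to the search request.
- The information processing apparatus according to claim 1, wherein the dummy value is stored in a dummy generation information storage unit and imparts data mining tolerance by lowering the specificity of the search value in the request log.
- The information processing apparatus according to claim 2, wherein the search request issuing unit issues, as the search request, the diffusion request including only the dummy request when the information to be acquired has a continuous attribute.
- The information processing apparatus according to claim 2, wherein the search request issuing unit issues, as the search request, the diffusion request including the original request and the dummy request when the information to be acquired has a discrete attribute.
- The information processing apparatus according to claim 4, wherein the specificity evaluation unit searches the request log for the search value included in the original request, predicts from the rate of increase of the search value currently being judged that the number of occurrences of the search value will exceed the corresponding average number of occurrences by a threshold value or more, and causes the diffusion request unit to start generating the dummy request.
- The information processing apparatus according to claim 5, wherein the original request includes a plurality of the search values for acquiring different information, and the specificity evaluation unit determines the specificity for each of the plurality of search values, generates the dummy request for each search value, and issues the diffusion request to each database to be searched for information.
- The information processing apparatus according to claim 6, wherein the information processing apparatus receives responses from the databases, generates display areas for displaying the responses respectively corresponding to the search values included in the original request, and displays the responses.
- The information processing apparatus according to claim 7, wherein the information processing apparatus is a mashup server implemented in the Web 2.0 paradigm.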
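- An information processing method for acquiring information via a network, in which an information processing apparatus executes the steps of:
acquiring an original request including a search value specifying information to be acquired from a database;
determining whether the information to be acquired by a request to be issued now is specific, relative to requests previously issued by the information processing apparatus, with respect to a request log in which a history of the search values is registered;
generating, when the search value is determined to be specific in the determining step, a diffusion request including a dummy request generated from a dummy value that gives a search value requesting information different from the information to be acquired, so as to dilute the specificity of an access log related to the search value for the database issued by the information processing apparatus;
issuing the diffusion request as a search request addressed to the database via the network; and
extracting the information acquired by the diffusion request from a response to the search request.
- The information processing method according to claim 9, wherein the step of generating the diffusion request includes acquiring, from a dummy generation information storage unit, a dummy value for lowering the specificity of the search value in the request log, and setting the dummy value in a dummy request.
- The information processing method according to claim 10, wherein the step of issuing the diffusion request includes issuing, as the search request, the diffusion request including only the dummy request when the information to be acquired has a continuous attribute.
- The information processing method according to claim 11, wherein the step of issuing the diffusion request includes issuing, as the search request, the diffusion request including the original request and the dummy request when the information to be acquired has a discrete attribute.
- The information processing method according to claim 12, wherein the determining step includes searching the request log for the search value included in the original request, predicting from the rate of increase of the search value currently being judged that the number of occurrences of the search value will exceed the corresponding average number of occurrences by a threshold value or more, and starting generation of the dummy request.
- The information processing method according to claim 13, wherein the original request includes a plurality of the search values for acquiring different information, the determining step includes determining the specificity for each of the plurality of search values,
the step of generating the diffusion request includes generating the dummy request for each search value, and
the step of issuing the diffusion request includes issuing the diffusion request to each database to be searched for information.
- The information processing method according to claim 14, wherein the information processing apparatus is a mashup server implemented in the Web 2.0 paradigm.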
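- A device-executable program for causing an information processing apparatus to execute an information processing method for acquiring information via a network, the program causing the information processing apparatus to function as:
a request acquisition unit that acquires an original request including a search value specifying information to be acquired from a database;
a specificity evaluation unit that determines whether the information to be acquired by a request to be issued now is specific, relative to requests previously issued by the information processing apparatus, with respect to a request log in which a history of the search values is registered;
a diffusion request generation unit that, when the specificity evaluation unit determines that the search value is specific, generates a diffusion request including a dummy request generated from a dummy value that gives a search value requesting information different from the information to be acquired, so as to dilute the specificity of an access log related to the search value for the database issued by the information processing apparatus;
a search request issuing unit that issues the diffusion request as a search request addressed to the database via the network; and
a search result extraction unit that extracts the information acquired by the diffusion request from a response to the search request.
- The program according to claim 16, wherein the dummy value is stored in a dummy generation information storage unit and imparts data mining tolerance by lowering the specificity of the search value in the request log.
- The program according to claim 17, wherein the specificity evaluation unit searches the request log for the search value included in the original request, predicts from the rate of increase of the search value currently being judged that the number of occurrences of the search value will exceed the corresponding average number of occurrences by a threshold value or more, and causes the diffusion request unit to start generating the dummy request.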
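- A web system that transfers information via a network, the web system comprising:
an information processing apparatus that acquires an original request including a search value specifying information to be acquired and issues, to at least one web server connected to the network, a search request for retrieving the information to be acquired; and
a web server that receives the search request including a plurality of search values from the information processing apparatus, searches a database, and returns the information specified by the search request to the information processing apparatus as a response,
wherein the information processing apparatus includes:
a request acquisition unit that acquires an original request including a search value specifying information to be acquired from the database;
a specificity evaluation unit that determines whether the information to be acquired by a request to be issued now is specific, relative to requests previously issued by the information processing apparatus, with respect to a request log in which a history of the search values is registered;
a diffusion request generation unit that, when the specificity evaluation unit determines that the search value is specific, generates a diffusion request including a dummy request generated from a dummy value that gives a search value requesting information different from the information to be acquired, so as to dilute the specificity of an access log related to the search value for the database issued by the information processing apparatus; and
a search request issuing unit that issues the diffusion request as a search request addressed to the database via the network.
- The web system according to claim 19, wherein the diffusion request generation unit acquires, from a dummy generation information storage unit, the dummy value that is unrelated to the information to be acquired and adds it to generate the diffusion request; the diffusion request includes only the dummy request when the information to be acquired is continuous, and includes the original request and the dummy request when the information to be acquired is discrete; the specificity evaluation unit searches the request log for the search value included in the original request, predicts from the rate of increase of the search value currently being judged that the number of occurrences of the search value will exceed the corresponding average number of occurrences by a threshold value or more, and causes the diffusion request unit to start generating the dummy request; the web system is configured as a Web 2.0 paradigm; and the information processing apparatus is a mashup server.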
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/387,477 US8725762B2 (en) | 2009-07-28 | 2010-07-07 | Preventing leakage of information over a network |
JP2011524721A JP5705114B2 (ja) | 2009-07-28 | 2010-07-07 | Information processing apparatus, information processing method, program, and web system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2009175664 | 2009-07-28 | ||
JP2009-175664 | 2009-07-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011013490A1 true WO2011013490A1 (ja) | 2011-02-03 |
Family
ID=43529153
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2010/061535 WO2011013490A1 (ja) | 2009-07-28 | 2010-07-07 | Information processing apparatus, information processing method, program, and web system |
Country Status (3)
Country | Link |
---|---|
US (1) | US8725762B2 (ja) |
JP (1) | JP5705114B2 (ja) |
WO (1) | WO2011013490A1 (ja) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
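JP2014106723A (ja) * | 2012-11-27 | 2014-06-09 | Kddi Corp | Search information obfuscation device, search information obfuscation method, and program |
WO2014141659A1 (ja) * | 2013-03-15 | 2014-09-18 | NEC Corporation | Information receiving device, information receiving system, and information receiving method |
WO2018034192A1 (ja) * | 2016-08-19 | 2018-02-22 | NEC Corporation | Information processing device, information processing method, and recording medium |
JP2020527772A (ja) * | 2017-12-12 | 2020-09-10 | Google LLC | Oblivious access with differential privacy |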
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014003794A1 (en) * | 2012-06-29 | 2014-01-03 | Hewlett-Packard Development Company, L.P. | Obscuring internet tendencies |
US20140143882A1 (en) * | 2012-11-21 | 2014-05-22 | Alcatel-Lucent Usa Inc. | Systems and methods for preserving privacy for web applications |
US9444797B2 (en) | 2014-07-10 | 2016-09-13 | Empire Technology Development Llc | Protection of private data |
EP3163789B1 (en) * | 2015-10-29 | 2021-08-18 | Airbus Defence and Space GmbH | Forward-secure crash-resilient logging device |
US20220272110A1 (en) | 2019-03-04 | 2022-08-25 | Airgap Networks Inc. | Systems and methods of creating network singularities and detecting unauthorized communications |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
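JP2002132813A (ja) * | 2000-10-18 | 2002-05-10 | Sharp Corp | Information provision control device, information provision method, recording medium storing an information provision program, and information provision system |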
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3581009B2 (ja) | 1998-03-12 | 2004-10-27 | Hitachi, Ltd. | Data retrieval system and data retrieval method |
JP2002312377A (ja) | 2001-04-18 | 2002-10-25 | Nec Corp | Search device, search server, search system, search method, and program therefor |
JP3871301B2 (ja) * | 2001-05-15 | 2007-01-24 | International Business Machines Corporation | Database search device and program |
US7457946B2 (en) * | 2002-10-17 | 2008-11-25 | International Business Machines Corporation | Method and program product for privately communicating web requests |
JP4007596B2 (ja) * | 2003-02-25 | 2007-11-14 | International Business Machines Corporation | Server and program |
US20050177630A1 (en) * | 2003-12-19 | 2005-08-11 | Jolfaei Masoud A. | Service analysis |
JP2005222135A (ja) * | 2004-02-03 | 2005-08-18 | Internatl Business Mach Corp <Ibm> | Database access monitoring device, information leak source identification system, database access monitoring method, information leak source identification method, and program |
US20090112805A1 (en) * | 2007-10-31 | 2009-04-30 | Zachary Adam Garbow | Method, system, and computer program product for implementing search query privacy |
US8239396B2 (en) * | 2009-03-20 | 2012-08-07 | Oracle International Corporation | View mechanism for data security, privacy and utilization |
2010
- 2010-07-07 WO PCT/JP2010/061535 (WO2011013490A1): Application Filing (active)
- 2010-07-07 JP JP2011524721A (JP5705114B2): Expired - Fee Related (not active)
- 2010-07-07 US US13/387,477 (US8725762B2): Expired - Fee Related (not active)
Non-Patent Citations (1)
Title |
---|
HIDETOSHI KIDO: "Ichi Joho Service no Tameno Kaku Joho o Mochiita Ichi Privacy Hogo Shuho to Sono Cost Hyoka", DEWS2005 RONBUNSHU, 2 May 2005 (2005-05-02), Retrieved from the Internet <URL:http://www.ieice.org/iss/de/DEWS/DEWS2005/procs/papers/3A-i5.pdf> * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2014106723A (ja) * | 2012-11-27 | 2014-06-09 | Kddi Corp | Search information obfuscation device, search information obfuscation method, and program |
WO2014141659A1 (ja) * | 2013-03-15 | 2014-09-18 | NEC Corporation | Information receiving device, information receiving system, and information receiving method |
JPWO2014141659A1 (ja) * | 2013-03-15 | 2017-02-16 | NEC Corporation | Information receiving device, information receiving system, and information receiving method |
US9817996B2 (en) | 2013-03-15 | 2017-11-14 | Nec Corporation | Information receiving device, information receiving method, and medium |
WO2018034192A1 (ja) * | 2016-08-19 | 2018-02-22 | NEC Corporation | Information processing device, information processing method, and recording medium |
JPWO2018034192A1 (ja) * | 2016-08-19 | 2019-06-13 | NEC Corporation | Information processing device, information processing method, and program |
JP2020527772A (ja) * | 2017-12-12 | 2020-09-10 | Google LLC | Oblivious access with differential privacy |
JP2021182402A (ja) * | 2017-12-12 | 2021-11-25 | Google LLC | Oblivious access with differential privacy |
JP7124182B2 (ja) | 2017-12-12 | 2022-08-23 | Google LLC | Oblivious access with differential privacy |
US11727124B2 (en) | 2017-12-12 | 2023-08-15 | Google Llc | Oblivious access with differential privacy |
Also Published As
Publication number | Publication date |
---|---|
US20120284299A1 (en) | 2012-11-08 |
JP5705114B2 (ja) | 2015-04-22 |
US8725762B2 (en) | 2014-05-13 |
JPWO2011013490A1 (ja) | 2013-01-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5705114B2 (ja) | Information processing apparatus, information processing method, program, and web system | |
Das et al. | Creating meaningful data from web logs for improving the impressiveness of a website by using path analysis method | |
US6718365B1 (en) | Method, system, and program for ordering search results using an importance weighting | |
KR101374651B1 (ko) | Search engine that applies feedback from users to improve search results | |
US8126874B2 (en) | Systems and methods for generating statistics from search engine query logs | |
KR100672277B1 (ko) | Personalized search method and search server | |
US8868595B2 (en) | Enhanced control to users to populate a cache in a database system | |
CA2790421C (en) | Indexing and searching employing virtual documents | |
US20110093461A1 (en) | Extensible Custom Variables for Tracking User Traffic | |
US20120278354A1 (en) | User analysis through user log feature extraction | |
KR20110009198A (ko) | Search results with most-clicked next objects | |
WO2013086113A2 (en) | System for forensic analysis of search terms | |
CN102222098A (zh) | Web page prefetching method and system | |
Jagan et al. | A survey on web personalization of web usage mining | |
US20130227112A1 (en) | Smart cache learning mechanism in enterprise portal navigation | |
US9400843B2 (en) | Adjusting stored query relevance data based on query term similarity | |
Bhushan et al. | Recommendation of optimized web pages to users using Web Log mining techniques | |
Sathiyamoorthi et al. | Data Pre-Processing Techniques for Pre-Fetching and Caching of Web Data through Proxy Server | |
US10235459B1 (en) | Creating entries in at least one of a personal cache and a personal index | |
JP2017167829A (ja) | Detection device, detection method, and detection program | |
US20150156169A1 (en) | Method for determining validity of command and system thereof | |
CN112016017A (zh) | Method and device for determining feature data | |
Kim et al. | RILCA: Collecting and analyzing user-behavior information in instant search using relational DBMS | |
Agrawal et al. | A Survey Report On Current Research and Development of Data Processing In Web Usage Data Mining | |
Raut et al. | Research on Web Log Mining to Predicting User Behavior through Session |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 10804237; Country of ref document: EP; Kind code of ref document: A1 |
 | WWE | Wipo information: entry into national phase | Ref document number: 2011524721; Country of ref document: JP |
 | WWE | Wipo information: entry into national phase | Ref document number: 13387477; Country of ref document: US |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 10804237; Country of ref document: EP; Kind code of ref document: A1 |