US20050086192A1 - Method and apparatus for improving the integration between a search engine and one or more file servers - Google Patents

Method and apparatus for improving the integration between a search engine and one or more file servers

Info

Publication number
US20050086192A1
Authority
US
United States
Prior art keywords
file
files
index
server
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/688,287
Other languages
English (en)
Inventor
Shoji Kodama
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Priority to US10/688,287 (US20050086192A1)
Assigned to HITACHI, LTD. Assignment of assignors interest (see document for details). Assignors: KODAMA, SHOJI
Priority to JP2004241794A (JP4559158B2)
Publication of US20050086192A1
Priority to US12/554,290 (US9229940B2)
Legal status: Abandoned

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers

Definitions

  • the present invention is related to computer file access and in particular to improving the performance of index maintenance in search engines.
  • the Internet is commonly associated with the world wide web (the “web”).
  • the web has facilitated an explosive proliferation of information to the millions of users who access the web. This information is made accessible in the form of files by web servers.
  • the Internet has also provided access to files provided by file servers which pre-date the web, such as bulletin boards, ftp sites, and so on.
  • An intranet, which is a private network of a company or other organization, is also used for sharing files.
  • a file server or a NAS (Network Attached Storage)
  • NFS and CIFS protocols are used for accessing files.
  • Search engines have become a valuable tool in navigating the Internet and/or file servers, and are commonly used to access the many millions of files stored there. Typically, the search engine accepts search requests from a user and obtains a list of file names that match the search conditions.
  • An integral component of a search engine is its “index.”
  • the index is a collection of information that is parsed or otherwise generated from an analysis of a file, and comprises keywords and related information used by the search engine to facilitate a file search.
  • the specific information content and data structures of the index vary from one search engine to another, and are beyond the scope of the present invention.
  • creating the index typically involves the search engine checking the update date of every file, reading every updated file on the Internet and/or file servers, and parsing its contents to build up the index.
  • file contents change over time.
  • the search engine must therefore perform updates to the index in order that the index be current.
  • This task typically involves once again crawling the web and/or file servers to access attributes of each file, and then determine whether the file has been updated since the last time the index was updated; or when the index was created, in the case of the very first index update. This determination can be made, for example, by accessing the modification date of the file and comparing it against the index. Making this check reduces the update effort and thus improves the update time; not every file will be re-indexed, only those that have changed relative to the time of the index.
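  • As an illustration only, the conventional update check described above can be sketched in Python for a locally mounted directory tree. The function name `files_needing_reindex` and the use of epoch seconds for the index time are assumptions made for this sketch; the source does not prescribe an implementation.

```python
import os


def files_needing_reindex(root, last_index_time):
    """Return paths under `root` modified after `last_index_time` (epoch seconds)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # The attributes of every file are read, even for files that
            # turn out to be unchanged; only the parsing step is skipped.
            if os.path.getmtime(path) > last_index_time:
                changed.append(path)
    return sorted(changed)
```

  • Note that the walk still touches every file on the server; the check only avoids re-parsing unchanged files.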
  • an update list is maintained in a file server.
  • Update information based on the update list is communicated to a search engine.
  • the update information comprises only those files that have been modified during a previous update operation on an index in the search engine.
  • the file server presents a restricted directory listing to a search engine, as compared to a directory listing of the same directory to a client other than a search engine.
  • a set of one or more filtering criteria can be used to limit the number of files presented to the search engine. This reduces the number of files the search engine must examine when performing an update of its search index.
  • an update list is maintained in the file server. Files referenced in the update list are limited depending on one or more filtering criteria.
  • FIG. 1 is a high level generalized block diagram of an illustrative embodiment of the present invention
  • FIG. 2 is a generalized flow diagram highlighting the processing for creating an index
  • FIG. 3 highlights the processing of file service requests in a file server
  • FIG. 4 is a high level flow diagram showing steps in the file server for processing update lists
  • FIG. 5 is a flow diagram highlighting steps in the file server for processing a write request
  • FIG. 6 is a flow diagram highlighting steps in the file server for processing a write request according to another embodiment of the present invention.
  • FIG. 7 is a generalized flow diagram highlighting steps in the file server for processing a directory listing request
  • FIG. 8 illustrates an example embodiment of an updated list
  • FIG. 9 illustrates an example embodiment of a file filtering table
  • FIG. 10 illustrates multiple exports.
  • FIG. 1 shows a high level block diagram outlining the basic architecture of an example embodiment of a search engine environment in accordance with the present invention.
  • the figure shows at least one file server 0104 having one or more files which can be accessed by users on a network 0103 .
  • a file server controller 010403 provides the processing capability conventionally associated with a file server. This may include a central processing unit (CPU), memory, and storage for program code to control the operation of the CPU.
  • the files stored on a file server are organized into a system of files 010401 .
  • the file server can access an update list 010402 .
  • the update list can be contained in physical storage in a suitable location.
  • the file server element 0104 shown in the figure represents a plurality of file servers, each storing its own set of files.
  • a typical protocol that file servers use is the network file system protocol (NFS).
  • Another is the common internet file system (CIFS).
  • Still other protocols such as HTTP can be used by a file server.
  • the architecture typically includes at least one NFS/CIFS client 0101 that communicates with the file server(s) 0104 over the network 0103 via the NFS or CIFS protocol in order to read and write files in the file server.
  • Clients include creators of the files, and users who can access the files to read them, modify them, or both.
  • the client element 0101 of FIG. 1 represents a plurality of users, each capable of accessing one or more of the file servers.
  • a search engine server 0105 communicates via the network 0103 .
  • a file server controller 010502 provides the processing capability conventionally associated with a search engine. This may include a central processing unit (CPU), memory, and storage for program code to control the operation of the CPU.
  • index information can be represented generically as an index database 010501 , without loss of generality.
  • the index is created and subsequently updated and otherwise maintained by the search engine.
  • This activity includes parsing or otherwise generating information from files in the file server(s) 0104 in order to create the index database. It can be appreciated that the search engine can use the same NFS or CIFS protocol to access files in the file server(s).
  • the architecture shows at least one file search client 0102 . These are the users who access the search engine to submit file search requests. It can be appreciated that a “user” can be a human user or a machine user. An interface is understood to be provided by the search engine that is suitable to the kind of user being serviced. In a generalized sense, the file search client element 0102 shown in FIG. 1 represents a plurality of search clients.
  • the network 0103 is generally any suitable communication network that allows for communication among the various servers and clients mentioned above.
  • the figure shows a local area network (LAN), but it can be appreciated that other communication networks are equally suitable.
  • Connectivity to a LAN is typically provided by the Ethernet standard, using the TCP/IP protocol.
  • the file server 0104 and the search engine server 0105 each can be embodied in conventional computer hardware (e.g., comprising a suitable CPU, memory, storage devices, and so on).
  • Conventional software platforms can be used to support the server; e.g., Unix or other UNIX-based OS's, Macintosh OS, various Microsoft OS's, and so on.
  • the file server and the search engine server can run on the same hardware and software platform.
  • For example, an NFS server and search engine software can both run on a Linux OS.
  • processing in the search engine includes creating an index database.
  • the “index” is used by the search engine when processing a search request.
  • the index is consulted to identify those files, if any, which satisfy a search client's request.
  • the term “index” is a very generalized reference to the specific data that a particular search engine may use. It is understood that the specific data structures and storage formats which comprise an “index” are likely to vary from one search engine to another. However, a search engine's index is likely to contain information about a file and its content (e.g., keywords).
  • the index may be one large database or some other single organization of data representing all file servers. However, logically, one can refer to each file server as having its own associated index; it being understood that reference is being made with respect to that portion of index structure associated with a file server.
  • an index is created for all of the files that can be accessed from a file server; this is done for every file server that is made known to the search engine. Also, if a search engine which is already online learns of a new file server, an index needs to be created for the accessible files contained in that file server. This is represented in FIG. 2 at decision step 0201 , where a determination is made whether the index is to be created for a particular file server.
  • the search engine sends an initialization operation (see FIG. 4 ) to the file server in a step 0202 .
  • This causes the file server to clear its associated update list 010402 .
  • the step 0201 is for creating the index for the first time.
  • a table can be provided to track the file servers for which the search engine has made an index, and the time at which each index was made. See FIG. 2A as an example.
  • one file server can have multiple export points.
  • the search engine accesses update information contained in an update list 010402 associated with that file server (step 0203 ). Then in a step 0204 , files referenced in the update information are accessed by the search engine (see FIG. 4 ). For each file, the search engine will parse through (or otherwise analyze) the contents of the file to produce index information that is suitable for the index.
  • the search engine can access each file one at a time and perform the parsing. Alternatively, the search engine can access groups of files at a time and perform the parsing operation on the group.
  • the update list can be accessed by the search engine, just like any other file.
  • the file server creates a special file that contains a list of updated files and the search engine retrieves a copy of the file from the file server and stores it as a local copy.
  • the search engine also deletes the contents of the special file.
  • the search engine can then operate on the local copy; e.g., reading through the file to identify the files to parse.
  • a protocol can be defined between the search engine and the file server to obtain the information contained in the update list.
  • the file server can communicate to the search engine the name of each file in the update list, one at a time, or a list of every file name in the update list to be processed in the search engine.
  • the search engine can receive the actual files in the update list instead of a list of file names from the file servers.
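  • A minimal sketch of the search engine side of this update, assuming the update list arrives as a list of path names. The callables `get_update_list` and `read_file` and the toy `parse` function are hypothetical stand-ins for whatever protocol and parser a given search engine uses.

```python
def parse(text):
    """Toy parser: the set of lowercase words found in the file content."""
    return set(text.lower().split())


def update_index(index, get_update_list, read_file):
    """Re-parse only the files named in the update list.

    `index` maps a path to its parsed keywords. `get_update_list` returns the
    paths of modified files, and `read_file` returns the text of one file.
    Returns the list of paths that were re-indexed.
    """
    updated = list(get_update_list())
    for path in updated:
        index[path] = parse(read_file(path))
    return updated
```

  • Files that are absent from the update list are never opened, which is the point of the approach.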
  • a file server receives many requests for file operations.
  • Typical operations include, for example, file creation, file open, file read, file write, directory listings, and so on.
  • the specific file operations provided vary depending on the file system and the protocols for communicating with the file server; e.g., NFS, CIFS, etc.
  • the file server receives a file operation request from a client.
  • the request is handed off to an appropriate handler.
  • a file open request is handled by a file open handler 0303 .
  • a file read request is handled by a file read handler 0304 .
  • a file write request is handled by a file write handler 0305 in accordance with an embodiment of the present invention. This aspect of the invention will be discussed below.
  • a directory listing request is handled by a directory listing handler 0306 in accordance with another aspect of the present invention. The directory listing request will be discussed further below.
  • a “get update list” request is handled by the handler 0307 . This function is provided in accordance with an embodiment of the present invention and is discussed below.
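  • The hand-off of requests to handlers described above can be pictured as a dispatch table. This is a sketch under assumptions: the request format (a dict with an `op` key) and the operation names are invented for illustration, and the handlers are placeholders rather than real file operations.

```python
def make_dispatcher(handlers):
    """Return a function that routes a request dict to the handler for its operation."""
    def dispatch(request):
        operation = request["op"]
        if operation not in handlers:
            raise ValueError(f"unsupported file operation: {operation}")
        return handlers[operation](request)
    return dispatch


# Placeholder handlers; a real file server would do the actual work here.
handlers = {
    "open": lambda req: f"open {req['path']}",              # handler 0303
    "read": lambda req: f"read {req['path']}",              # handler 0304
    "write": lambda req: f"write {req['path']}",            # handler 0305
    "list_directory": lambda req: f"list {req['path']}",    # handler 0306
    "get_update_list": lambda req: "get update list",       # handler 0307
}
dispatch = make_dispatcher(handlers)
```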
  • a file write operation changes (modifies) the content of the specified file.
  • the file server makes a determination in a step 0501 whether this is the first write operation on the file since it was opened. If it is the first write operation since the file was opened, then in a step 0502 a reference to the file is placed in the update list 010402 associated with the file server. If it is a write operation subsequent to the first write operation after the file was opened, then processing proceeds to the next step. Typically, the next step is to effect the requested write operation (step 0503 ), the details of which depend on the specific file server.
  • The purpose of checking for the first write operation in step 0501 is to avoid having multiple entries in the update list 010402 for the same file.
  • One way to achieve this is as disclosed in step 0502 .
  • the update list can be inspected each time to determine whether the file is already in the list or not.
  • a created file initially contains no data. Therefore, it is not necessary that the file server make an entry in the update list to refer to a newly created file. When content is placed in the file, this will occur via a file write operation. However, in some file systems, the file create operation may leave the file in a state where subsequent write operations can be performed; thus obviating the need for a separate file open function call. Therefore with reference to the decision step 0501 in FIG. 5 , it can be appreciated that the test can be modified to include testing for the first write operation following a file open operation or a file create operation.
  • the information contained in the update list identifies the file that is the object of the write operation.
  • a complete path name of the file should suffice.
  • Other naming conventions might be more suitable.
  • the specific information will depend on the specifics of the file server, or the file system, and the like.
  • In FIG. 8 , a typical example implementation of the update list 010402 is shown.
  • the implementation shown comprises a list of file names.
  • Each file that is referenced in the update list has been modified.
  • Each entry 080101 comprises a file name, including a full path name.
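  • The write handling of FIG. 5 and the list of full path names of FIG. 8 can be sketched together as follows. The class `FileServer`, its in-memory storage, and the per-path tracking of the first write are assumptions of this sketch. It also combines the first-write test with an inspection of the list, so that reopening a file does not produce a second entry.

```python
class FileServer:
    """Hypothetical in-memory file server that records modified files."""

    def __init__(self):
        self.contents = {}                 # full path name -> bytes
        self.update_list = []              # update list 010402: full path names
        self._written_since_open = set()   # paths already written since last open

    def open(self, path):
        self.contents.setdefault(path, b"")
        # The next write to this path counts as the first write since open.
        self._written_since_open.discard(path)

    def write(self, path, data):
        if path not in self._written_since_open:      # step 0501: first write?
            self._written_since_open.add(path)
            if path not in self.update_list:          # avoid duplicate entries
                self.update_list.append(path)         # step 0502
        # step 0503: perform the requested write (an append, in this sketch)
        self.contents[path] = self.contents.get(path, b"") + data
```

  • Checking only on the first write keeps the common case cheap: later writes to an open file skip the list inspection entirely.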
  • the “get update list” request comprises two kinds of operations.
  • the file server determines in a decision step 0402 whether the request is for an initialization operation or for a retrieval operation of the update list. If the request is an initialization operation, then in a step 0403 , the file server simply clears the update list, if one previously existed. If an update list did not already exist, then the file server will create an update list. This aspect of the invention is discussed further below.
  • the particular implementation shown in FIG. 4 uses a special protocol between a file server and a search engine to communicate an updated file list.
  • the search engine can use standard NFS/CIFS protocols to get an updated file list from a file server.
  • the updated file list is stored on the file server as a file. So the search engine reads the file via standard NFS/CIFS protocols and knows which files have been updated by reading the file. The content in the special file must be cleared after the read by the search engine.
  • the file server will communicate the update list to the search engine (step 0404 ).
  • a copy of the file can be communicated to the search engine, just like any other file.
  • the file server can communicate the actual files to the search engine; either one at a time, or in groups, or in some other suitable manner.
  • the search engine will analyze the file and update the index with information produced by the analysis.
  • the update list is cleared, in a step 0405 .
  • the update list can be cleared after the communication is complete.
  • the file server communicates files to the search engine instead, then each file that is referenced in the update list can be deleted from the update list after it is communicated to the search engine.
  • the list is once again filled with references to files that are modified.
  • the files referenced in the update list therefore represent those files that have been modified subsequent to a point in time when the update list was last cleared.
  • the update list contains a list of file references that have been modified since the last time the update list was retrieved by the search engine.
  • files referenced in the update list represent those files that have been modified subsequent to a point in time when the index was being updated. It can be appreciated that updating the index can be a time consuming operation. Thus, in practice, the clearing of the update list by the file server (by virtue of a get_file_list request) may very well occur before the completion of updating the index by the search engine.
  • the update list therefore avoids the search engine having to perform the brute force task of accessing and parsing every file on a given file server in order to update the index.
  • An index can be created for a file system that does not have one. This situation may arise because the search engine was not previously aware of the file system, or for some reason it was decided to delete a previously existing index for the file system.
  • When the search engine has completed the process of creating the index, it will send a get_file_list request for an initialization operation. This has the effect of creating the update list or of clearing an existing update list. If the file system was not previously known, then the file system is not likely to have an update list. In that case, an update list is created. If the file system already had an update list, then the initialization operation will serve to clear the list.
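  • On the file server side, the two kinds of "get update list" operation (initialization and retrieval) might be sketched like this. The class name `UpdateList`, the operation strings, and the in-memory list are assumptions made for illustration; the source leaves the wire protocol open.

```python
class UpdateList:
    """Hypothetical update list kept by a file server."""

    def __init__(self):
        self.entries = []

    def record(self, path):
        """Note that `path` was modified, without creating duplicate entries."""
        if path not in self.entries:
            self.entries.append(path)

    def handle_get_update_list(self, operation):
        if operation == "initialize":
            # step 0403: create the list, or clear an existing one
            self.entries = []
            return []
        if operation == "retrieve":
            # step 0404: communicate the list; step 0405: clear it afterwards
            sent, self.entries = self.entries, []
            return sent
        raise ValueError(f"unknown operation: {operation}")
```

  • Because retrieval clears the list, each retrieval returns only the files modified since the previous retrieval or initialization.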
  • each file server has its own associated update list.
  • an update list can be implemented that is accessible by two or more file servers that contains references to modified files from the two or more file servers.
  • a global update list can be provided.
  • this type of update list may or may not be preferable, depending on performance considerations, implementation considerations, and so on.
  • one file server maintains multiple update lists.
  • One update list is associated with one export point of the file server.
  • a file server can be configured to provide different exports of a file system to different clients.
  • a client “mounts” an export of the file system. Mounting is a process involving a series of communications between NFS/CIFS clients and the file server in order to make the export accessible by the NFS/CIFS clients.
  • An export is a name of a file system to be shared or a name of a directory to be shared by NFS/CIFS clients.
  • the file system 0104 provides a first export 1001 that can be mounted by clients other than a search engine.
  • a second export 1002 is provided by the file server to be mounted by the search engine. Both exports are on the same file system or directory 010401 .
  • the file server knows which export the search engine has mounted; for example, a mapping relationship can be described in a special file in the file server.
  • the search engine performs conventional processing to either create an index on the files on the file server, or to update the index.
  • the search engine mounts the export that has been made available by the file server.
  • An administrator of a file server creates an export for a search engine.
  • An administrator of the search engine specifies a list of exports for which the search engine needs to make an index. This can be done, for example, by editing a special file in the search engine. By using a directory service, this configuration can be done systematically.
  • the search engine then makes one or more requests for directory listing(s) of files on the file server; for example, using the standard requests provided in the NFS and CIFS protocols.
  • each file identified in the directory listing(s) is parsed and indexed.
  • the search engine determines whether the file should be parsed for indexing based on the modification date (or some other similar information) of the file. If the file was modified since the last time the index for this file system was updated, then the file is parsed and indexed; otherwise it is not parsed.
  • the list of files made available via a directory listing by the file server to the search engine contains fewer files than are available in a directory listing to other clients. This is made possible because the search engine mounts an export that is different than the export that is mounted by clients other than the search engine.
  • the file server is configured to perform differently depending on the export on which a file service request is made; e.g., a directory listing service request.
  • a file server configured according to this aspect of the invention includes a file filtering table 0901 .
  • the table contains conditions (criteria) 090101 that describe what kinds of files will be made available to an export that is mounted by the search engine. For example, users of the search engine may want to restrict files to be searched based on file type. Types of files can be determined by a file extension such as .ppt, .doc, .xls, and so on. In this case, files having certain file extensions may be determined to be candidates for searching. Another criterion for determining which files can be searched might be based on file ownership, file creation time, file size, and so on.
  • the file filter table embodiment shown in FIG. 9 is an inclusive table. This means that the file filter table specifies those files which should be included in the directory listing. For example, all “.doc” files will be included in the directory listing for a given directory. However, “.exe” files will not be included; i.e., they are excluded from the list. It can be appreciated that the file filter table can instead be an “exclusionary” table. In that case, the table specifies those files which will be excluded from the directory list. Thus, for example, an exclusionary table might contain the criterion of “.exe”, meaning that all files in a directory will be included in the directory list except for files of type “.exe”. Still another variation of the file filter table is to be able to specify both files to be included and files to be excluded.
  • files that are indexed are those that contain text.
  • Some search engines will also index files that have graphics or some kind of image data, if there is corresponding text in the file.
  • the file filter table can reduce the set of files that the search engine must consider by filtering out executable files or other files which do not contain data that can be searched.
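  • The inclusive, exclusionary, and combined forms of the file filter table can be illustrated with a small predicate on file extensions. The function name `passes_filter` and the representation of the table as sets of extensions are assumptions of this sketch; criteria such as ownership, creation time, or size are omitted.

```python
import os


def passes_filter(filename, include=None, exclude=None):
    """Decide whether a file should be shown to the search engine.

    `include` and `exclude` are optional sets of lowercase extensions such
    as {".doc", ".ppt"}. An exclusion takes precedence over an inclusion.
    """
    ext = os.path.splitext(filename)[1].lower()
    if exclude is not None and ext in exclude:
        return False
    if include is not None:
        return ext in include
    return True
```

  • Passing only `include` gives the inclusive table of FIG. 9, passing only `exclude` gives the exclusionary variant, and passing both gives the combined variation.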
  • FIG. 7 illustrates an example of the processing for a directory request that is made on an export that a search engine has mounted.
  • the file server determines whether the directory listing request was issued by the search engine (step 0701 ).
  • the directory listing request includes information as to which export the request was issued on. Since the file server knows which export the search engine has mounted, the file server can make this determination. If the request did not come from a search engine, then in a step 0707 , a conventional directory listing is produced and communicated to the requesting client.
  • the file server consults the file filtering table 0901 to determine (step 0703 ) for each file in that directory whether it will be contained in the directory listing information. If the file meets the criterion or criteria set forth in the file filtering table, then a reference to the file is added to a temporary list (step 0704 ).
  • the file server can determine whether the request came from a search engine or from a client by looking at which export the request has been issued on, by looking at an IP address of the requester, or by some other suitable identification technique. Also, the file server can maintain a suitable list that identifies one or more computer systems (e.g., search engines) for which the file filtering table will be used to satisfy a directory request.
  • the directory listing that the search engine receives is filtered by the file filtering table, and thus can contain a subset of the files that a non-search engine client might receive.
  • processing in the search engine to create an index for the file system or to update its index can be reduced, as compared to conventional processing where an unfiltered directory listing might include many more files.
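  • The directory listing processing of FIG. 7 can be sketched as a function that branches on the export a request arrived on. The export names, the parameter names, and the extension-only inclusive filter are hypothetical; identifying the requester by IP address would work the same way.

```python
def list_directory(filenames, export, search_engine_exports, allowed_extensions):
    """Return the directory listing to send for a request made on `export`."""
    if export not in search_engine_exports:      # step 0701: not a search engine
        return list(filenames)                   # step 0707: conventional listing
    listing = []                                 # temporary list
    for name in filenames:
        # step 0703: consult the (inclusive) file filtering table
        if name.lower().endswith(tuple(allowed_extensions)):
            listing.append(name)                 # step 0704
    return listing
```

  • For instance, with `search_engine_exports={"/export/search"}` and `allowed_extensions=(".doc", ".ppt")`, a request on `/export/users` sees every file, while a request on `/export/search` sees only the documents and slides.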
  • still another aspect of the present invention is directed to the processing in the file server of write requests.
  • a determination is made in a step 0601 whether the write request is the first write request since the specified file was last opened. If the write request is not a first write request, then the write request is processed in a conventional manner (step 0604 ), according to the specifics of the file server.
  • If the write request is the first write request since the last file open operation, then processing proceeds to a decision step 0602 . There, a file filtering table 0901 is consulted. This table is used in the same manner as discussed above. If the file that is the object of the write operation satisfies any of the criteria in the file filtering table, then a reference to the file is added to an update list 010402 , in a step 0603 . If no criteria are satisfied, then the write operation is completed in a conventional manner in step 0604 .
  • a created file initially contains no data. Therefore, it is not necessary that the file server make an entry in the update list to refer to a newly created file. When content is placed in the file, this will occur via a file write operation. However, in some file systems, the file create operation may leave the file in a state where subsequent write operations can be performed; thus obviating the need for a separate file open function call. Therefore with reference to the decision step 0601 in FIG. 6 , it can be appreciated that the test can be modified to include testing for the first write operation following a file open operation or a file create operation.
  • this aspect of the invention is similar to the aspect of the invention discussed in connection with update lists.
  • the search engine will consult the update list associated with the file system when it is ready to perform an update of its index for that file system, as discussed above.
  • the search engine need only access and parse those files referenced in the update list when performing an index update.
  • the size of the update list can be reduced somewhat. This has the desired effect of potentially reducing the index update time.
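  • The filtered write handling of FIG. 6 can be sketched as follows, assuming an inclusive filter on file extensions. The function `handle_write` and its parameters are illustrative names; the write itself is left out, since its details depend on the file server.

```python
def handle_write(path, is_first_write_since_open, update_list, allowed_extensions):
    """Record `path` in `update_list` if it is a first write that passes the filter.

    Returns True if a new entry was added to the update list.
    """
    added = False
    if is_first_write_since_open:                                  # step 0601
        if path.lower().endswith(tuple(allowed_extensions)):       # step 0602
            if path not in update_list:                            # no duplicates
                update_list.append(path)                           # step 0603
                added = True
    # step 0604: the write itself would be performed here in a conventional manner
    return added
```

  • Files that fail the filter never enter the update list, so the search engine never has to fetch or parse them.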

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
US10/688,287 2003-10-16 2003-10-16 Method and apparatus for improving the integration between a search engine and one or more file servers Abandoned US20050086192A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US10/688,287 US20050086192A1 (en) 2003-10-16 2003-10-16 Method and apparatus for improving the integration between a search engine and one or more file servers
JP2004241794A JP4559158B2 (ja) 2003-10-16 2004-08-23 データにアクセスするための方法及びシステム
US12/554,290 US9229940B2 (en) 2003-10-16 2009-09-04 Method and apparatus for improving the integration between a search engine and one or more file servers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/688,287 US20050086192A1 (en) 2003-10-16 2003-10-16 Method and apparatus for improving the integration between a search engine and one or more file servers

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US12/554,290 Division US9229940B2 (en) 2003-10-16 2009-09-04 Method and apparatus for improving the integration between a search engine and one or more file servers

Publications (1)

Publication Number Publication Date
US20050086192A1 true US20050086192A1 (en) 2005-04-21

Family

ID=34521135

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/688,287 Abandoned US20050086192A1 (en) 2003-10-16 2003-10-16 Method and apparatus for improving the integration between a search engine and one or more file servers
US12/554,290 Active 2025-05-05 US9229940B2 (en) 2003-10-16 2009-09-04 Method and apparatus for improving the integration between a search engine and one or more file servers

Country Status (2)

Country Link
US (2) US20050086192A1 (ja)
JP (1) JP4559158B2 (ja)

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050086583A1 (en) * 2000-01-28 2005-04-21 Microsoft Corporation Proxy server using a statistical model
US20050203907A1 (en) * 2004-03-12 2005-09-15 Vijay Deshmukh Pre-summarization and analysis of results generated by an agent
US20050210006A1 (en) * 2004-03-18 2005-09-22 Microsoft Corporation Field weighting in text searching
US20060059171A1 (en) * 2004-08-25 2006-03-16 Dhrubajyoti Borthakur System and method for chunk-based indexing of file system content
US20060069982A1 (en) * 2004-09-30 2006-03-30 Microsoft Corporation Click distance determination
US20060074903A1 (en) * 2004-09-30 2006-04-06 Microsoft Corporation System and method for ranking search results using click distance
US20060074871A1 (en) * 2004-09-30 2006-04-06 Microsoft Corporation System and method for incorporating anchor text into ranking search results
US20060136411A1 (en) * 2004-12-21 2006-06-22 Microsoft Corporation Ranking search results using feature extraction
US20060200460A1 (en) * 2005-03-03 2006-09-07 Microsoft Corporation System and method for ranking search results using file types
US20060294100A1 (en) * 2005-03-03 2006-12-28 Microsoft Corporation Ranking search results using language types
US20070038622A1 (en) * 2005-08-15 2007-02-15 Microsoft Corporation Method ranking search results using biased click distance
US20070130205A1 (en) * 2005-12-05 2007-06-07 Microsoft Corporation Metadata driven user interface
US20070174354A1 (en) * 2006-01-25 2007-07-26 Hitachi, Ltd. Storage system, storage control device and recovery point detection method for storage control device
US20090106223A1 (en) * 2007-10-18 2009-04-23 Microsoft Corporation Enterprise relevancy ranking using a neural network
US20090119395A1 (en) * 2007-11-01 2009-05-07 Hitachi, Ltd. Information processing system and data management method
EP2069938A2 (en) * 2006-09-26 2009-06-17 Sony Corporation Providing a user access to data files distributed in a plurality of different types of user devices
US7630994B1 (en) 2004-03-12 2009-12-08 Netapp, Inc. On the fly summarization of file walk data
US20100031361A1 (en) * 2008-07-21 2010-02-04 Jayant Shukla Fixing Computer Files Infected by Virus and Other Malware
US20100070526A1 (en) * 2008-09-15 2010-03-18 Disney Enterprises, Inc. Method and system for producing a web snapshot
US20100185633A1 (en) * 2009-01-20 2010-07-22 Jitender Bisht Techniques for file system searching
US7844646B1 (en) * 2004-03-12 2010-11-30 Netapp, Inc. Method and apparatus for representing file system metadata within a database for efficient queries
US20110107437A1 (en) * 2006-08-09 2011-05-05 Antenna Vaultus, Inc. System for providing mobile data security
US8024309B1 (en) 2004-03-12 2011-09-20 Netapp, Inc. Storage resource management across multiple paths
US8595238B2 (en) 2011-06-22 2013-11-26 International Business Machines Corporation Smart index creation and reconciliation in an interconnected network of systems
US8738635B2 (en) 2010-06-01 2014-05-27 Microsoft Corporation Detection of junk in search result ranking
US8793706B2 (en) 2010-12-16 2014-07-29 Microsoft Corporation Metadata-based eventing supporting operations on data
US8812493B2 (en) 2008-04-11 2014-08-19 Microsoft Corporation Search results ranking using editing distance and document information
US8843486B2 (en) 2004-09-27 2014-09-23 Microsoft Corporation System and method for scoping searches using index keys
US20150302111A1 (en) * 2012-12-31 2015-10-22 Huawei Technologies Co., Ltd. Method and Apparatus for Constructing File System in Key-Value Storage System, and Electronic Device
US9348912B2 (en) 2007-10-18 2016-05-24 Microsoft Technology Licensing, Llc Document length as a static relevance feature for ranking search results
US9495462B2 (en) 2012-01-27 2016-11-15 Microsoft Technology Licensing, Llc Re-ranking search results
CN111143284A (zh) * 2018-11-02 2020-05-12 浙江宇视科技有限公司 File system dynamic indexing method and device

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
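JP5028218B2 (ja) 2007-10-30 2012-09-19 株式会社日立製作所 Storage control device, storage system, and control method for storage control device
JP5709377B2 (ja) 2009-12-22 2015-04-30 キヤノン株式会社 Image forming apparatus, control method for image forming apparatus, and program
JP5728167B2 (ja) * 2010-05-12 2015-06-03 キヤノン株式会社 Information processing apparatus, control method therefor, and computer program
CN105718569A (zh) * 2016-01-20 2016-06-29 广州视睿电子科技有限公司 Method, device, and system for uploading compound documents
CN106487935A (zh) * 2016-12-21 2017-03-08 深圳市青葡萄科技有限公司 Remote maintenance method and system for servers inside a private cloud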

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US215601A (en) * 1879-05-20 Improvement in child's chair and carriage
US5845273A (en) * 1996-06-27 1998-12-01 Microsoft Corporation Method and apparatus for integrating multiple indexed files
US6067541A (en) * 1997-09-17 2000-05-23 Microsoft Corporation Monitoring document changes in a file system of documents with the document change information stored in a persistent log
US6269362B1 (en) * 1997-12-19 2001-07-31 Alta Vista Company System and method for monitoring web pages by comparing generated abstracts
US6356863B1 (en) * 1998-09-08 2002-03-12 Metaphorics Llc Virtual network file server
US6418453B1 (en) * 1999-11-03 2002-07-09 International Business Machines Corporation Network repository service for efficient web crawling
US6636854B2 (en) * 2000-12-07 2003-10-21 International Business Machines Corporation Method and system for augmenting web-indexed search engine results with peer-to-peer search results
US7020658B1 (en) * 2000-06-02 2006-03-28 Charles E. Hill & Associates Data file management system and method for browsers
US7231382B2 (en) * 2001-06-01 2007-06-12 Orbitz Llc System and method for receiving and loading fare and schedule data

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4747043A (en) * 1984-02-10 1988-05-24 Prime Computer, Inc. Multiprocessor cache coherence system
JPH09204442A (ja) * 1996-01-24 1997-08-05 Dainippon Screen Mfg Co Ltd Document data retrieval system
JPH1063686A (ja) * 1996-08-20 1998-03-06 Dainippon Screen Mfg Co Ltd Document data retrieval system
JP2000066945A (ja) * 1998-08-20 2000-03-03 Nec Corp Document collection system, apparatus, and method, and recording medium
US6289362B1 (en) * 1998-09-01 2001-09-11 Aidministrator Nederland B.V. System and method for generating, transferring and using an annotated universal address
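JP2001184355A (ja) * 1999-12-22 2001-07-06 Fujitsu Ltd Information collection system, content server, information collection apparatus, and recording medium
JP4271827B2 (ja) * 2000-05-09 2009-06-03 富士通株式会社 Information providing system and intermediary apparatus
JP2002169805A (ja) * 2000-11-30 2002-06-14 Matsushita Electric Ind Co Ltd Client-server document retrieval apparatus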
US6714953B2 (en) * 2001-06-21 2004-03-30 International Business Machines Corporation System and method for managing file export information
US20040215601A1 (en) * 2003-04-23 2004-10-28 Win-Harn Liu Method of file management using a computer

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US215601A (en) * 1879-05-20 Improvement in child's chair and carriage
US5845273A (en) * 1996-06-27 1998-12-01 Microsoft Corporation Method and apparatus for integrating multiple indexed files
US6067541A (en) * 1997-09-17 2000-05-23 Microsoft Corporation Monitoring document changes in a file system of documents with the document change information stored in a persistent log
US6269362B1 (en) * 1997-12-19 2001-07-31 Alta Vista Company System and method for monitoring web pages by comparing generated abstracts
US6356863B1 (en) * 1998-09-08 2002-03-12 Metaphorics Llc Virtual network file server
US6418453B1 (en) * 1999-11-03 2002-07-09 International Business Machines Corporation Network repository service for efficient web crawling
US7020658B1 (en) * 2000-06-02 2006-03-28 Charles E. Hill & Associates Data file management system and method for browsers
US6636854B2 (en) * 2000-12-07 2003-10-21 International Business Machines Corporation Method and system for augmenting web-indexed search engine results with peer-to-peer search results
US7231382B2 (en) * 2001-06-01 2007-06-12 Orbitz Llc System and method for receiving and loading fare and schedule data

Cited By (53)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050086583A1 (en) * 2000-01-28 2005-04-21 Microsoft Corporation Proxy server using a statistical model
US7603616B2 (en) * 2000-01-28 2009-10-13 Microsoft Corporation Proxy server using a statistical model
US7630994B1 (en) 2004-03-12 2009-12-08 Netapp, Inc. On the fly summarization of file walk data
US8990285B2 (en) 2004-03-12 2015-03-24 Netapp, Inc. Pre-summarization and analysis of results generated by an agent
US8024309B1 (en) 2004-03-12 2011-09-20 Netapp, Inc. Storage resource management across multiple paths
US7844646B1 (en) * 2004-03-12 2010-11-30 Netapp, Inc. Method and apparatus for representing file system metadata within a database for efficient queries
US20080155011A1 (en) * 2004-03-12 2008-06-26 Vijay Deshmukh Pre-summarization and analysis of results generated by an agent
US20050203907A1 (en) * 2004-03-12 2005-09-15 Vijay Deshmukh Pre-summarization and analysis of results generated by an agent
US7539702B2 (en) 2004-03-12 2009-05-26 Netapp, Inc. Pre-summarization and analysis of results generated by an agent
US20050210006A1 (en) * 2004-03-18 2005-09-22 Microsoft Corporation Field weighting in text searching
US20060059171A1 (en) * 2004-08-25 2006-03-16 Dhrubajyoti Borthakur System and method for chunk-based indexing of file system content
US7487138B2 (en) * 2004-08-25 2009-02-03 Symantec Operating Corporation System and method for chunk-based indexing of file system content
US8843486B2 (en) 2004-09-27 2014-09-23 Microsoft Corporation System and method for scoping searches using index keys
US7761448B2 (en) 2004-09-30 2010-07-20 Microsoft Corporation System and method for ranking search results using click distance
US20060074903A1 (en) * 2004-09-30 2006-04-06 Microsoft Corporation System and method for ranking search results using click distance
US8082246B2 (en) 2004-09-30 2011-12-20 Microsoft Corporation System and method for ranking search results using click distance
US20060069982A1 (en) * 2004-09-30 2006-03-30 Microsoft Corporation Click distance determination
US20060074871A1 (en) * 2004-09-30 2006-04-06 Microsoft Corporation System and method for incorporating anchor text into ranking search results
US7827181B2 (en) 2004-09-30 2010-11-02 Microsoft Corporation Click distance determination
US7739277B2 (en) 2004-09-30 2010-06-15 Microsoft Corporation System and method for incorporating anchor text into ranking search results
US7716198B2 (en) 2004-12-21 2010-05-11 Microsoft Corporation Ranking search results using feature extraction
US20060136411A1 (en) * 2004-12-21 2006-06-22 Microsoft Corporation Ranking search results using feature extraction
US20060294100A1 (en) * 2005-03-03 2006-12-28 Microsoft Corporation Ranking search results using language types
US20060200460A1 (en) * 2005-03-03 2006-09-07 Microsoft Corporation System and method for ranking search results using file types
US7792833B2 (en) 2005-03-03 2010-09-07 Microsoft Corporation Ranking search results using language types
US20070038622A1 (en) * 2005-08-15 2007-02-15 Microsoft Corporation Method ranking search results using biased click distance
US20070130205A1 (en) * 2005-12-05 2007-06-07 Microsoft Corporation Metadata driven user interface
US8095565B2 (en) 2005-12-05 2012-01-10 Microsoft Corporation Metadata driven user interface
US7617255B2 (en) 2006-01-25 2009-11-10 Hitachi, Ltd. Storage system, storage control device and recovery point detection method for storage control device
US20070174354A1 (en) * 2006-01-25 2007-07-26 Hitachi, Ltd. Storage system, storage control device and recovery point detection method for storage control device
US20110107437A1 (en) * 2006-08-09 2011-05-05 Antenna Vaultus, Inc. System for providing mobile data security
US8418258B2 (en) * 2006-08-09 2013-04-09 Antenna Vaultus, Inc. System for providing mobile data security
US8959593B2 (en) * 2006-08-09 2015-02-17 Antenna Vaultus, Inc. System for providing mobile data security
EP2069938A2 (en) * 2006-09-26 2009-06-17 Sony Corporation Providing a user access to data files distributed in a plurality of different types of user devices
EP2069938A4 (en) * 2006-09-26 2010-01-06 Sony Corp PROVIDING USER ACCESS TO DATA DIVISIONS DISTRIBUTED IN MULTIPLE VARIOUS TYPES OF USER DEVICES
US7840569B2 (en) 2007-10-18 2010-11-23 Microsoft Corporation Enterprise relevancy ranking using a neural network
US9348912B2 (en) 2007-10-18 2016-05-24 Microsoft Technology Licensing, Llc Document length as a static relevance feature for ranking search results
US20090106223A1 (en) * 2007-10-18 2009-04-23 Microsoft Corporation Enterprise relevancy ranking using a neural network
US9609045B2 (en) 2007-11-01 2017-03-28 Hitachi, Ltd. Information processing system and data management method
US20090119395A1 (en) * 2007-11-01 2009-05-07 Hitachi, Ltd. Information processing system and data management method
US8473636B2 (en) * 2007-11-01 2013-06-25 Hitachi, Ltd. Information processing system and data management method
US8812493B2 (en) 2008-04-11 2014-08-19 Microsoft Corporation Search results ranking using editing distance and document information
US20100031361A1 (en) * 2008-07-21 2010-02-04 Jayant Shukla Fixing Computer Files Infected by Virus and Other Malware
US8935789B2 (en) * 2008-07-21 2015-01-13 Jayant Shukla Fixing computer files infected by virus and other malware
US20100070526A1 (en) * 2008-09-15 2010-03-18 Disney Enterprises, Inc. Method and system for producing a web snapshot
US8037113B2 (en) * 2009-01-20 2011-10-11 Novell, Inc. Techniques for file system searching
US20100185633A1 (en) * 2009-01-20 2010-07-22 Jitender Bisht Techniques for file system searching
US8738635B2 (en) 2010-06-01 2014-05-27 Microsoft Corporation Detection of junk in search result ranking
US8793706B2 (en) 2010-12-16 2014-07-29 Microsoft Corporation Metadata-based eventing supporting operations on data
US8595238B2 (en) 2011-06-22 2013-11-26 International Business Machines Corporation Smart index creation and reconciliation in an interconnected network of systems
US9495462B2 (en) 2012-01-27 2016-11-15 Microsoft Technology Licensing, Llc Re-ranking search results
US20150302111A1 (en) * 2012-12-31 2015-10-22 Huawei Technologies Co., Ltd. Method and Apparatus for Constructing File System in Key-Value Storage System, and Electronic Device
CN111143284A (zh) * 2018-11-02 2020-05-12 浙江宇视科技有限公司 File system dynamic indexing method and device

Also Published As

Publication number Publication date
US20090327248A1 (en) 2009-12-31
JP4559158B2 (ja) 2010-10-06
US9229940B2 (en) 2016-01-05
JP2005122702A (ja) 2005-05-12

Similar Documents

Publication Publication Date Title
US9229940B2 (en) Method and apparatus for improving the integration between a search engine and one or more file servers
US10210256B2 (en) Anchor tag indexing in a web crawler system
US6952730B1 (en) System and method for efficient filtering of data set addresses in a web crawler
US6638314B1 (en) Method of web crawling utilizing crawl numbers
JP6006267B2 (ja) System and method for scoping searches using index keys
US7139747B1 (en) System and method for distributed web crawling
US7065523B2 (en) Scoping queries in a search engine
US8560569B2 (en) Method and apparatus for performing bulk file system attribute retrieval
US8620926B2 (en) Using a hashing mechanism to select data entries in a directory for use with requested operations
JP4671332B2 (ja) File server that converts user identification information
US6061686A (en) Updating a copy of a remote document stored in a local computer system
US8136025B1 (en) Assigning document identification tags
US6336117B1 (en) Content-indexing search system and method providing search results consistent with content filtering and blocking policies implemented in a blocking engine
US8073833B2 (en) Method and system for gathering information resident on global computer networks
US20040205047A1 (en) Method for dynamically generating reference indentifiers in structured information
US6883020B1 (en) Apparatus and method for filtering downloaded network sites
US20130124503A1 (en) Delta indexing method for hierarchy file storage
US6055534A (en) File management system and file management method
US9886446B1 (en) Inverted index for text searching within deduplication backup system
JP2007109237A (ja) Data retrieval system, method, and program
US7660876B2 (en) Electronic file management
WO2001002988A2 (en) Method and system for continually tracking and reporting information available on global computer networks
JPH05233417A (ja) Directory management method for distributed file system
CRAWLER 20 Web crawling and indexes
JPH0934769A (ja) File management apparatus

Legal Events

Date Code Title Description
AS Assignment

Owner name: HITACHI, LTD., JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:KODAMA, SHOJI;REEL/FRAME:014626/0287

Effective date: 20031013

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION