WO2007121490A2 - System and method for identifying shared resources on a network - Google Patents

System and method for identifying shared resources on a network

Info

Publication number
WO2007121490A2
Authority
WO
WIPO (PCT)
Prior art keywords
search
network
user
appliance
server
Prior art date
Application number
PCT/US2007/066969
Other languages
English (en)
Other versions
WO2007121490A3 (fr
Inventor
Robert Erickson
David Fox
Original Assignee
Deepdive Technologies, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Deepdive Technologies, Inc. filed Critical Deepdive Technologies, Inc.
Publication of WO2007121490A2 publication Critical patent/WO2007121490A2/fr
Publication of WO2007121490A3 publication Critical patent/WO2007121490A3/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Definitions

  • the present invention relates generally to the field of searching for and identifying shared resources, and more particularly to identifying shared resources accessible via a network for search and retrieval, and to an apparatus and method for same.
  • Computer systems are typically used for various business, education, and entertainment-related applications, many of which store, retrieve and process information.
  • the increased availability of computer systems and computer networks, such as the Internet, has made vast repositories of information available to a huge segment of our population,
  • For example, the World Wide Web (“WWW” or “web”) can provide access to a vast amount of information. Locating the desired information, however, can be quite challenging. This problem is compounded because both the amount of information available on the web and the number of inexperienced users searching the web are growing exponentially. In an attempt to deal with this problem, a number of specialized search tools, known as “search engines,” have been developed. Several of the more well-known search engines are Google, Yahoo, and MSN Search.
  • search engines attempt to return hyperlinks to specific web pages in which a user may be interested.
  • Most search engines base their determination of the user's interest on a collection of search terms (called a search query) entered by the user.
  • the goal of the search engine is to provide the user with multiple links to high quality, relevant results based on the user's search query.
  • the search engine accomplishes this by matching the terms in the search query against a corpus of pre-stored, pre-indexed web pages. Web pages that contain the user's search terms are called "hits" and are returned to the user.
  • a search engine may also attempt to sort the list of hits so that the most relevant and/or highest quality pages are at the top of the list of hits returned to the user. For example, the search engine may assign a rank or score to each hit, where the score is designed to correspond to the relevance or importance of the web page. Determining appropriate scores can be a difficult task. For one thing, the importance of a web page to the user is inherently subjective and depends on the user's interests, knowledge, and attitudes. There is, however, much that can be determined objectively about the relative importance of a web page.
  • search engine may not be capable of indexing and/or accessing the video clip to identify content, depending on the format and/or content of the video clip and the sophistication of the search engine.
  • a similar problem may be encountered with other forms of content such as word processing documents, graphic image files, MP3 clips, interactive blogs, etc.
  • search results provided by standard search engines will continue to be sub-optimal, at least for certain classes of users and certain types of searches.
  • a networked information indexing and search apparatus and method provide access, including indexing and search access, to information located on one or more intranets, the Internet, or both.
  • the networked search apparatus also referred to herein as a network search device or network search appliance, and method comprise configuration, indexing, and searching capabilities to facilitate networked information search and retrieval.
  • a network search device comprises configuration, indexing and searching components.
  • network settings such as a network address, are dynamically established for the network search device.
  • the indexing component of the network search device searches the network, identifies sharable resources available on the network, and maintains a search repository, or database, of search information.
  • the network search device's searching component uses the search database to search for information on the network.
  • search results are scored, or ranked, according to one or more scoring mechanisms.
  • a method executed by a network search device comprises configuring the network device, including dynamically establishing network settings, such as a network address, corresponding to the network device, creating an index of sharable resources on the network, including searching the network to identify the sharable resources on the network and maintaining a search repository, or database, of search information, and searching for information on the network using the search database, in response to a search request.
  • search results are scored, or ranked, according to one or more scoring mechanisms.
  • the network search device is configured using a user datagram protocol (UDP) client/server model, wherein messages are transmitted between the search device and a network device (e.g., a network server) to assign Internet Protocol (IP) settings, which include an IP address, for the search appliance.
  • a bootstrap client executes on the network server, which polls the network via a message broadcast to each of the network search devices physically connected to the network, or network segment.
  • each network search device provides identification information, e.g., its Medium Access Control (MAC) address and hostname.
  • the bootstrap client on the network server uses the network search device's identification information to communicate with the network search device to set an IP state of the search appliance, and to reset the search appliance.
  • the network search device searches, also referred to herein as crawling or web crawling, the network for sharable resources, or shares, and maintains/updates a repository of information associated with each share to facilitate indexing and/or search.
  • a sharable resource may be a hard disk drive, or other storage media, fixed or removable, or one or more file system directories, files, documents, pages etc. stored thereon, with
  • a database stores information corresponding to these sharable resources, which is used for indexing and search.
  • a system and method of identifying sharable resources is provided.
  • a list of network servers is generated, and each listed server is queried to identify sharable resources.
  • Information identifying the sharable resources located can then be indexed, searched, and displayed.
  • the system/method performs an iterative search of the network for sharable resources, taking into account different environments of the network.
  • Embodiments of the invention can be used to search in a wide array of network environments to identify sharable resources.
  • network environments include without limitation NetBIOS workgroups, Windows NT domains, and Windows 2000/2003 domains in backward-compatible modes with Windows Name Service-enabled network environments.
  • embodiments have the ability to interoperate with various advanced Windows networking and security features.
  • a list of shared resources can be identified by generating a list of servers. Each server on the list is then queried to obtain a list of shared resources, a "share list".
  • the "share list" can be used to identify sharable resources (e.g., a disc drive storing shared files), resolve unresolved IP addresses, and/or identify new servers to be queried.
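The iterative discovery described above can be sketched as a simple worklist loop. This is a minimal illustration, not the patent's implementation: the `query_server` callable, its return shape (share names plus names of newly referenced servers), and the server names in the usage example are all assumptions made here.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical return shape: (share names, names of other servers this server refers to).
QueryFn = Callable[[str], Tuple[List[str], List[str]]]


def discover_shares(seed_servers: Iterable[str], query_server: QueryFn) -> Dict[str, List[str]]:
    """Query each known server once; servers revealed by a reply join the queue."""
    pending: deque = deque()
    seen: Set[str] = set()
    for server in seed_servers:
        if server not in seen:
            seen.add(server)
            pending.append(server)

    shares: Dict[str, List[str]] = {}
    while pending:
        server = pending.popleft()
        try:
            share_list, referenced = query_server(server)
        except OSError:
            continue  # server unreachable right now; skip it in this pass
        shares[server] = share_list
        for name in referenced:
            if name not in seen:
                seen.add(name)
                pending.append(name)
    return shares


# Usage with a stubbed-out network (all names are made up for illustration).
FAKE_NETWORK = {
    "fileserver1": (["public", "projects"], ["fileserver2"]),
    "fileserver2": (["archive"], ["fileserver1"]),
}


def fake_query(server: str) -> Tuple[List[str], List[str]]:
    return FAKE_NETWORK[server]


found = discover_shares(["fileserver1"], fake_query)
```

Passing the query function in as a parameter keeps the loop independent of the protocol actually used to obtain a share list.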
  • the list of servers can be generated using a network browser service to browse the network, an active directory service to access a directory of network objects, including servers, and predetermined configuration information.
  • an option to specify/obtain network configuration information is provided.
  • a user, e.g., an administrative user, can use a graphical user interface to specify network services (e.g., NetBIOS peer-to-peer services and/or a Windows Internet Name Service (WINS)).
  • other tools e.g., Dynamic Host Configuration Protocol, or DHCP
  • a network browser tool is used to identify sharable resources.
  • a network browser can be implemented using a NetBIOS-over-TCP/IP protocol set.
  • a collection of candidate servers on the current network can be found by broadcasting a message to a given port (e.g., port 139 and/or 445) associated with possible addresses on the network. Servers that respond to the message are determined to be candidates for browsing.
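A rough sketch of candidate-server detection follows. Note the substitution: rather than the broadcast described above, it tries a TCP connection to each possible address on ports 139 and 445, which is a simpler technique chosen here for illustration. The function names, the timeout, and the example subnet are assumptions.

```python
import ipaddress
import socket
from typing import Callable, List, Sequence

SMB_PORTS = (139, 445)  # the ports named in the text


def tcp_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_candidate_servers(
    network: str,
    ports: Sequence[int] = SMB_PORTS,
    probe: Callable[[str, int], bool] = tcp_port_open,
) -> List[str]:
    """List every host address in `network` that answers on at least one port."""
    candidates = []
    for address in ipaddress.ip_network(network).hosts():
        host = str(address)
        if any(probe(host, port) for port in ports):
            candidates.append(host)
    return candidates


# Usage with a fake probe so the example runs without touching a real network.
def fake_probe(host: str, port: int) -> bool:
    return host == "192.168.1.2" and port == 445


candidates = find_candidate_servers("192.168.1.0/30", probe=fake_probe)
```

On a large address range, probing one address at a time is slow, which is the same concern the text raises for broadcast-and-wait on class A/B networks.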
  • a set of servers identified using a network browser can be queried using, for example, a tool such as SAMBA's nmblookup to identify a corresponding NetBIOS name which can be mapped to a corresponding IP address.
  • the browser tool can be used to identify an active directory services (ADS) LDAP server by broadcasting a message to a known "directory services" port (e.g., port 389).
  • An LDAP server can then be queried to identify the names of "Domain member" computers using LDAP.
  • Active directory services can be used to identify available shares. For example, by joining an active directory domain, it is possible to enter into a "trust" relationship with a domain controller. This can sometimes be necessary to obtain the lists of available shares from domain member servers.
  • An active directory can be used to find domain member servers. Obtaining the names of domain member servers from the directory rather than searching for them on the network can help to streamline things with regard to certain networks (e.g., class A/B networks), on which using a broadcast technique (e.g., sending a query and waiting for a response) might take considerable time, especially in a case that several thousand or more addresses need to be queried.
  • An active directory can be used to find available shared folders. Obtaining share names and locations from the directory can have advantages over a direct query to a server, especially when some resources may be located on a server that is not available (e.g., not running) at the time of an initial network survey.
  • a global catalog server can be used to find available shared folders.
  • a domain's "global catalog servers" can be used in order to identify shared resources in an entire forest of domains rather than just a "current" domain.
  • the configuration information can be used to identify a network that uses WINS, e.g., a WINS server, which maps NetBIOS names to IP addresses, can be used to identify servers.
  • A WINS server can be used to identify servers as a supplement to, or in place of, a broadcast approach (e.g., broadcasting a network message and waiting for a response). If a server does not support WINS, it is carried forward in the search with its IP address as its name. A server in this category is queried via an SMB protocol (with NetBIOS session wrapping or raw) to obtain its browse list, if available, and its list of shared disk resources.
  • a Domain Name Service (DNS) reverse lookup (i.e., using a known IP address to identify a server name) can be used to identify a server name given an IP address identified during a browse of the network, and/or an IP address that failed to respond to a broadcast. If the DNS reverse lookup successfully returns a name, the server can be identified, e.g., in a browse list, by name rather than by IP address. This feature can be used to support "local network segment" indexing for many Windows 200x Active Directory domains.
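As a small illustration of the reverse lookup described above, the sketch below maps IP addresses to names with Python's standard `socket.gethostbyaddr`, falling back to the address itself when no name is returned. The helper names and the example addresses are assumptions made for this sketch.

```python
import socket
from typing import Callable, Dict, Iterable


def reverse_lookup(ip: str) -> str:
    """Return the host name registered for `ip`, or `ip` itself if none is found."""
    try:
        name, _aliases, _addresses = socket.gethostbyaddr(ip)
        return name
    except OSError:  # covers socket.herror and socket.gaierror
        return ip


def label_servers(ips: Iterable[str], resolve: Callable[[str], str] = reverse_lookup) -> Dict[str, str]:
    """Map each address to the label a browse list would show for it."""
    return {ip: resolve(ip) for ip in ips}


# Usage with a stub resolver; real lookups depend on the local DNS configuration.
KNOWN_NAMES = {"10.0.0.5": "files01"}
labels = label_servers(["10.0.0.5", "10.0.0.9"], resolve=lambda ip: KNOWN_NAMES.get(ip, ip))
```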
  • the final list of servers and shares can be provided to the administrative GUI for presentation to the user.
  • the database includes domain, uri, and page tables used to store information corresponding to pages within documents stored as files at a location, or domain, on the network.
  • the domain table includes a name corresponding to each domain.
  • the uri table includes a universal resource indicator, or uri, for each document, together with other document information (e.g., last modification date and index time).
  • the page table has an entry for each page (e.g., web page, email, page within a word processing document, etc.).
  • the database further includes a lexicon, or dictionary, of "original" words, which is dynamically updated to include new words.
  • the database includes parts of speech of each word.
  • One or more, preferably every, stem word constructed from an original word is stored in the lexicon, with each stem word being related in the database to the original word from which it was constructed.
  • a rank table stores entries, each of which records the frequency of occurrence of a stem word within a document/page, as it is currently known (i.e., at the time of the last index and/or modification).
  • a word table identifies locations of original words within a document/page.
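The tables described above could be laid out along the following lines. This is only a sketch using SQLite: the table names follow the text, but every column name, key, and the sample data are assumptions, since the source does not give a schema.

```python
import sqlite3

# Hypothetical columns; only the table names come from the text.
SCHEMA = """
CREATE TABLE domain  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE uri     (id INTEGER PRIMARY KEY,
                      domain_id INTEGER NOT NULL REFERENCES domain(id),
                      uri TEXT NOT NULL, last_modified TEXT, indexed_at TEXT);
CREATE TABLE page    (id INTEGER PRIMARY KEY,
                      uri_id INTEGER NOT NULL REFERENCES uri(id),
                      page_number INTEGER NOT NULL);
CREATE TABLE lexicon (id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE,
                      part_of_speech TEXT,
                      original_id INTEGER REFERENCES lexicon(id));  -- set for stem words
CREATE TABLE "rank"  (stem_id INTEGER NOT NULL REFERENCES lexicon(id),
                      page_id INTEGER NOT NULL REFERENCES page(id),
                      frequency INTEGER NOT NULL,
                      PRIMARY KEY (stem_id, page_id));
CREATE TABLE word    (word_id INTEGER NOT NULL REFERENCES lexicon(id),
                      page_id INTEGER NOT NULL REFERENCES page(id),
                      position INTEGER NOT NULL);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Sample rows, invented for illustration.
conn.execute("INSERT INTO domain (id, name) VALUES (1, 'fileserver1')")
conn.execute("INSERT INTO uri (id, domain_id, uri) VALUES (1, 1, 'smb://fileserver1/public/report.doc')")
conn.executemany("INSERT INTO page (id, uri_id, page_number) VALUES (?, 1, ?)", [(1, 1), (2, 2)])
conn.execute("INSERT INTO lexicon (id, word) VALUES (1, 'search')")
conn.executemany('INSERT INTO "rank" (stem_id, page_id, frequency) VALUES (1, ?, ?)', [(1, 3), (2, 7)])

# Pages containing the stem 'search', most frequent first.
rows = conn.execute(
    'SELECT p.page_number, r.frequency FROM "rank" AS r '
    "JOIN page AS p ON p.id = r.page_id "
    "JOIN lexicon AS l ON l.id = r.stem_id "
    "WHERE l.word = ? ORDER BY r.frequency DESC",
    ("search",),
).fetchall()
```

Keeping per-page frequencies in their own table lets a keyword search read the rank table directly instead of rescanning documents.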
  • the database model is such that new records can be added to one or more database tables using a file import mechanism, instead of a database insert command (e.g., structured query language, SQL, insert command).
  • Existing records are updated using an SQL update command.
  • with the file import mechanism, data used to populate records in one or more of the uri, page, rank and word tables is buffered, and thereafter written to the database (e.g., at the end of indexing and/or as the data buffers become full).
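The buffering behavior described above can be sketched as follows. The class name, the CSV format, and the capacity are assumptions; the database-specific bulk-load step that would consume the written file is not shown because the source does not name one.

```python
import csv
import io
from typing import List, Sequence, TextIO


class ImportBuffer:
    """Collect rows in memory and write them out in batches for a later bulk import."""

    def __init__(self, sink: TextIO, capacity: int = 1000) -> None:
        self._writer = csv.writer(sink, lineterminator="\n")
        self._capacity = capacity
        self._rows: List[Sequence] = []
        self.flush_count = 0

    def add(self, row: Sequence) -> None:
        self._rows.append(row)
        if len(self._rows) >= self._capacity:
            self.flush()  # buffer is full

    def flush(self) -> None:
        """Write any buffered rows; call once more at the end of indexing."""
        if self._rows:
            self._writer.writerows(self._rows)
            self._rows.clear()
            self.flush_count += 1


# Usage: three rows with a capacity of two produce one automatic and one final flush.
sink = io.StringIO()
buffer = ImportBuffer(sink, capacity=2)
for row in [(1, "a"), (2, "b"), (3, "c")]:
    buffer.add(row)
buffer.flush()
```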
  • an N-ary trie is used to buffer the lexicon and provides efficient word lookup.
  • the value of "N" is based on the particular character set used to represent the words in the lexicon. For example, "N" can represent the number of characters in an alphabet, together with a number of digits and punctuation marks.
  • the contents of the lexicon table are written to the N-ary trie buffer structure. Updates made during an indexing operation, such as new words found in new or updated documents/pages, are first written to the N-ary trie buffer structure, and then written to the database using the file import mechanism.
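A minimal sketch of an N-ary trie used as a lexicon buffer is shown below. The character set (lowercase letters, digits, and two punctuation marks, giving N = 38), the class names, and the `new_words` list that stands in for pending database writes are all assumptions for illustration.

```python
import string
from typing import List, Optional

# Assumed character set: 26 letters + 10 digits + 2 punctuation marks, so N = 38.
ALPHABET = string.ascii_lowercase + string.digits + "-'"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
N = len(ALPHABET)


class TrieNode:
    __slots__ = ("children", "word_id")

    def __init__(self) -> None:
        self.children: List[Optional["TrieNode"]] = [None] * N  # one slot per character
        self.word_id: Optional[int] = None


class LexiconTrie:
    def __init__(self) -> None:
        self._root = TrieNode()
        self._next_id = 1
        self.new_words: List[str] = []  # words added since the last write to the database

    def lookup(self, word: str) -> Optional[int]:
        """Return the id of `word`, or None if it is not in the lexicon."""
        node = self._root
        for ch in word:
            idx = INDEX.get(ch)
            if idx is None:
                return None
            node = node.children[idx]
            if node is None:
                return None
        return node.word_id

    def add(self, word: str) -> int:
        """Insert `word` if missing and return its id."""
        node = self._root
        for ch in word:
            idx = INDEX[ch]  # raises KeyError for characters outside ALPHABET
            if node.children[idx] is None:
                node.children[idx] = TrieNode()
            node = node.children[idx]
        if node.word_id is None:
            node.word_id = self._next_id
            self._next_id += 1
            self.new_words.append(word)
        return node.word_id


trie = LexiconTrie()
first_id = trie.add("search")
second_id = trie.add("searching")
```

Lookup cost depends on the length of the word rather than the size of the lexicon, which is the efficiency the text attributes to the structure.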
  • a scoring mechanism which may include one or more "weighting" methodologies is used to provide enhanced search results.
  • a scoring mechanism is used to rank results from a search, to determine a relevance score for each item (e.g., document, page, etc.) identified from a keyword search.
  • the scoring mechanism is used to rank an item's relevance based on both a frequency of occurrence of a keyword found in a document and a correlation between multiple keywords found in the document.
  • the scoring mechanism can be used to determine correlations between multiple keywords found within a given search result item, to assist in differentiating the relevance of a search result item relative to the other search result items uncovered in the search.
  • the scoring algorithm scales products of frequencies of occurrence, using different combinations of frequencies of occurrence associated with the keyword terms, beginning with a first order and increasing to an order equal to the number of keywords in the search, to determine relevance corresponding to a search result item having multiple keywords.
  • the relevance can be determined for each search result item having multiple keywords.
  • a threshold number which identifies a number of multiple keywords, is used to determine the relevance score assigned to a search result item.
  • the scoring algorithm is used to determine a relevance score.
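One possible reading of the scoring description above is sketched here: for each order from 1 up to the number of keywords, sum the products of keyword frequencies over every combination of that size, multiply by a per-order scale factor, and add the results. The source gives no formula, so the scale factors, the function name, and the example frequencies are assumptions.

```python
from itertools import combinations
from math import prod
from typing import Mapping, Optional, Sequence


def relevance_score(
    frequencies: Mapping[str, int],
    keywords: Sequence[str],
    scale: Optional[Sequence[float]] = None,
) -> float:
    """Sum scaled products of keyword frequencies over combinations of every order."""
    found = [frequencies.get(keyword, 0) for keyword in keywords]
    if scale is None:
        scale = [1.0] * len(keywords)  # hypothetical default: every order weighted equally
    score = 0.0
    for order in range(1, len(keywords) + 1):
        order_sum = sum(prod(combo) for combo in combinations(found, order))
        score += scale[order - 1] * order_sum
    return score


keywords = ["network", "search"]
weights = [1.0, 2.0]  # assumed: co-occurrence (order 2) counts double

# Item containing both keywords: 1.0 * (3 + 2) + 2.0 * (3 * 2) = 17.0
both = relevance_score({"network": 3, "search": 2}, keywords, weights)

# Item containing one keyword five times: 1.0 * (5 + 0) + 2.0 * (5 * 0) = 5.0
single = relevance_score({"network": 5}, keywords, weights)
```

With these weights, an item that contains both keywords outranks one that repeats a single keyword more often, which matches the stated goal of rewarding correlation between keywords.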
  • FIG. 1 illustrates a block diagram of a representation of a network of computing devices and peripherals in which one or more embodiments of the present invention can be used;
  • FIG. 2 provides an illustrative example of a block diagram of an internal architecture of a search appliance in accordance with one or more embodiments of the present invention;
  • FIG. 3 illustrates a flowchart of process steps to create and update an index in accordance with one or more embodiments of the present invention;
  • FIG. 4 provides an illustrative example of a block diagram of a search appliance used in indexing and searching in accordance with one or more embodiments of the present invention;
  • FIG. 5 illustrates a flowchart of process steps to score and rank search results in accordance with one or more embodiments of the present invention;
  • FIG. 6, which includes FIG. 6A to FIG. 6O, provides illustrative examples of screens from a user interface of a search appliance in accordance with one or more embodiments of the invention.
  • FIG. 7, which includes FIG. 7A to FIG. 7Y, provides illustrative examples of screens from a user interface used in configuration operations for, and/or associated with, a search appliance in accordance with one or more embodiments of the present invention.
  • FIG. 8 which comprises FIGs. 8A and 8B, provides an example of pseudo code of a script for use in discovering shared resources in accordance with one or more embodiments.
  • a networked information indexing and search apparatus and method provide access, including indexing and search access, to information located on one or more intranets, the Internet, or both.
  • the networked search apparatus also referred to herein as a network search device or network search appliance, and method comprise configuration, indexing, and searching capabilities to facilitate networked information search and retrieval
  • In FIG. 1, a block diagram of a representation 100 of a network of computing devices and peripherals in which one or more embodiments of the present invention can be used is provided.
  • computers 150, 160, and 170, at least one instance of search appliance 180, and at least one data server 190 are coupled via a network 120.
  • an optional printer 110 and an optional fax machine 140 are shown.
  • individuals, business entities and the like for example can efficiently and effectively access and manage the storing, indexing, accessing, and retrieving of electronic data as described herein in conjunction with the various embodiments of the present invention.
  • Optional printer 110 and an optional fax machine 140 are standard peripheral devices that may be used for transmitting or outputting paper-based documents, notes, search results, reports, etc. in conjunction with the queries and transactions processed by computer-based system 100. It should be noted that optional printer 110 and optional fax machine 140 are merely representative of the many types of peripherals that may be utilized in conjunction with the present invention, and that other peripheral devices can be used with one or more embodiments of the present invention and no such device is excluded by its omission in FIG. 1.
  • Network 120 is any suitable computer communication link or communication mechanism, including a hardwired connection, an internal or external bus, a connection for telephone access via a modem or high-speed T1 line, radio, infrared or other wireless communications, private or proprietary local area networks (LANs), wide area networks (WANs), an intranet, or the Internet.
  • portions of network 120 may suitably include a dial-up phone connection, broadcast cable transmission line, Digital Subscriber Line (DSL), ISDN line, or similar public utility-like access link.
  • At least a portion of network 120 comprises a standard wired or wireless Internet connection between the various components of computer-based system 100.
  • Network 120 provides for communication between the various components coupled to network 120, which allows for information to be transmitted between devices coupled thereto.
  • a user of a computer system, e.g., computer 150, 160 or 170, connected to network 120 can, for example, gain access, based on access privileges corresponding to the user, to data and information accessible via network 120.
  • network 120 serves to link the physical components of computer-based system 100 together, regardless of their physical proximity. This is especially important because it is contemplated that, in one or more embodiments of the present invention, data server 190 and computers 150, 160, and 170 may be geographically remote and physically separated from each other.
  • Computers 150, 160 and 170 may be any type of computer known to those skilled in the art that is capable of being configured for use with computer-based system 100 as described herein. This includes laptop computers, desktop computers, tablet computers, pen-based computers and the like. Computers 150, 160, and 170 are most preferably commercially available computers such as Linux-based computers, IBM compatible computers, or Macintosh computers. However, those skilled in the art will appreciate that the methods and apparatus of the present invention apply equally to any computer or computer system, regardless of whether the computer is a traditional "mainframe" computer, a complicated multi-user computing apparatus or a single user device such as a personal computer or workstation.
  • handheld and palmtop devices are also specifically included within the description of devices that may be deployed as computers 150, 160 and 170. It should be noted that no specific operating system or hardware platform is excluded and it is anticipated that many different hardware and software platforms may be configured to be deployed as computers 150, 160 and 170. Various hardware components and software components (not shown this FIG.) known to those skilled in the art may be used in conjunction with computers 150, 160 and 170.
  • Data server 190, together with computers 150, 160 and 170, is preferably configured to store and retrieve data, some or all of which is sharable via network 120.
  • Various hardware components such as external monitors, keyboards, mice, tablets, hard disk drives, recordable CD-ROM/DVD drives, jukeboxes, fax servers, magnetic tapes, and other devices known to those skilled in the art may be used in conjunction with data server 190, and computers 150, 160 and 170.
  • Data server 190 may also be configured with various additional software components (not shown this FIG.) such as database servers, web servers, firewalls, security software, and the like. While only a single data server 190 is shown connected to network 120 of FIG. 1, embodiments of the present invention contemplate and embrace a virtually unlimited number of data servers 190.
  • the various data servers may vary in size, complexity and capability, but will all generally be capable of storing and retrieving information via network 120, in response to user requests.
  • data server 190 represents a network accessible data server that is configured to store data files for later retrieval by the users of computers 150, 160 and 170 via network 120.
  • a typical transaction may be represented by a request to store information or access information directly stored on data server 190 or on some other computer or computer system that is logically connected to data server 190.
  • the request to store or retrieve information may include requests involving any type of digitized data, whether voice, text, graphics, etc. and the information may be stored in any format known to those skilled in the art.
  • search appliance 180 represents a network accessible computing system configured to act as a network-based indexing and search apparatus capable of indexing data, receiving search queries and processing the search queries to return one or more data files accessible via network 120, and any other appropriately designated computers, that are responsive to the search queries.
  • a typical transaction may be represented by a request to retrieve all files containing certain keywords or phrases from the data store contained on data server 190 or stored on some other computer or computer system that is logically connected to data server 190.
  • the request to retrieve data may include search requests involving any type of digitized data, whether voice, text, graphics, etc. and the information may be stored in any format known to those skilled in the art.
  • search appliance 180 is configurable automatically via a UDP client/server model, or using a user interface comprising displayable web pages using a standard web browser.
  • search appliance 180 is physically connected to network 120. After the physical connection has been made, as is described in more detail below, search appliance 180 transmits a message containing identification information via User Datagram Protocol (UDP) and network 120 to configure search appliance 180. Once configured on network 120, search appliance 180 can be used to identify sharable resources available on the network, and maintain a search repository, or database, of search information. In response to a search request, the search appliance 180 uses the search database to search information on the network. In one embodiment of the invention, search results are scored, or ranked, according to one or more scoring mechanisms.
  • the UDP client/server model used in one or more embodiments of the invention addresses an issue present when installing a network appliance on a network, such as network 120. That is, when configuring a network appliance, such as search appliance 180, on network 120, it is necessary to configure the device for network communications, e.g., TCP/IP Ethernet communication. For example, in a TCP/IP network environment, an IP address and subnet mask should be established for search appliance 180 in order for it to operate over TCP/IP within the network in which it is deployed.
  • Another approach, which can be used with embodiments of the present invention, to configure a network device such as the search appliance 180 involves the use of BOOTP, or the superseding and encompassing DHCP, to obtain IP settings.
  • search appliance 180 e.g., identify valid IP settings, for communication on network 120.
  • this approach provides an ability to establish initial communication between search appliance 180 and data server 190.
  • the UDP host/server model contemplates the use of a set of connectionless UDP broadcast messages that can be used to communicate between a network device, e.g., network data server 190, and search appliance 180, without the need for search appliance 180 to be configured with TCP/IP settings, e.g., a TCP/IP address.
  • while the client/server model is described with reference to UDP, other protocols may be used.
  • while a communication protocol defining a set of messages used to communicate with search appliance 180 is described, it should be apparent that other message types can be used to communicate with search appliance 180 via UDP or another network protocol.
  • the communication protocol defines a structure for messages used in implementing the UDP client/server model.
  • examples are provided of use cases to illustrate end-user network setup using the UDP client/server model.
  • messages can be passed between UDP client and server. More particularly, message types are presented in terms of commands issued by the UDP client, e.g., a networked device such as data server 190, to one or more UDP servers, e.g., search appliance 180.
  • a typical command consists of a message sent by a UDP client to one or more UDP servers listening on a dedicated port.
  • a response message can be in the form of a message sent by one or more UDP servers back to the UDP client, which in turn listens on its own dedicated port.
  • the command types are different from remote procedure calls at least with respect to the transmission of messages in the form of UDP limited broadcasts, which are connectionless, and thus, without state. In particular, there is no guarantee that an intended recipient of a message will actually receive the message. Messages are broadcast to all devices on the network segment. Examples of messages/commands that can be used with the UDP client/server model of one or more embodiments of the invention are as follows:
  • the first command, the POL message, is issued by a UDP client, e.g., data server 190, to identify all of the UDP servers, e.g., instances of search appliance 180, in a network, or network segment.
  • Each UDP server that receives a POL message replies with a PLR message.
  • additional messages can be sent to specific ones of search appliance 180 to cause search appliance 180 to perform an operation specified by the message.
  • another message that can be issued by a UDP client is a GET message, which requests IP information from a specific UDP server, i.e., a specific instance of search appliance 180.
  • the intended UDP server replies with a GTR message, which contains the requested information.
  • Another message issued by a UDP client requests the recipient UDP server to set its IP state.
  • the intended UDP server replies with a STR message, which indicates the result, e.g., success or failure, of the requested operation.
  • a RES message can be issued by a UDP client to instruct a specific instance of the UDP server to initiate a reset operation to reset its state, which is accompanied by a restart of the appliance.
  • each message is no greater than 512 bytes in length.
  • four of the messages are sent by the UDP client to the UDP server to initiate an operation to be performed by the UDP server.
  • the remaining types of messages identified above are sent by a UDP server to the UDP client in reply.
  • Each message body identifies the sender via a MAC address field.
  • the POL message sent by the UDP client is intended for all UDP servers that might be listening.
  • the remaining message types are intended for a specific recipient, as is identified by its MAC address in the message body.
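The message rules described above (seven message types, a sender MAC address in every body, a recipient MAC address for everything except POL, and a 512-byte limit) can be sketched as follows. The pipe-delimited text layout, the field names, and the `handle_poll` helper are purely hypothetical and are not the structure used by the patent.

```python
from dataclasses import dataclass
from typing import Optional

MAX_MESSAGE_BYTES = 512
COMMANDS = ("POL", "GET", "SET", "RES")  # sent by the UDP client
REPLIES = ("PLR", "GTR", "STR")          # sent back by a UDP server


@dataclass(frozen=True)
class Message:
    kind: str
    sender_mac: str
    target_mac: str = ""  # left empty for POL, which addresses every listener
    payload: str = ""

    def encode(self) -> bytes:
        """Serialize to the assumed 'KIND|sender|target|payload' layout."""
        if self.kind not in COMMANDS + REPLIES:
            raise ValueError(f"unknown message type: {self.kind}")
        if self.kind != "POL" and not self.target_mac:
            raise ValueError(f"{self.kind} requires a recipient MAC address")
        data = "|".join((self.kind, self.sender_mac, self.target_mac, self.payload)).encode("ascii")
        if len(data) > MAX_MESSAGE_BYTES:
            raise ValueError("message exceeds 512 bytes")
        return data

    @staticmethod
    def decode(data: bytes) -> "Message":
        if len(data) > MAX_MESSAGE_BYTES:
            raise ValueError("message exceeds 512 bytes")
        kind, sender, target, payload = data.decode("ascii").split("|", 3)
        return Message(kind, sender, target, payload)


def handle_poll(data: bytes, my_mac: str, hostname: str) -> Optional[bytes]:
    """Server side: answer a POL with a PLR carrying this device's MAC and hostname."""
    message = Message.decode(data)
    if message.kind != "POL":
        return None
    return Message("PLR", my_mac, message.sender_mac, hostname).encode()


# Usage with made-up addresses; sending over a UDP broadcast socket is not shown.
poll = Message("POL", "00:11:22:33:44:55").encode()
reply = handle_poll(poll, "66:77:88:99:aa:bb", "appliance-1")
```

Because the broadcasts are connectionless, a client built on this sketch would still need its own timeout and retry handling for replies that never arrive.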
  • One example of the structure and syntax used for the seven message types is shown below
  • each instance of search appliance 180 continuously runs a UDP server and is configured in the factory to accept an IP address leased to it by a DHCP server running in its network. If a DHCP server does not exist in the network, then TCP/IP configuration of search appliance 180 occurs through commands received by the UDP server executing in search appliance 180, using the UDP client/server model described above.
  • the UDP client/server model described herein for use with one or more embodiments of the invention is provided to the end user for uses including the following: (i) discovering all search appliances 180 connected to the network, e.g., network 120, (ii) obtaining the IP address and subnet mask of a specified search appliance 180 so discovered, and (iii) setting the IP address and subnet mask of a specified search appliance 180 so discovered. Some example scenarios encountered by the end user, and the actions that can be taken, are categorized below. In one such scenario, search appliance 180 boots in a network containing a DHCP server.
  • search appliance 180 obtains a valid IP address from the DHCP server, and network setup of the search appliance 180 can be completed without a need for the UDP client/server model described herein.
  • the following are among the alternatives available to the user in a case that the network contains a DHCP server:
  • the end user may run the UDP client/server bootstrap client on the network server to discover a search appliance 180 connected to the network, for example, to:
  • search appliance 180 boots in a network that does not contain a DHCP server.
  • search appliance 180 waits for its IP address and subnet mask to be set, e.g., using the SET command of the UDP client/server model from the UDP server.
  • the end user configures the appliance within the network by running the program code which implements the UDP bootstrap client on the network device, e.g., data server 190.
  • the UDP bootstrap client communicates with instances of search appliance 180, as described above, to allow the user to discover each instance of search appliance 180 and issue the command to set its IP address and subnet mask, to configure search appliance 180 for network communications.
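  • The no-DHCP scenario can be sketched as an in-memory simulation, with no real sockets involved. The `Appliance` class, the `bootstrap` function, and the tuple-shaped replies are hypothetical; the sketch only assumes what the description states: every server answers POL, GET and SET are honored by the addressed server only, and an unconfigured appliance waits for SET.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Appliance:
    """In-memory stand-in for the UDP server running on one search appliance."""
    mac: str
    ip: Optional[str] = None           # None models "no DHCP lease, waiting for SET"
    subnet_mask: Optional[str] = None

    def handle(self, kind, target_mac=None, **fields):
        if kind == "POL":               # discovery poll: every server replies
            return ("PLR", self.mac)
        if target_mac != self.mac:      # other messages are for one recipient only
            return None
        if kind == "GET":
            return ("GTR", self.mac, self.ip, self.subnet_mask)
        if kind == "SET":
            self.ip = fields["ip"]
            self.subnet_mask = fields["subnet_mask"]
            return ("STR", self.mac, "success")
        return None


def bootstrap(network, addresses, subnet_mask):
    """Discover appliances with POL, then SET an address on each unconfigured one."""
    discovered = [appliance.handle("POL")[1] for appliance in network]
    free = iter(addresses)
    for mac in discovered:
        for appliance in network:
            reply = appliance.handle("GET", target_mac=mac)
            if reply is not None and reply[2] is None:
                appliance.handle("SET", target_mac=mac,
                                 ip=next(free), subnet_mask=subnet_mask)
    return discovered
```

  • An appliance that already holds an address, as in the DHCP scenario, is discovered but left unchanged by this sketch.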
  • the end user may run the UDP bootstrap client to discover one or more instances of search appliance 180 to, for example:
  • FIG. 1 shows only a few computers 150, 160, and 170 connected to network 120, it is anticipated that dozens or hundreds or even thousands of similarly configured computers 150, 160, and 170 can be "indexed" and searched using instances of search appliance 180. In one or more embodiments of the present invention, multiple computers 150, 160, and 170 will all be configured to communicate with search appliance 180 and one or more data servers 190 and with each other via network 120.
  • search appliance 180 a user of a computer, such as one of computers 150, 160, and 170, can initiate a search request to locate and retrieve desired data files from data server 190, for example, with the search request being received and processed by search appliance 180. In response to receipt of such a request, search appliance 180 will, if appropriate, provide access to the requested data files to the requester.
  • search appliance 180 a user of one of computers 150, 160, and 170, for example, may request and retrieve information in this fashion from not only data server 190, but from any other computer or computer system coupled to network 120, indexed using search appliance 180.
  • search appliance 180 it is possible to submit a search request, review the results of a search, and index volumes of data located on a local shared resource, at a remote location connected to network 120, and across an intranet and the Internet.
  • search appliance 180 it is contemplated that the present invention may be used for other searching applications, including for example, electronic discovery and computer forensics.
  • FIG. 2 a block diagram illustrates one example of an internal architecture of search appliance 180 in accordance with one or more embodiments of the invention.
  • Search appliance 180 may also be configured with various additional software components (not shown in this FIG.) such as servers, firewalls, comprehensive security software, and the like. Given the relative advances in the state-of-the-art computer systems available today, it is anticipated that functions of search appliance 180 may be provided by many standard, readily available computing devices and systems.
  • Search appliance 180 suitably comprises at least one Central Processing Unit
  • FIG. 2 is not intended to be an exhaustive example, but is presented to simply illustrate some of the salient features of search appliance 180.
  • Processor 210 performs computation and control functions of search appliance 180, and comprises a suitable central processing unit (CPU).
  • processor 210 may comprise a single integrated circuit, such as a microprocessor, or may comprise any suitable number of integrated circuit devices and/or circuit boards working in cooperation to accomplish the functions of a processor.
  • Processor 210 suitably executes one or more software programs contained within main memory 220.
  • Auxiliary storage interface 240 allows search appliance 180 to store and retrieve information from auxiliary storage devices, such as external storage mechanism 270, magnetic disk drives (e.g., hard disks or floppy diskettes) or optical storage devices (e.g., CD-ROM).
  • DASD direct access storage device
  • DASD 280 may be a floppy disk drive that may read programs and data from a floppy disk 290.
  • signal bearing media include: recordable type media such as floppy disks (e.g., disk 290) and CD ROMS, and transmission type media such as digital and analog communication links, including wireless communication links.
  • Memory controller 230, through use of an auxiliary processor (not shown) separate from processor 210, is responsible for moving requested information from main memory 220 and/or through auxiliary storage interface 240 to processor 210. While for the purposes of explanation, memory controller 230 is shown as a separate entity, those skilled in the art understand that, in practice, portions of the function provided by memory controller 230 may actually reside in the circuitry associated with processor 210, main memory 220, and/or auxiliary storage interface 240.
  • Terminal interface 250 allows users, system administrators and computer programmers to communicate with search appliance 180, normally through separate workstations or through stand-alone computer systems such as computer systems 170 of FIG. 1.
  • search appliance 180 depicted in FIG. 2 contains only a single main processor 210 and a single system bus 260, it should be understood that the present invention applies equally to computer systems having multiple processors and multiple system buses.
  • system bus 260 of one or more embodiments of the present invention is a typical hardwired, multi-drop bus, any connection means that supports bi-directional communication in a computer-related environment could be used.
  • Main memory 220 preferably contains an operating system 221, a user interface 222, a database management system 223, together with program code to implement an index mechanism 224, a search mechanism 225, a report mechanism 226, a scoring mechanism 227, and preferably a security mechanism 228.
  • the term "memory" as used herein refers to any storage location in the virtual memory space of search appliance 180. It should be understood that main memory 220 may not necessarily contain all parts of all components shown. For example, portions of operating system 221 may be loaded into an instruction cache (not shown) for processor 210 to execute, while other files may well be stored on magnetic or optical disk storage devices (not shown).
  • main memory 220 may consist of multiple disparate memory locations.
  • Database management system 223 is preferably a relational database management system, together with various data model, or schema, definitions, and data stored according to the data model, such as is described in more detail herein.
  • the data stored using database management system 223 may change from query to query, depending on updates made to the stored data using database management system 223.
  • search appliance 180 can include additional components, not shown in this FIG.
  • embodiments of the present invention include a security mechanism 228 for verifying and validating user access to the data files located by search appliance 180.
  • Security mechanism 228 may be incorporated into operating system 221.
  • security mechanism 228 may also be configured to provide different levels of security and/or encryption for computers 150, 160, and 170 and data server 190 of FIG. 1.
  • security mechanism 228 may be determined by the nature of a given search request and/or response to the search request, including the identity of the requestor.
  • security mechanism 228 may be contained in or implemented in conjunction with certain hardware components (not shown in this FIG.) such as hardware-based firewalls, routers, switches, dongles, and the like.
  • Operating system 221 includes the software that is used to operate and control search appliance 180.
  • processor 210 typically executes operating system 221.
  • Operating system 221 may be a single program or, alternatively, a collection of multiple programs that act in concert to perform the functions of an operating system. Any operating system known to those skilled in the art may be considered for inclusion with the various embodiments of the present invention.
  • user interface 222 may take another form, it preferably comprises web pages, which can be displayed, using a browsing software application such as those identified herein, on a monitor locally coupled to search appliance 180 and/or displayed on a monitor coupled to a computer connected to search appliance 180 via network 120, such as computer systems 150, 160 and 170.
  • User interface 222 may be used to configure the various components shown in memory 220, including index mechanism 224, search mechanism 225, report mechanism 226, scoring mechanism 227, and security mechanism 228.
  • Database management system 223 is representative of any suitable database known to those skilled in the art. As discussed above, in one or more embodiments of the invention, database management system 223 is a relational database. As such, database management system 223 uses a Structured Query Language (SQL) to manipulate (e.g., create, update, query, etc.) data stored in the database. While database management system 223 is shown residing in main memory 220, it should be noted that database management system 223 may also be physically stored in a location other than main memory 220. For example, database management system 223 may be stored on external storage device 270 or DASD
  • SQL Structured Query Language
  • database 223 will contain keywords for the content contained or accessible via a corporate intranet or the Internet.
  • Database management system 223 can consist of multiple disparate databases stored on many different computers or computer systems.
  • search appliance 180 includes a network interface for connecting to network 120, together with the network protocols needed to communicate via network 120.
  • search appliance 180 includes the suite of protocols typically referred to as the Transmission Control Protocol/Internet Protocol, or TCP/IP.
  • Index mechanism 224 is a user configurable indexing tool for categorizing various types of information and creating an index to be used in conjunction with searching and retrieving information over network 120, such as from data server 190.
  • Index mechanism 224 may be configured manually with various levels of user intervention or programmatically, depending on the specific type of data to be indexed. Index mechanism will perform an initial index and will be configured to re-index the data files contained in database 223 at user-specified intervals, thereby ensuring that the contents of database 223 are capable of being searched in an effective and efficient manner.
  • Search mechanism 225 can include a web-based software application accessible via a graphical user interface such as user interface 222 for the purpose of requesting and retrieving information from database 223.
  • search mechanism 225 can include a Natural Language Processor (NLP) based search engine that, in conjunction with the other components of search appliance 180, such as indexing mechanism 224, index 229, scoring mechanism 227 and report mechanism 226, for example, provides a robust search tool for locating and retrieving desired content.
  • NLP Natural Language Processor
  • search mechanism 225 In general, a user of computers 150, 160, and 170 of FIG. 1 will access search mechanism 225 via a standard web browser such as Safari, FireFox, Netscape, Internet Explorer, etc. By using search mechanism 225, the user will be able to request information. This requested information, if available, will be provided by accessing database 223. Search mechanism 225 will serve as the interface to the information stored in database 223. It is anticipated that various reports related to the information contained in database 223 will be generated by report mechanism 226, which preferably includes a browser-based user interface for displaying search results.
  • Report mechanism 226 preferably provides for output, either via a hard copy or display on a monitor, a variety of reports, including reports of the results from accessing database 223 via search mechanism 225. These reports will typically include the results of the various searches performed by a computer user, such as computer system 170 of FIG. 1. These various reports will be formatted and presented to the user based on the specific type of request made by the user and the type of information to be returned to the user.
  • Scoring mechanism 227 is provided and configured to score and rank the results obtained by search mechanism 225 in response to a user's search query. While those skilled in the art will recognize that various scoring methodologies may be employed, scoring mechanism 227 is specifically designed to provide an easily implemented yet highly effective methodology for presenting search results in a way most likely to rank the most relevant results first. In one or more embodiments of the present invention, scoring mechanism 227 is user configurable, allowing the user to determine which features and scoring factors (weighting methods) to apply when search results are returned in response to given search
  • scoring mechanism 227 comprises a ranking of documents returned from a search query in order based on the total number of occurrences of the N unique stem words contained in the original search query. In the case where M document results are returned, then the m-th result is ranked according to the formula shown below:
  • K_nm is the frequency of occurrence of the n-th stem word within the m-th document. Note, however, that this "frequency weighting" formula does not provide any special consideration for occurrences of more than one stem word in a document. Using this ranking scheme, the sum of the frequencies of all the stem words is measured.
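  • The ranking formula itself is not reproduced in this text. From the definitions given here (a score that is the sum of the frequencies of all stem words), a plausible reconstruction is shown below, where the symbol names are assumptions: R_m is the score of the m-th of M documents, N is the number of unique stem words in the query, and K_{nm} is the frequency of the n-th stem word in the m-th document.

```latex
R_m = \sum_{n=1}^{N} K_{nm}, \qquad m = 1, \dots, M
```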
  • the first result contains 10 occurrences of the first stem word while the second contains 5 occurrences of each stem word. If the only measure of relevancy is the total number of "hits" for the stem words, then both documents would be scored the same and would have the same relevance in the search result. However, in this case, it would probably be appropriate to consider the second result to be more relevant than the first result since the second result contains both stem words identified in the original search query. Accordingly, it is desirable to modify the original scoring formula to account for the occurrences of both stem words occurring in the same document, thereby increasing the probability that the most relevant documents will be identified in response to a given search query.
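  • The tie described above can be checked with a minimal sketch of plain frequency weighting. The function name `frequency_score` is hypothetical; the counts are the ones given in the two-word example.

```python
def frequency_score(counts):
    """Basic frequency weighting: the sum of the stem-word frequencies in one document."""
    return sum(counts)


first_result = [10, 0]   # 10 occurrences of the first stem word, none of the second
second_result = [5, 5]   # 5 occurrences of each stem word

# Both documents total 10 hits, so plain frequency weighting cannot separate them.
print(frequency_score(first_result), frequency_score(second_result))  # prints "10 10"
```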
  • the original formula is expanded using combinatorial analysis, by introducing combinations of the products of frequencies, in ever higher-order products, to an order equal to the number of stem words in a given multi-keyword search query. Additionally, in order to maintain scale, each product created in this fashion is most preferably scaled to the size of the original term, and thus, to each term that precedes it in the expansion. This is accomplished by dividing each product by the appropriate
  • the modified formula may also be written in a form that is easier to apply within a computational setting, as shown below:
  • the second term of this formula corrects for the simultaneous occurrences of pairs of the three words within the document while the third term corrects for the simultaneous occurrence of all three words in the document.
  • Table 3 the document containing all three stem words, corresponding to m=2, is ranked higher than others of the same overall count. Additionally, the document of m=3 is ranked second since it contains two out of the three stem words.
  • a document containing only one stem word will not always be scored or ranked lower than a document containing multiple stem words. For example, if the document corresponding to m=4 had a third-word count of 30 instead of 15 it would be deemed more relevant than the document of m=2. Nevertheless, when total frequency counts for keywords are comparable, the scoring formula of the present invention produces increased relevancy when multiple keywords from a search query are found in a given document.
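  • The enhanced formula and its exact scaling divisor are not reproduced in this text, so the following is only a sketch of the general idea: add products of frequencies of every order from pairs up to the number of stem words, each scaled back to the size of a single frequency. The scaling used here, dividing an order-p product sum by the total frequency raised to the power p-1, is an assumption and not necessarily the patent's divisor; `enhanced_frequency_score` is a hypothetical name.

```python
from itertools import combinations
from math import prod


def enhanced_frequency_score(counts):
    """Sum of frequencies plus scaled higher-order products of frequencies.

    ASSUMPTION: an order-p product is divided by total ** (p - 1) so that it
    stays on the scale of a single frequency. The source text is truncated
    before it names the divisor it actually uses.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    score = float(total)                       # first-order term: plain frequency sum
    for order in range(2, len(counts) + 1):    # pairs, triples, ... up to all words
        products = sum(prod(combo) for combo in combinations(counts, order))
        score += products / total ** (order - 1)
    return score
```

  • Under this assumed scaling, the two-word example no longer ties: counts of [10, 0] score 10.0 while counts of [5, 5] score 12.5. A single-word document can still win when its count is large enough, as the description notes: [0, 0, 30] outscores [5, 5, 5].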
  • represents the minimum allowable number of unique stem words that can appear in a document.
  • the score corresponding to a document is set to zero if the number of unique stem words appearing in the document is less than this minimum. This allows document results with correspondingly high total frequencies but little correlation between
  • proximity weighting can be added to further enhance the relevancy of the search results.
  • proximity weighting when using more than one keyword in a search query, additional emphasis can be given to those search results where the keywords are in close physical proximity to each other in the document. This allows the result set to consider those instances to be more relevant than those results where the keywords are not located in close proximity to each other.
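  • The description does not say how proximity is measured. One simple possibility, shown here as an assumption, is the smallest token distance between two keywords, turned into a bonus that grows as the keywords get closer. Both function names and the 1/gap bonus are hypothetical.

```python
def min_keyword_gap(tokens, word_a, word_b):
    """Smallest distance, in tokens, between any occurrences of the two keywords.

    Returns None when either keyword is absent from the document.
    """
    positions_a = [i for i, token in enumerate(tokens) if token == word_a]
    positions_b = [i for i, token in enumerate(tokens) if token == word_b]
    if not positions_a or not positions_b:
        return None
    return min(abs(a - b) for a in positions_a for b in positions_b)


def proximity_bonus(tokens, word_a, word_b):
    """Hypothetical bonus in (0, 1]: larger when the keywords sit closer together."""
    gap = min_keyword_gap(tokens, word_a, word_b)
    if gap is None:
        return 0.0
    return 1.0 / max(gap, 1)   # max() guards the case where both keywords are identical
```

  • With this measure, "search appliance indexes shares" earns the full bonus for the pair ("search", "appliance"), while a document that places the two words several tokens apart earns proportionally less.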
  • Category weighting allows a user to specify specific document types as being more relevant for a particular search request. For example, if two documents (one an email document and one a word processing document) are found to be responsive to a given search request and both documents contain the same number and frequency of keywords; category weighting may be used to break the tie. If the user has specified that the most important document category is email, then the email document will be deemed more relevant and will be displayed higher in the search result listing than the word processing document.
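  • The tie-breaking behavior described here can be sketched as a two-part sort key. The dictionary fields (`name`, `category`, `score`) and the function name are hypothetical; the sketch only handles a single preferred category.

```python
def rank_with_category(results, preferred_category):
    """Order by score, breaking ties in favor of the user's preferred category."""
    return sorted(
        results,
        # Higher scores first; among equal scores, False (preferred) sorts before True.
        key=lambda r: (-r["score"], r["category"] != preferred_category),
    )


results = [
    {"name": "report.doc", "category": "word processing", "score": 12},
    {"name": "memo.eml", "category": "email", "score": 12},
]
ranked = rank_with_category(results, "email")
```

  • Here both documents have the same score, so the email document is listed first when the user has marked email as the most important category.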
  • location weighting can be used to further identify the most relevant results provided in response to a search query.
  • location weighting when keywords are found in the most prominent locations of a document, that document is given a higher score or ranking in the overall search results.
  • the most prominent location may vary for a given document or documents but examples include the title of a document for word processing documents, the subject line for an email message, etc. Those skilled in the art will recognize that other prominent locations could be identified and incorporated into various embodiments of the present invention.
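  • As a sketch of location weighting, each part of a document can carry a multiplier, with prominent parts such as a title or subject line counting more. The field names and the weight values are hypothetical and would be configuration choices in practice.

```python
def location_weighted_score(document, keywords, location_weights):
    """Count keyword hits per document part, scaled by that part's weight.

    document: {location name: text}, e.g. {"title": ..., "body": ...}.
    location_weights: {location name: multiplier}; unlisted locations count as 1.0.
    """
    score = 0.0
    for location, text in document.items():
        words = text.lower().split()
        hits = sum(words.count(keyword.lower()) for keyword in keywords)
        score += hits * location_weights.get(location, 1.0)
    return score


weights = {"title": 3.0, "body": 1.0}   # assumed multipliers, for illustration only
in_title = {"title": "quarterly budget", "body": "figures attached"}
in_body = {"title": "meeting notes", "body": "budget figures attached"}
```

  • With these assumed weights, a single hit on "budget" in the title scores 3.0 while the same hit in the body scores 1.0.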
  • the user will be able to select any or all of the various features of scoring mechanism 227 including standard frequency weighting, enhanced frequency weighting, proximity weighting, category weighting, and location weighting.
  • search appliance 180 of FIG. 2 will typically include a security mechanism 228.
  • Security mechanism 228 is configured to provide a security model for providing enhanced search results, based on the identity and role of the searcher.
  • security mechanism 228 employs a log-in model where each user must have a user ID and a password to authenticate their identity on the network and to access search mechanism 225. Security mechanism 228 is described in more detail below.
  • Index 229 represents the index that is constructed by index mechanism 224, based on the content stored in shares accessible via network 120.
  • Index 229 is used by search mechanism 225 to locate content relevant to a given search query presented by a user of a computer, such as one of computers 150, 160, and 170.
  • Index 229 will be periodically rebuilt at a configurable interval in order to accurately reflect any changes made to the content in shares accessible via network 120.
  • index 229 is shown separately from database management system 223, it should be appreciated that index 229 can be created and maintained using database management system 223.
  • a discussion of one example of a data model used for indexing and searching is provided below.
  • index mechanism 224, search mechanism 225, report mechanism 226, scoring mechanism 227, and security mechanism 228 are shown as separate entities in FIG. 2, index mechanism 224, search mechanism 225, report mechanism 226, scoring mechanism 227, and security mechanism 228 may be combined into a single software program or application or program product.
  • FIG. 3 a process 300 of maintaining and updating the index for the data files used in conjunction with a search appliance in accordance with one or more embodiments of the present invention is depicted.
  • the initial indexing of the data files to be searched is accomplished by first mounting all appropriate target volumes (step 310).
  • network 120 is searched to identify sharable resources, or shares. More particularly, search appliance 180 searches, also referred to herein as crawling or web crawling, the network for sharable resources, or shares, and maintains/updates a repository of information, using database management system 223, associated with each share to facilitate indexing and/or search. It is important to note that search appliance 180 is capable of performing network searches, including all files stored on a server or network of servers determined to be shared, not mere HTML (index.htm) searches.
  • a sharable resource may be a hard disk drive, or other storage media, fixed or removable, or one or more folders, files, documents, pages etc. stored thereon, with "sharable" access rights.
  • sharable resources can include web pages typically displayed via a web browser.
  • the initial index will be built using database management system 223 and index mechanism 224 (step 320).
  • the original indexing may be accomplished by any means known to those skilled in the art.
  • the creation date and/or last modified date for each data file is captured and stored.
  • a keyword database is constructed (step 330) using the key words or terms contained in the data files stored on data server 190. This keyword database will be later accessed by search mechanism 225 when a search query is submitted by a user.
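  • One common way to organize such a keyword database is an inverted index that maps each keyword to the files containing it. The sketch below is an assumption about the general shape only; the function name, the tokenizing rule, and the example share paths are hypothetical, and the patent's own data model is described separately.

```python
import re
from collections import defaultdict


def build_keyword_index(files):
    """Map each keyword to {file path: occurrence count}.

    files: {path: text content} for the data files found on the mounted volumes.
    """
    index = defaultdict(dict)
    for path, text in files.items():
        # Hypothetical tokenizer: lowercase runs of letters and digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word][path] = index[word].get(path, 0) + 1
    return index


files = {
    "//dataserver/share/a.txt": "Budget report budget",
    "//dataserver/share/b.txt": "Travel report",
}
keyword_index = build_keyword_index(files)
```

  • A later search for "report" would look up `keyword_index["report"]` and find both files, each with one occurrence.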
  • the database model used to store indexing and shared resource information is discussed in more detail below.
  • the index is re-built to identify changes in sharable resources, e.g., resources for which the sharable characteristics have changed, and/or to identify changes in content to be reflected in the index.
  • the previously captured creation date and/or last modified date will be examined and compared with a modification date associated with each file that is to be indexed. If there has been no change in the relevant date, then the file need not be re-indexed and the key words associated with that file need not be modified in the keyword database. However, if an existing file has been modified, as determined by examining the previously captured date with the new file modification date, the new modification date will be captured and the document will be re-indexed and the keywords associated with that document will be updated in the keyword database.
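  • The date comparison described above can be sketched as follows. The function name, the dictionary-based date store, and the `index_file` callback are hypothetical; the logic follows the description: skip files whose date is unchanged, otherwise re-index and capture the new date.

```python
def incremental_reindex(stored_dates, current_files, index_file):
    """Re-index only files whose modification date differs from the stored one.

    stored_dates: {path: previously captured modification date}, updated in place.
    current_files: {path: current modification date} found on the mounted volumes.
    index_file: callback that re-indexes one file (hypothetical hook).
    Returns the list of paths that were re-indexed.
    """
    reindexed = []
    for path, modified in current_files.items():
        if stored_dates.get(path) == modified:
            continue                     # unchanged: its keywords stay as they are
        index_file(path)                 # new or modified: refresh its keywords
        stored_dates[path] = modified    # capture the new modification date
        reindexed.append(path)
    return reindexed
```

  • Files that appear for the first time have no stored date, so this sketch treats them the same way as modified files.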
  • Security mechanism 228 is preferably configured to provide various levels of security functionality. In one or more embodiments of the present invention, both indexed content and query results are protected from unauthorized access by security mechanism 228. The approach to securing data from unauthorized access may be implemented at the enterprise level and also deployed at the desktop, as appropriate or desired. In one or more embodiments of the present invention, security mechanism 228 comprises an internal database, used by security mechanism 228 to track a variety of user and context sensitive information in order to ensure access to information only by approved system users.
  • database 440 may comprise data from multiple disparate data stores and the security assigned to the data in database 440 may vary from dataset to dataset.
  • database 440 is comprised of three separate data stores identified as domain 1, domain 2, and domain 3. Those skilled in the art will recognize that the use of three separate data stores and domains is for illustration only and more or fewer data stores and/or domains may be used in conjunction with various embodiments of the present invention.
  • Security for search results returned by search mechanism 225 and reported via report mechanism 226 may be implemented via the role-based administration of web services. More particularly, a system of one or more federated servers is constructed in which a password-protected, server-shared database is used to define relational tables that store various types of administrative information and correspondences.
  • users, groups, domains, user roles, and domain groups are defined security components and used by security mechanism 228 to allow or deny access to various types of data stored in database 440 or potentially accessible via search mechanism 225, depending on the status of the various security components.
  • security mechanism 228 can be used to provide customized search results and protect sensitive data files.
  • User 1, User 2, and User 3 are assigned to user group 410.
  • User 3 and User 4 are assigned to user group 420.
  • user 4 and user 5 are assigned to user group 430.
  • each of user 2, user 3, and user 4 submit the same search query to database 440.
  • security mechanism 228 allows dataset 450 to be returned to user 2.
  • dataset 460 is returned.
  • security mechanism 228 allows dataset 470 to be returned.
  • the various system user security components define all registered users of the system and provide a framework or methodology for determining which users may access which information. The information relative to each user is stored in the database tables associated with the database for security mechanism 228.
  • the various fields typically include at least the unique username and a password for each user of search appliance 180 of FIG. 1.
  • Group permissions are similarly stored in a database table which includes fields such as a name for each permission group, where a permission group is a customized text string descriptive of a role or function of the enterprise, such as "sales," "support," or "admin." A user may inherit security-related permissions and restrictions, based on the specific group permissions for the group to which the user is assigned.
  • Searchable domains are stored in a database table whose fields define the location, such as a website URI text string, of each domain from which content may be extracted by indexing operations conducted by index mechanism 224 at the request of a user.
  • a user may be restricted to searching only those domains that are identified in the searchable domains tables for that user and/or for the specific group to which that user belongs.
  • User roles are stored in a database table whose fields serve to relate system users to group permissions, thus defining one or more roles a user plays within the enterprise. Specifically, a field exists in which a primary key of the system users table may appear in multiple records, each time uniquely corresponding to a second field containing a primary key of an entry of the group permissions table.
  • Domain groups are similarly stored in a database table whose fields serve to relate searchable domains to group permissions, thus associating a domain with one or more group permissions of the enterprise.
  • a field exists in which a primary key of the searchable domains table may appear in multiple records, each time uniquely corresponding to a second field containing a primary key of an entry of the group permissions table.
  • the above database tables and their relationships are sufficient to provide a role-based security protocol for protecting the results returned from a given user search request. More particularly, using the same security components and sequence/numbering scheme identified above, a specific security protocol can be implemented.
  • User authentication is provided via a match of input username and password to those stored in the system users table, identifying the user as the individual claimed.
  • the text string names of groups of the enterprise are obtained from the group permissions table. Domains of content within or without the enterprise are obtained from the searchable domains table.
  • the user roles table indicates the groups to which the authenticated user belongs.
  • the domain groups table indicates, for a given searchable domain, what groups of users may access that domain's content, and thus, via the user roles table and the matching of group permissions primary keys, what searchable domains the authenticated user has privilege to see.
  • the above administrative information can be applied to filter the query of a search request, so as to return only information from those domains the authenticated user is permitted to see, based on that individual's role within the enterprise.
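  • The five tables and the join between them can be sketched with an in-memory SQLite database. All table names, column names, users, and URLs below are hypothetical illustrations of the relationships described above, and passwords are stored in plain text only to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE system_users (id INTEGER PRIMARY KEY, username TEXT UNIQUE, password TEXT);
CREATE TABLE group_permissions (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE searchable_domains (id INTEGER PRIMARY KEY, location TEXT);
-- Link tables: each (key, key) pair may appear only once.
CREATE TABLE user_roles (user_id INTEGER, group_id INTEGER, UNIQUE (user_id, group_id));
CREATE TABLE domain_groups (domain_id INTEGER, group_id INTEGER, UNIQUE (domain_id, group_id));
INSERT INTO system_users VALUES (1, 'alice', 'pw-a'), (2, 'bob', 'pw-b');
INSERT INTO group_permissions VALUES (1, 'sales'), (2, 'support');
INSERT INTO searchable_domains VALUES
    (1, 'http://intranet.example/sales'), (2, 'http://intranet.example/support');
INSERT INTO user_roles VALUES (1, 1), (2, 1), (2, 2);
INSERT INTO domain_groups VALUES (1, 1), (2, 2);
""")


def permitted_domains(username, password):
    """Authenticate, then list the domains reachable through the user's groups."""
    rows = conn.execute(
        """
        SELECT DISTINCT d.location
        FROM system_users u
        JOIN user_roles ur ON ur.user_id = u.id
        JOIN domain_groups dg ON dg.group_id = ur.group_id
        JOIN searchable_domains d ON d.id = dg.domain_id
        WHERE u.username = ? AND u.password = ?
        ORDER BY d.location
        """,
        (username, password),
    ).fetchall()
    return [location for (location,) in rows]
```

  • In this sketch, the returned list would be used to restrict a search query to permitted domains. A failed login yields an empty list, so no domain is searchable.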
  • the level of granularity of search restriction is generally that of a searchable domain since group permissions are assigned to searchable domains.
  • the access granted users is not usually granted at the level of individual documents, as in a typical file system.
  • an administrator may define searchable domains with a granularity that can vary from finely grained (as a single file), to medium grained (as a set of sub directories), or coarsely grained (as an entire website).
  • the granularity of group permissions is variable, depending on how the searchable domains are defined. Since documents of a common level of sensitivity are typically grouped together, domains are generally defined correspondingly.
  • When a search request is received from a user (step 510), search mechanism 225, in conjunction with database 223 and index mechanism 224, can be deployed to perform the requested search and retrieve the results (step 520).
  • scoring mechanism 227 may be deployed to further enhance the search results.
  • any or all of the various weighting mechanisms previously described may be used to enhance the search results.
  • a user may determine that the desired search results can be enhanced by applying frequency weighting (step 530), proximity weighting (step 540), category weighting (step 550), and/or location weighting (step 560). Since the application of these various weighting factors is user configurable, it is possible for each user to configure scoring mechanism 227 for maximum benefit.
  • search results can be ordered (step 570) and presented to the user (step 580). In this fashion, the search results can be enhanced and customized for each individual user of search appliance 180.
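  • Steps 530 to 580 can be illustrated with a small sketch. The linear combination of weights below is an assumption chosen for simplicity; the text does not specify how scoring mechanism 227 combines the factors. The names ScoringConfig, score, and rank_hits, and the per-hit fields, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoringConfig:
    # Per-user settings; a weight of 0.0 switches that factor off.
    frequency: float = 1.0
    proximity: float = 0.0
    category: float = 0.0
    location: float = 0.0


def score(hit: dict, cfg: ScoringConfig) -> float:
    """Combine the enabled weighting factors into one score (assumed linear)."""
    return (cfg.frequency * hit.get("frequency", 0.0)
            + cfg.proximity * hit.get("proximity", 0.0)
            + cfg.category * hit.get("category", 0.0)
            + cfg.location * hit.get("location", 0.0))


def rank_hits(hits: List[dict], cfg: ScoringConfig) -> List[dict]:
    """Order hits by descending score and drop hits that score zero."""
    scored = [(score(h, cfg), h) for h in hits]
    scored = [(s, h) for s, h in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in scored]


hits = [
    {"uri": "a", "frequency": 2, "proximity": 0},
    {"uri": "b", "frequency": 1, "proximity": 5},
    {"uri": "c"},  # no matching factors, so it scores zero
]
by_frequency = rank_hits(hits, ScoringConfig())
by_proximity = rank_hits(hits, ScoringConfig(frequency=1.0, proximity=1.0))
```

  • Two users with different configurations receive the same hits in a different order, which is the per-user customization described above.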
  • a search model is used to facilitate searching performed in response to a query consisting of one or more keywords, for example.
  • the search model includes a data model used for searching, indexing and ranking operations, techniques such as word stemming and parts-of-speech tagging, and a lexicon that can learn new words encountered while performing initial and incremental indexing.
  • the search model can use a pipeline architecture, as is described in more detail below.
  • the search model can also include scoring, or ranking, of search result items, e.g., documents, such as that performed using scoring mechanism 227 to rank the results of a query used with one or more embodiments of the present invention.
  • Word stemming can be used to remove common morphological and inflectional endings from words, so as to normalize terms.
  • One example of such a word stemming mechanism is the Martin Porter Stemming Algorithm, a fuller discussion of which is found at http://www.tartarus.org/~martin/PorterStemmer/, which discussion is incorporated herein by reference.
  • One example of parts-of-speech tagging is the University of Pennsylvania (Penn) Treebank Tagset. For example, see the discussion found at http://www.comp.leeds.ac.uk/amalgam/tagsets/upenn.html, which discussion is incorporated herein by reference.
  • a file import mechanism is used in embodiments of the present invention to achieve efficiencies. More particularly, in view of the number of records to be created in generating a search model index, use of an SQL INSERT to insert records in database tables in a relational database is particularly time consuming and impractical. Accordingly, in embodiments of the invention, data that is to be inserted into the database is first written to temporary files, or buffers, and then imported into the database.
  • One example of an exception to this approach involves the domain table, which defines an auto incremented index field, and the key table, which maintains counts of indices. Since relatively few records are involved, the file import mechanism need not be used in creating records in the domain and key tables.
  • the domain, uri, and page tables are used to store information about the document pages that are visited during indexing.
  • a domain refers to a location where documents are stored, such as a website or file directory. According to this model, every domain that is indexed is recorded as an entry in the domain table.
  • a document is referred to by its Uniform Resource Identifier, or URI, which is associated with a specific domain.
  • Every document that is indexed is recorded as an entry in the uri table. For every page visited there is a record entered into the page table that corresponds to a specific document and domain. For example, when an e-mail archive that resides in a file system directory is indexed, each e-mail of the archive is recorded as an entry of the page table.
  • the lexicon and rank tables are used in indexing the information accessible via network 120. More particularly, the lexicon table, which contains the learning dictionary of the keyword search model, contains an entry for every original, case-insensitive word known to the indexing algorithm, including the parts of speech of each word.
  • the pos field is a comma-delimited list of tags constructed, for example, from the Penn Treebank tag set.
  • the lexicon table contains an entry for every stem word that can be constructed from the set of known original words. Every entry in the lexicon table is associated with a unique index, denoted by the lkey field.
  • the skey field is a specific lkey index corresponding to a stem word.
  • the skey field is used to establish a relationship between every original word and its corresponding stem word, within the same table. That is, for example, every stem word entry in the lexicon is self-referential, such that the values of lkey and skey of a stem word entry are identical.
  • An entry in the rank table records the frequency of occurrence of a stem word within a document page, as it is known within the lexicon table.
  • the word table records the positions of original words encountered during indexing, so that they may be highlighted in subsequent search result presentations.
  • the original words need only be referred to by their corresponding stem words, hence the appearance of the field skey within the definition of the word table.
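  • The self-referential relationship between lkey and skey can be sketched with an in-memory SQLite table. Only the lkey, skey, and pos fields are named in the text; the word column, the sample rows, and the stem_of helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE lexicon (
        lkey INTEGER PRIMARY KEY,                       -- unique index of this entry
        word TEXT NOT NULL,                             -- hypothetical column name
        pos  TEXT,                                      -- comma-delimited Penn Treebank tags
        skey INTEGER NOT NULL REFERENCES lexicon(lkey)  -- lkey of the stem word entry
    )
""")
conn.executemany(
    "INSERT INTO lexicon (lkey, word, pos, skey) VALUES (?, ?, ?, ?)",
    [
        (1, "INDEX", "NN,VB", 1),       # stem entry: lkey equals skey
        (2, "INDEXING", "VBG", 1),      # original word pointing at its stem
        (3, "INDEXED", "VBD,VBN", 1),
    ],
)


def stem_of(word: str):
    """Follow skey from an original word to its stem word within the same table."""
    row = conn.execute(
        "SELECT s.word FROM lexicon AS w "
        "JOIN lexicon AS s ON s.lkey = w.skey "
        "WHERE w.word = ?",
        (word.upper(),),
    ).fetchone()
    return row[0] if row else None
```

  • Looking up a stem word returns the word itself, since its skey points back at its own lkey.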
  • buffering and a file import mechanism can be used in one or more embodiments of the present invention.
  • a data structure is used to provide a buffer for data before it is written to the database.
  • the data that is buffered corresponds to the fields in the uri, page, rank, and word tables.
  • buffered data is preferably written at the end of indexing, or when memory availability reaches a predefined threshold, requiring a flush of data to free the memory.
  • New records are written to the tables from the buffered data via a file import mechanism, and existing records can be updated via an SQL UPDATE command.
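  • The buffering and file import steps can be sketched as follows. SQLite has no native file-import statement, so this sketch stands in for one by reading the temporary file back through a single bulk call; a server database would use its own bulk-load command at that point. The RankBuffer class, the row-count threshold (a stand-in for the memory threshold), and the rank column names other than page_freq are assumptions.

```python
import csv
import os
import sqlite3
import tempfile


class RankBuffer:
    """Accumulates rank rows in memory and flushes them through a temporary file."""

    def __init__(self, conn, max_rows=10000):
        self.conn = conn
        self.max_rows = max_rows  # stand-in for the predefined memory threshold
        self.rows = []

    def add(self, skey, page_id, page_freq):
        self.rows.append((skey, page_id, page_freq))
        if len(self.rows) >= self.max_rows:
            self.flush()

    def flush(self):
        if not self.rows:
            return
        fd, path = tempfile.mkstemp(suffix=".csv")
        try:
            # Step 1: write the buffered rows to a temporary file.
            with os.fdopen(fd, "w", newline="") as fh:
                csv.writer(fh).writerows(self.rows)
            # Step 2: import the file in one bulk operation.
            with open(path, newline="") as fh:
                self.conn.executemany(
                    "INSERT INTO rank VALUES (?, ?, ?)", csv.reader(fh))
            self.conn.commit()
        finally:
            os.remove(path)
        self.rows.clear()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rank (skey INTEGER, page_id INTEGER, page_freq INTEGER)")
buffer = RankBuffer(conn, max_rows=2)
buffer.add(1, 10, 3)
buffer.add(1, 11, 1)  # reaches the threshold, so the buffer is flushed
buffer.add(2, 10, 7)
buffer.flush()        # final flush at the end of indexing
```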
  • Another type of data structure used in indexing is an N-ary Trie tree, where N is the number of (upper case) characters in the alphabet, plus digits and punctuation marks.
  • This tree structure can be used to hold in memory the contents of the entire lexicon and to provide fast lookups (e.g., a word lookup). Initially and prior to commencing indexing, the tree structure is populated using the contents of the lexicon table. If new words are encountered during indexing, they are added to the tree. At the end of indexing, the contents of the tree are written back to the lexicon table.
  • the tree's contents are written back to the lexicon table using a file import mechanism, as discussed
  • entries in the tree which represent new words found during indexing are imported to the lexicon table via a temporary buffer, or file, using a file import mechanism.
  • the N-ary Trie tree structure is ideally suited for use with large dictionaries of words because text-string lookup within the Trie structure is quite fast.
  • Each node of the tree contains an array of size N, where each element of the array is potentially a child node.
  • An example of a 3-ary Trie tree is provided below, which is constructed from an alphabet consisting of the upper case letters A, B, and C.
  • the elements (circles) of the 3-size arrays (rectangles) depicted below follow this same sequence. That is, the first element corresponds to A, the second to B, and the third to C. Shaded circles represent allocated nodes.
  • the squares represent the allocation of data at a node, such as the parts of speech of a word.
  • the example of the 3-ary Trie tree depicts the storage of data for the words AB, ABC, C, and CC.
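  • The 3-ary example can be expressed in code. This is a minimal sketch: each node holds an array of N child slots in alphabet order plus optional data, and the words AB, ABC, C, and CC are stored with a placeholder part-of-speech tag. The class and method names are assumptions.

```python
ALPHABET = "ABC"  # the 3-letter alphabet of the example; N = len(ALPHABET)
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


class TrieNode:
    __slots__ = ("children", "data")

    def __init__(self):
        self.children = [None] * len(ALPHABET)  # one slot per letter: A, B, C
        self.data = None                        # e.g., the parts of speech of a word


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, data):
        """Allocate nodes along the path of the word and store data at its end."""
        node = self.root
        for ch in word.upper():
            i = INDEX[ch]
            if node.children[i] is None:
                node.children[i] = TrieNode()
            node = node.children[i]
        node.data = data

    def lookup(self, word):
        """Return the data stored for the word, or None if the word is unknown."""
        node = self.root
        for ch in word.upper():
            i = INDEX.get(ch)
            if i is None or node.children[i] is None:
                return None
            node = node.children[i]
        return node.data


lexicon = Trie()
for known_word in ("AB", "ABC", "C", "CC"):
    lexicon.insert(known_word, "NN")
```

  • A lookup costs one array access per character of the word, independent of how many words the lexicon holds. The node for A is allocated but carries no data, so A by itself is not reported as a known word.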
  • indexing can be performed using a pipeline thread architecture. More particularly, the sequential nature of indexing can be broken up into segments and assigned to the multiplexing stages of the pipeline, so as to enhance throughput. For example, web crawling can be assigned to the first stage of the pipeline, and the second stage can be used to perform initial format parsing of documents. Additional stages may be needed for further passes through documents (such as to apply sophisticated image recognition algorithms). In one of the final stages of the pipeline, indexed content can be written to the working store.
  • a single multiplexing stage can be assigned to perform all of the tasks of indexing, from web crawling, to format parsing, to indexing of words.
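  • A pipeline of this kind can be sketched with one thread per stage connected by queues. This is an illustration under simplifying assumptions: the crawl, parse, and index functions and the in-memory FAKE_DOCS mapping are hypothetical stand-ins for web crawling, format parsing, and word indexing, and error handling is omitted.

```python
import queue
import threading
from collections import Counter

STOP = object()  # sentinel that tells each stage to shut down


def stage(work, inbox, outbox):
    """Run one pipeline stage: take an item, process it, pass it on."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        outbox.put(work(item))


def run_pipeline(items, stages):
    """Push items through the stages, each stage running in its own thread."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(STOP)
    results = []
    while True:
        out = queues[-1].get()
        if out is STOP:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


FAKE_DOCS = {"doc1": "alpha beta alpha", "doc2": "beta"}  # hypothetical content


def crawl(uri):
    return uri, FAKE_DOCS[uri]


def parse(doc):
    uri, text = doc
    return uri, text.upper().split()


def index(doc):
    uri, words = doc
    return uri, dict(Counter(words))


results = run_pipeline(["doc1", "doc2"], [crawl, parse, index])
```

  • Passing a single function that performs all three tasks as the only stage corresponds to the single-stage arrangement described above.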
  • we refer to the concatenation of all of these sequential tasks as the indexing procedure.
  • we discuss only its salient features, i.e., those aspects that might be construed as unique or noteworthy.
  • indexing includes a parsing of documents, or other items found on network 120, to identify new words to be added to the lexicon.
  • indexing identifies the words contained within the document, the locations of each of these words, and a frequency of occurrence of the words found in the document.
  • embodiments of the present invention contemplate the ability of the lexicon to learn new words.
  • when indexing begins, the current content of the lexicon is loaded into memory, as discussed herein. This includes any predefined entries whose parts of speech and corresponding stem words have been carefully reviewed, such as by visual inspection.
  • stem words are estimated using the Porter stemming algorithm, for example.
  • each new word is assigned a default part of speech, such as by using the NN tag of the Penn Treebank tag set, for example.
  • the lexicon of the keyword search model can be initialized, e.g., in a version shipped to the end customer, with predefined entries or no entries at all.
  • incremental indexing can be used with a keyword search model used in one or more embodiments of the present invention.
  • two distinct time values, (i) the start time, index_time, of the indexing procedure and (ii) the last modification time, last_mod_time, are maintained for each document visited. These values are stored, respectively, in the index_time and last_mod_time fields of each record of the uri table of the database schema set forth above.
  • document information stored in the uri table is preferably loaded into a data structure in memory to facilitate comparison of last modification times. If the document cannot be found in the data structure, it is added to the data structure, together with its last modification time and the start time of the present indexing. If the document is found in the data structure, then its modification time is compared to the modification time stored in the data structure corresponding to the document. If the two times are equal then the document is not indexed again. Otherwise, the document is again fully indexed, i.e., every page, and the information pertaining to the document, including its last_mod_time and index_time, is updated in the data structure.
  • a "final scrub" of the database can be performed when an indexing operation is about to complete. This final scrub can remove obsolete records from the database, for example, those entries that correspond to documents that are identified during the indexing operation as no longer existing (e.g., a document no longer resides within the domains indexed by the current indexing operation) or that, for whatever reason, can no longer be indexed.
  • Obsolete records of the uri table are those whose values within the index_time field do not equal the present start time of indexing.
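  • The incremental indexing decision and the final scrub can be sketched together. The dictionary stands in for the in-memory copy of the uri table, and incremental_index and index_document are hypothetical names. One assumption goes beyond the text: an unchanged document has its index_time refreshed when it is visited, so that the scrub does not treat it as obsolete.

```python
def incremental_index(uri_table, visited, start_time, index_document):
    """Re-index only new or modified documents, then scrub obsolete records.

    uri_table maps a URI to {"last_mod_time": ..., "index_time": ...};
    visited yields (uri, last_mod_time) pairs found during this run.
    Returns the list of URIs removed by the final scrub.
    """
    for uri, last_mod in visited:
        record = uri_table.get(uri)
        if record is not None and record["last_mod_time"] == last_mod:
            # Unchanged: skip indexing, but mark the record as seen in this run
            # (an assumption, so the final scrub keeps it).
            record["index_time"] = start_time
            continue
        index_document(uri)  # new or modified: index every page again
        uri_table[uri] = {"last_mod_time": last_mod, "index_time": start_time}

    # Final scrub: drop records whose index_time is not the present start time.
    obsolete = [u for u, r in uri_table.items() if r["index_time"] != start_time]
    for u in obsolete:
        del uri_table[u]
    return obsolete


table = {
    "a": {"last_mod_time": 100, "index_time": 1},
    "b": {"last_mod_time": 200, "index_time": 1},
    "gone": {"last_mod_time": 50, "index_time": 1},
}
indexed = []
removed = incremental_index(
    table, [("a", 100), ("b", 250), ("c", 300)], start_time=2,
    index_document=indexed.append)
```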
  • the query is processed against the search model described above.
  • the example query includes a keyword, "FOO", which is taken from the user request (e.g., the user request might involve a request for documents containing the word "FOO")
  • the query shown below is an SQL query involving the lexicon table of the keyword search model, which is used to look up each unique keyword in the lexicon table of the model database.
  • the lexicon table of the database contains entries for words and their stems and maintains a relationship between each word and its stem.
  • a further SQL query of the database can be performed to obtain the frequencies of occurrence of the stem word within the pages of indexed documents.
  • An example of this latter SQL query follows: SELECT domain_name, uri, page_num, page_title, page_freq FROM rank, page, uri, domain
  • the above SQL query is an example of an inner join that exploits the relationships between the document, page, and rank tables, which were introduced earlier.
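  • The two-step lookup (keyword to stem, then stem to page frequencies) can be sketched against an in-memory SQLite database. The join conditions, key columns, and sample rows are assumptions, since the query in the text lists only the selected columns and the tables; only the selected column names and table names come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE domain  (domain_id INTEGER PRIMARY KEY, domain_name TEXT);
    CREATE TABLE uri     (uri_id INTEGER PRIMARY KEY, domain_id INTEGER, uri TEXT);
    CREATE TABLE page    (page_id INTEGER PRIMARY KEY, uri_id INTEGER,
                          page_num INTEGER, page_title TEXT);
    CREATE TABLE lexicon (lkey INTEGER PRIMARY KEY, word TEXT, skey INTEGER);
    CREATE TABLE rank    (skey INTEGER, page_id INTEGER, page_freq INTEGER);

    INSERT INTO domain  VALUES (1, 'intranet');
    INSERT INTO uri     VALUES (1, 1, '/docs/a.txt'), (2, 1, '/docs/b.txt');
    INSERT INTO page    VALUES (1, 1, 1, 'A'), (2, 2, 1, 'B');
    INSERT INTO lexicon VALUES (1, 'FOO', 1), (2, 'FOOS', 1);
    INSERT INTO rank    VALUES (1, 1, 2), (1, 2, 5);
""")


def search(keyword):
    """Resolve the keyword to its stem, then fetch per-page frequencies."""
    row = conn.execute(
        "SELECT skey FROM lexicon WHERE word = ?", (keyword.upper(),)
    ).fetchone()
    if row is None:
        return []
    return conn.execute(
        "SELECT domain_name, uri, page_num, page_title, page_freq "
        "FROM rank, page, uri, domain "
        "WHERE rank.skey = ? "
        "AND rank.page_id = page.page_id "      # assumed join condition
        "AND page.uri_id = uri.uri_id "         # assumed join condition
        "AND uri.domain_id = domain.domain_id " # assumed join condition
        "ORDER BY page_freq DESC",
        (row[0],),
    ).fetchall()
```

  • Searching for FOOS returns the same pages as searching for FOO, because both words resolve to the same stem entry.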
  • the relevant pages of documents can be returned to the end user after the scoring operation, such as that performed by scoring mechanism 227 described herein, is applied to sort the results.
  • results with a score of zero can be pruned from the list before return to the end user.
  • search appliance 180 identifies servers which provide shared resources, or shares. Servers are identified using several methods depending on the characteristics of the target network.
  • search appliance 180 can browse the network address space (e.g., the network address space of search appliance 180) using network browsing tools and/or use directory services to find shared resources.
  • the search appliance 180 can locate resources by browsing the network using a browser service.
  • a browser service or server, provides a list of available resources on a network domain.
  • a master browser maintains the main or master list of computers and shared resources. For example, all workgroups or domains can have one master browser.
  • a master browser maintains a master list of shared resources, and browser servers maintain a subset of the master list of shared resources. These lists are updated periodically to reflect shared resources added or removed.
  • search appliance 180 searches network 120 to identify sharable resources using SAMBA, an open source utility suite which provides information about shared resources. Documentation for the SAMBA utility suite can be found at www.samba.org.
  • One SAMBA utility is SMBtree, which can be used to browse the network to identify a list, e.g., in the form of a tree, showing known domains, the servers in those domains, and the shares on the servers. It has been determined by the inventor of the present invention that this utility does not necessarily provide an accurate and complete listing of the domains, servers and/or shares. Accordingly, in accordance with embodiments of the present invention, other SAMBA utilities are used to supplement the SMBtree utility, in order to obtain a more complete identification of shares accessible via the network.
  • Another SAMBA utility, a master and browser lookup utility, used to supplement, or in place of, the SMBtree utility, locates all of the browsers, i.e., the master browser and browser servers, on the network, together with their NetBIOS names.
  • Another utility, the SMBclient utility, is then used in embodiments of the present invention to obtain directory information from the servers identified by the former utility.
  • the SMBtree utility can be used to provide a list of the servers and shares on the servers.
  • the search appliance 180 can be configured to find shared resources by consulting a directory service.
  • search appliance 180 uses a directory access protocol (e.g., Lightweight Directory Access Protocol, or "LDAP") to consult directories, such as those directories maintained by Windows Domain Controllers, and Windows Catalog Servers, for example.
  • the process can be iteratively performed until no new servers are returned.
  • the iterative process is implemented as a PERL script.
  • FIG. 8 which comprises FIGs. 8A and 8B, provides an example of pseudo code of a script for use in discovering shared resources in accordance with one or more embodiments.
  • search appliance 180 can examine network configuration information to determine the type of network services that are being used on the network.
  • the network configuration information can be obtained from information entered via a graphical user interface, for example.
  • search appliance 180 can be configured as a DHCP client, which communicates with a DHCP server to request network configuration information (e.g., IP address information, information regarding available domain name servers, NetBIOS servers and/or Windows TM Name Service-enabled servers, etc.).
  • search appliance 180 can retrieve shared resource information identified in a previous network search, as well as previously-supplied authentication information. In some cases, if not most, authentication information (e.g., username and password) must be supplied to a server to obtain information regarding the server's shared resources, or other information regarding the network.
  • search appliance 180 can use its IP address to identify an address space, e.g., a network block extent, and the IP addresses in the address space.
  • Search appliance 180 can search for devices that accept TCP connections on ports known to correspond to specific file sharing services. For example, a NetBIOS-over-TCP protocol set can be used to attempt to open a connection to a port (e.g., SMB ports 139 and/or 445).
  • An Active Directory Service (ADS) LDAP can be identified by accessing port 389.
  • An accessible server is identified, and each server identified can be queried directly to identify shared resources (e.g., by obtaining a "share list" from an identified server).
  • a server name list is generated using the servers identified by a search of the address space. Each LDAP server found (e.g., by attempting to open a connection to port
  • Each IP address found (e.g., by attempting to open a connection to ports 139 and 445) is used to identify a corresponding server name.
  • the NetBIOS or WINS protocols can be used to retrieve a server name corresponding to an IP address. If a server name corresponding to an IP address cannot be determined, the IP address is used as the server name.
  • An IP address can be resolved, and a corresponding server name identified, using a reverse lookup operation. For example a Domain Name Service (or DNS), which can typically be used to supply an IP address for a given server/domain name, can be used to identify a server name corresponding to an IP address.
  • Each named server, or unresolvable IP address, identified can then be queried to obtain a share list.
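  • The address-space search can be sketched with the Python standard library. This is an illustration only: it uses a DNS reverse lookup for name resolution (the NetBIOS and WINS lookups mentioned above are not shown), and the function names, the timeout, and the use of a prefix length to describe the network block are assumptions. The probe and resolve parameters let the scan be exercised without touching a real network.

```python
import ipaddress
import socket

# Ports named in the text: 139 and 445 for SMB, 389 for an LDAP directory.
SERVICE_PORTS = {139: "SMB (NetBIOS session)", 445: "SMB (direct TCP)", 389: "LDAP"}


def address_space(own_ip, prefix_len):
    """List the host addresses of the network block containing own_ip."""
    net = ipaddress.ip_network(f"{own_ip}/{prefix_len}", strict=False)
    return [str(host) for host in net.hosts()]


def accepts_tcp(ip, port, timeout=0.5):
    """Return True if a TCP connection to ip:port can be opened."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def server_name(ip):
    """Reverse-resolve an IP address; fall back to the address itself."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip


def scan(own_ip, prefix_len, probe=accepts_tcp, resolve=server_name):
    """Map each responding server name (or bare IP) to its open service ports."""
    found = {}
    for ip in address_space(own_ip, prefix_len):
        open_ports = [port for port in SERVICE_PORTS if probe(ip, port)]
        if open_ports:
            found[resolve(ip)] = open_ports
    return found
```

  • When the reverse lookup fails, the IP address is used as the server name, as described above.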
  • domain or server-level authentication credentials (e.g., login name and password)
  • shared resources (e.g., shared resources, or "share list").
  • available authentication credentials (e.g., from configuration/initialization information)
  • a utility, such as SAMBA's SMBclient, can be used to request a "share list" from a named server, or IP address. For those servers/IP addresses lacking authentication credentials, or in a case that a server/IP address does not require authentication, the SMBclient can be used without authentication credentials. If authentication credentials are needed to retrieve the "share list", the SMBclient can be used with authentication credentials.
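  • Requesting a "share list" with and without credentials can be sketched by building an smbclient command line. The -L, -N, -U, and -g options are standard smbclient options; the parser below assumes that the grepable (-g) output consists of lines of the form type|name|comment, and the sample output used to exercise it is illustrative rather than captured from a real server. The function names are hypothetical.

```python
import subprocess


def share_list_command(server, username=None, password=None):
    """Build the smbclient command that lists the shares of a server."""
    cmd = ["smbclient", "-g", "-L", server]
    if username is None:
        cmd.append("-N")  # no credentials: suppress the password prompt
    else:
        # Note: a password passed on the command line is visible to other
        # local processes; a credentials file may be preferable in practice.
        cmd += ["-U", f"{username}%{password}"]
    return cmd


def parse_share_list(output):
    """Split assumed 'type|name|comment' lines into shares, servers, workgroups."""
    shares, servers, workgroups = [], [], []
    for line in output.splitlines():
        parts = line.split("|")
        if len(parts) < 2:
            continue
        kind, name = parts[0], parts[1]
        if kind == "Disk":
            shares.append(name)
        elif kind == "Server":
            servers.append(name)
        elif kind == "Workgroup":
            workgroups.append(name)
    return shares, servers, workgroups


def get_share_list(server, username=None, password=None):
    """Run smbclient and parse its output (requires smbclient to be installed)."""
    result = subprocess.run(
        share_list_command(server, username, password),
        capture_output=True, text=True, timeout=30)
    return parse_share_list(result.stdout)


SAMPLE_OUTPUT = (  # illustrative only
    "Disk|public|Shared documents\n"
    "IPC|IPC$|IPC Service\n"
    "Server|FILESRV2|\n"
    "Workgroup|ENG|FILESRV1\n"
)
```

  • Server and workgroup names found in the output are what allow the discovery to continue with servers that were not yet known.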
  • a "share list” If a "share list” is obtained, it can be examined, and server name information contained in the "share list” can be used to resolve a server name.
  • a new server name is identified from the "share list" (e.g., a new server name is listed in the "share list" and/or information contained in the "share list" is used to resolve a previously-unresolvable IP address)
  • authentication credentials are identified (if available), and the server can be queried to retrieve its "share list", as previously discussed.
  • An obtained "share list” can be examined to identify shared resources, or shares, which can be accessed for shared files.
  • the "share list" can be examined to determine whether a previously-undiscovered domain and/or workgroup is identified, which can be added to a domain/workgroup list.
  • domain-level authentication credentials might be available for a newly-discovered domain, which credentials can be used to obtain a "share list”.
  • previously-undiscovered peer servers can be identified and added to the list of servers to be queried for a listing of shared resources.
  • An iterative discovery process is used to discover named servers and IP addresses. In accordance with at least one embodiment, the iterative process continues until no new servers can be identified.
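  • The iterative process can be sketched as a worklist loop. This is not the script of FIG. 8; it is a minimal illustration in which get_share_list is a hypothetical callable returning the shares of a server and any peer servers named in its "share list", and FAKE_NETWORK is sample data used in place of a real network.

```python
from collections import deque


def discover(seed_servers, get_share_list):
    """Query servers for share lists until no new servers are identified.

    get_share_list(server) must return (shares, peer_servers).
    Returns a mapping of every server queried to its list of shares.
    """
    seeds = list(dict.fromkeys(seed_servers))  # drop duplicates, keep order
    known = set(seeds)
    pending = deque(seeds)
    shares = {}
    while pending:
        server = pending.popleft()
        found_shares, peers = get_share_list(server)
        shares[server] = found_shares
        for peer in peers:
            if peer not in known:  # a previously-undiscovered server
                known.add(peer)
                pending.append(peer)
    return shares


# Hypothetical network: each server lists its shares and the peers it knows.
FAKE_NETWORK = {
    "srv1": (["public"], ["srv2"]),
    "srv2": (["projects", "archive"], ["srv1", "10.0.0.7"]),
    "10.0.0.7": ([], []),
}
all_shares = discover(["srv1"], lambda server: FAKE_NETWORK[server])
```

  • The loop terminates because each server is queued at most once, so the process ends as soon as a round of queries yields no new server names.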
  • Shares discovered using the above-identified iterative process can be mounted to provide access to shared files. That is, for example, a mount operation which references a network device, such as a server or storage appliance and/or a file system, storage device, directory, file, etc. of the network device, makes the referenced item available for access.
  • SAMBA SMB protocol/ file system implementation of SAMBA
  • older versions of the SMB protocol do not support digital signatures, or digital signing. This can result in an incompatibility with file systems that use an authentication technique, such as digital signing, in connection with, or as part of, a mounting operation.
  • more recent versions of Microsoft's implementation of the CIFS protocol use digital signing for mount authentications.
  • the CIFS VFS (i.e., Common Internet File System Virtual File System)
  • CIFS VFS is used to mount shares discovered using the above-described iterative process.
  • CIFS VFS is an open source initiative in collaboration with Samba, which allows access to such shares as servers and storage appliances.
  • CIFS VFS implements digital signing, and encompasses the SMB protocol, and is compatible with newer Microsoft implementations of the CIFS protocol, of which SMB is a predecessor.
  • CIFS VFS which implements digital signing and encompasses the SMB protocol, can be used to mount SMB file shares and the newer CIFS file shares, for example, particularly when digital signing is used within mount authentications.
  • FIG. 6, which includes FIG. 6A to FIG. 6O, provides illustrative examples of screens from a user interface of a search appliance in accordance with one or more embodiments of the invention. More particularly, the screens provide examples of selections/options offered via a user interface used in one or more embodiments of the invention. It should be apparent that the examples provided in these figures are not exhaustive, and that other and/or additional screens and information can be displayed in connection with one or more embodiments of the present invention.
  • A user login screen is shown in FIG. 6A, which allows a user to log into and gain access to functionality provided by search appliance 180, in accordance with various embodiments of the present invention. For example, after successfully logging in, a user can be presented with a screen as shown in FIG. 6B, which provides a number of options for indexing configuration. It should be apparent that the options shown in FIG. 6B are examples of indexing configuration options, and are not meant to limit or exclude other options that might be provided with one or more embodiments of the present invention.
  • One of the options shown in FIG. 6B is the "Monitor Indexing" option, which provides the ability to view the status of an indexing operation, start an indexing operation or stop an indexing operation.
  • FIG. 6H illustrates a screen which includes information showing the status of an indexing operation in progress. For example, the start, end and elapsed times associated with an indexing operation can be displayed. In addition, information related to a pipelined indexing operation can be monitored using the "Monitor Indexing" option. It is also possible to terminate an indexing operation.
  • Selection of the "Schedule Indexing” option in FIG. 6B provides the ability to schedule an indexing operation to automatically begin at the designated time.
  • FIG. 6I shows a sample screen displayed in response to selection of the "Schedule Indexing" option, wherein day of the week and start time can be specified for an indexing operation.
  • In FIG. 6B, the "Define Searchable Locations" option provides the ability to define locations that are to be indexed, and thus from where search results may be obtained.
  • FIG. 6D and FIG. 6G illustrate display screens responsive to selection of the "Define Searchable Locations" option.
  • the "Choose Document Types" option allows a user to select the types of documents that are to be indexed in an indexing operation.
  • the scope of a search as well as the search results can be indirectly identified using this option.
  • FIG. 6C provides an example of a screen displayed in response to selection of the "Choose Document Types" option. As illustrated by the sample selections shown in FIG. 6C, examples of document types include electronic mail, generic text, presentation, publication and spreadsheet. In addition, as illustrated, it is also possible to specify document type by the application used to generate the document.
  • the "Set Operational Parameters" option shown in FIG. 6B allows a user to set parameters associated with the operation of search appliance 180.
  • FIG. 6J provides an example of a screen displayed in response to selection of the "Set Operational Parameters" option. For example, a maximum number of documents indexed from searchable locations can be specified, as well as a level of messages to be logged during operation of search appliance 180, e.g., during a search or indexing operation.
  • FIG. 6K illustrates an example of a help screen displayed in response to selection of a help option.
  • help can be obtained for search appliance 180, and/or contents of a log file can be displayed.
  • FIG. 6L provides an example of a screen in which a search is entered according to one or more embodiments of the invention.
  • FIG. 6M and FIG. 6N provide examples of results of a search, using keywords "alan", "larry", "presentation" and "publication", conducted using search appliance 180, in accordance with one or more embodiments of the present invention.
  • As shown in FIG. 6N, the contents of a document uncovered in a search can be displayed.
  • FIG. 6O shows examples of options which can be used to perform "Users Administration” operations, such as "Add User”, “Change User Password”, “Change User Permissions”, “Remove User”, “Add Groups”, and “Remove Groups”.
  • FIG. 7, which includes FIG. 7A to FIG. Ti, provides illustrative examples of screens from a user interface used in configuration operations for, and/or associated with, search appliance 180 in accordance with one or more embodiments of the present invention. It should be apparent that the examples provided in these figures are not exhaustive, and that other and/or additional screens and information can be displayed in connection with one or more embodiments of the present invention.
  • FIG. 7A depicts a login screen, in which a user can enter a username and password to gain access to some or all of the remaining portions of the user interface.
  • the screen shown in FIG. 7B can be displayed to allow the user to select between "Network & Internet Connections", “Network File Sharing & Security” and "Search Appliance File Sharing".
  • the "Network & Internet Connections" option can be used to configure search appliance 180 for a specific computer network, in order for the search appliance 180 to communicate with other computers on the network and/or the Internet.
  • FIG. 7C to FIG. 7G provide examples of screens that can be displayed in response to selection of this option.
  • FIG. 7C can be used to specify host and domain names associated with search appliance 180.
  • FIG. 7D provides the option to either manually or automatically discover the IP settings for search appliance 180.
  • the IP settings corresponding to an instance of search appliance 180 can be established automatically using a DHCP client/server model.
  • In a case that manual configuration of the IP settings of a search appliance 180 is selected, a screen such as that shown in FIG. 7E can be displayed, to allow a user to enter an IP address, subnet mask, and default gateway for search appliance 180.
  • FIG. 7F can be used to enter IP addresses corresponding to primary and secondary domain name servers which will assist search appliance 180 in obtaining network domain names.
  • FIG. 7G provides an example of a screen displayed at the successful completion of the manual configuration of IP setting for search appliance 180.
  • A screen such as that shown in FIG. 7H can be displayed in response to selection of the "Network File Sharing & Security" option given in FIG. 7B.
  • a workgroup and domain for search appliance 180 can be identified.
  • FIG. 7J provides the ability to specify enhanced file sharing features for search appliance 180, e.g., use of local master browsing. Search appliance 180 can communicate using encrypted transmissions based on options provided in the screen shown in FIG. 7K.
  • FIG. 7M to FIG. 7R provide examples of screens containing options to "mount" file shares, for purposes of indexing and searching using search appliance 180.
  • FIG. 7O and FIG. 7P illustrate a screen, bottom and top, respectively, which lists shared resources obtained by search appliance 180 browsing network 120. The file system volumes that are to be mounted can be selected using this screen.
  • FIG. 7Q provides a screen containing a listing of file system volumes confirming the selections made using the screen shown in FIG. 7O and FIG. 7P.
  • the screen shown in FIG. 7R provides a status of the mounting operation.
  • FIG. 7S provides an example of a maintenance screen, which can be used to determine the status of updates, for example, that have already been or should be installed on search appliance 180.
  • FIG. 7T provides an example of a log displayed in response to selection of the "View Message Log" option of FIG. 6K.
  • FIG. 7U to FIG. 7Y illustrate screens related to various system-level options, e.g., security and restarts, as well as some help topics.
  • the present invention provides an apparatus and method for the broad application of indexing, locating and retrieving desired information in an efficient and effective manner.
  • the illustrated embodiments are exemplary embodiments only, and are not intended to limit the scope, applicability, or configuration of the present invention in any way. Rather, the foregoing detailed description provides those skilled in the art with a convenient road map for implementing the exemplary embodiments of the present invention. Accordingly, it should be understood that various changes may be made in the function and arrangement of elements described in the various exemplary embodiments without departing from the spirit and scope of the present invention as set forth in the appended claims.

Abstract

A network-based information indexing and search apparatus and method provide access, including indexing and search access, to information hosted on one or more intranets, on the Internet, or on both types of network. The apparatus, also referred to herein as a network search device or appliance, and the described method offer configuration, indexing, and search functions to facilitate searching for and retrieving network-based information.
PCT/US2007/066969 2006-04-19 2007-04-19 System and method for identifying shared resources on a network WO2007121490A2 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US79343106P 2006-04-19 2006-04-19
US60/793,431 2006-04-19

Publications (2)

Publication Number Publication Date
WO2007121490A2 (fr) 2007-10-25
WO2007121490A3 (fr) 2008-11-27

Family

ID=38610472

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/066969 WO2007121490A2 (fr) 2006-04-19 2007-04-19 System and method for identifying shared resources on a network

Country Status (1)

Country Link
WO (1) WO2007121490A2 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030078987A1 (en) * 2001-10-24 2003-04-24 Oleg Serebrennikov Navigating network communications resources based on telephone-number metadata
US20040205244A1 (en) * 2003-02-14 2004-10-14 Marsico Robert G. Network device management

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2606305C2 (ru) * 2014-07-23 2017-01-10 Сяоми Инк. Method and device for resource sharing
CN110430043A (zh) * 2019-07-05 2019-11-08 视联动力信息技术股份有限公司 Authentication method, system, device, and storage medium
CN110430043B (zh) * 2019-07-05 2022-11-08 视联动力信息技术股份有限公司 Authentication method, system, device, and storage medium
WO2023211571A1 (fr) * 2022-04-27 2023-11-02 Microsoft Technology Licensing, Llc Method and system for providing access to documents stored in personal storage media

Also Published As

Publication number Publication date
WO2007121490A3 (fr) 2008-11-27

Similar Documents

Publication Publication Date Title
US20070073894A1 (en) Networked information indexing and search apparatus and method
US7440964B2 (en) Method, device and software for querying and presenting search results
US9367637B2 (en) System and method for searching a bookmark and tag database for relevant bookmarks
JP5368554B2 (ja) Mobile application discovery through mobile search
US5907680A (en) Client-side, server-side and collaborative spell check of URL's
US8209317B2 (en) Method and apparatus for reconstructing a search query
US7293012B1 (en) Friendly URLs
WO2008070415A2 (fr) Apparatus and method for gathering information distributed across a network
US20020178394A1 (en) System for processing at least partially structured data
CA2713932C (fr) Automated Boolean expression generation for computerized search and indexing
KR100463208B1 (ko) Method for implementing an internal domain system centered on a local name server
US8074226B2 (en) Systems and methods for switching internet contexts without process shutdown
US7467136B2 (en) System and method for persistent query information retrieval
US20050154719A1 (en) Search and query operations in a dynamic composition of help information for an aggregation of applications
US20060218208A1 (en) Computer system, storage server, search server, client device, and search method
WO2007121490A2 (fr) System and method for identifying shared resources on a network
JP2004110080A (ja) Method for connecting to a computer network on the Internet by real name, and computer network system therefor
US20030046276A1 (en) System and method for modular data search with database text extenders
US9032193B2 (en) Portable lightweight LDAP directory server and database
JP2000285052A (ja) URL conversion method and apparatus
US20080046416A1 (en) Dynamic program support links
US20050235197A1 (en) Efficient storage of XML in a directory
CA2537269C (fr) Method, device and software for querying and presenting search results
US7779057B2 (en) Method and apparatus for retrieving and sorting entries from a directory
US8745030B2 (en) Fast searching of directories

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 07797255; Country of ref document: EP; Kind code of ref document: A2

NENP Non-entry into the national phase in:
Ref country code: DE

122 Ep: pct application non-entry in european phase
Ref document number: 07797255; Country of ref document: EP; Kind code of ref document: A2