EP1305740A2 - Internationalized Domain Name System with Iterative Conversion - Google Patents

Internationalized Domain Name System with Iterative Conversion

Info

Publication number
EP1305740A2
EP1305740A2 EP01937676A EP01937676A EP1305740A2 EP 1305740 A2 EP1305740 A2 EP 1305740A2 EP 01937676 A EP01937676 A EP 01937676A EP 01937676 A EP01937676 A EP 01937676A EP 1305740 A2 EP1305740 A2 EP 1305740A2
Authority
EP
European Patent Office
Prior art keywords
recited
domain name
character
expression
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP01937676A
Other languages
German (de)
English (en)
Inventor
Daniel G. Pouzzner
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nu Domain
Original Assignee
Nu Domain
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nu Domain
Publication of EP1305740A2

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/30 Managing network names, e.g. use of aliases or nicknames
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2101/00 Indexing scheme associated with group H04L61/00
    • H04L2101/30 Types of network names
    • H04L2101/32 Types of network names containing non-Latin characters, e.g. Chinese domain names

Definitions

  • the technology described here generally relates to data processing, such as electrical computers and digital processing systems with computer-to-computer data addressing, including an internationalized Domain Name System.
  • the modern Internet provides easy access to a variety of information "resources" using a uniform naming syntax that works with various schemes for accessing different types of resources.
  • Each of these resources is specified using a universal resource identifier ("URI") consisting of an access scheme "identifier" ending in a colon, followed by a "path" for locating the resource on a specific computer.
  • URI universal resource identifier
  • the access schemes are typically defined by standardized "protocols" while the path includes the "name" of the machine that is providing, or "hosting," the resource.
a. Protocols and RFCs
  • the term "protocol” is generally used to refer to a procedure for regulating the transmission of data between computers.
  • ISOC Internet Society
  • IETF Internet Engineering Task Force
  • RFCs Requests for Comments
  • These RFCs form a series of notes, starting from 1969, that discuss various aspects of computer communication including networking protocols, procedures, programs, and other concepts.
  • the Task Force's "RFC Editor" maintains a master file of all RFCs (at www.rfc-editor.org) that can be searched and downloaded over the Internet at no charge.
  • Every RFC is assigned an index number by which it can be retrieved.
  • RFC 2026 entitled “The Internet Standards Process - Revision 3,” documents the process used for the standardization of protocols.
  • STD the additional label
  • STD 1, currently RFC 2800, is periodically updated to give the latest RFC number for all protocols and to indicate whether that RFC has been adopted as a standard.
  • STD 5 specifies the Internet Protocol, or "IP,” upon which all other protocols in the Internet suite are based.
  • IP Internet Protocol
  • the fundamentals of IP are set forth in the RFC 791 portion of STD 5.
  • each computer on the Internet, known as a "host" machine, has an IP address that uniquely identifies it from all other machines on the Internet.
  • Data which is to be transmitted (for example, an e-mail message or Web page) is first divided into chunks, called “packets,” which each contain the sender's and receiver's Internet addresses.
  • Each of these packets is then consecutively sent to a "gateway" computer, often referred to as an IP router, or simply a “router,” that reads the destination address and then forwards the packet to an adjacent computer, which again forwards the packet to another computer.
  • When the last computer recognizes the packet as belonging to a computer within its immediate neighborhood, or "domain," it forwards the packet directly to the machine in the address.
  • TCP Transmission Control Protocol
  • IPv4 Internet Protocol Version 4, or simply "IPv4,” discussed in RFCs 1812 and 2644
  • the numbers in each segment are limited to 8 bits and thus range from zero to 255.
  • 199.103.194.129, 207.106.7.7, 209.124.64.11, and 207.230.32.23 are IP addresses for various machines that are used by an organization called .NU Domain.
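
The dotted notation described above can be illustrated with a short sketch. It assumes the usual four-segment IPv4 form, in which each segment is an 8-bit value from 0 to 255; the function name `dotted_quad_to_int` is hypothetical and chosen only for this illustration. The standard-library `ipaddress` module is used as a cross-check.

```python
import ipaddress


def dotted_quad_to_int(text: str) -> int:
    """Pack four 8-bit segments (0..255) into one 32-bit IPv4 value."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated segments")
    value = 0
    for part in parts:
        segment = int(part)
        if not 0 <= segment <= 255:
            raise ValueError("each segment must fit in 8 bits")
        # Shift the accumulated value left by one segment and append the new one.
        value = (value << 8) | segment
    return value


address = "207.106.7.7"  # one of the example addresses listed above
print(dotted_quad_to_int(address))
# The standard library produces the same integer for the same address.
print(int(ipaddress.IPv4Address(address)))
```
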
  • a newer version of IP, called "IPv6" (discussed in RFCs 2373 and 2463), is currently being implemented in order to allow numerical IP addresses as long as 128 bits.
  • domain name space set forth in RFC 1591, entitled "Domain Name System Structure and Delegation."
  • Each area of the Internet is identified by a "domain name” which consists of that part of the domain name space that is at or below the portion of the hierarchy specified by the name.
  • An area is then referred to as a "subdomain” of another domain if it is contained within that domain.
  • a domain consists of a set of locations that are logically related. At the top of this hierarchy are the now-familiar generic top-level domains, or "gTLDs": .com, .edu, .gov, .ext, .mil, .net, .org, and .int. There are also top-level "country" domains based upon two-character abbreviations for each country. A second level is then added to each top-level domain name in order to identify a particular area or machine in that top-level domain. For example, the ".nu" top-level domain is set aside for the Pacific island of Niue and "whats.nu" identifies a host machine in the .nu domain. That particular machine is operated by the Network Information Center, or "NIC," for the .nu domain, which acts as the registrar for all second-level names in the domain.
  • NIC Network Information Center
  • URL uniform resource locator
  • The most common form of URI is the uniform resource locator, or "URL," described in RFC 1738 and others.
  • a network resource is located in the domain name space by a string of characters forming one or more "labels" (each up to a maximum of 63 characters) where each label is separated by a period and the last label is a TLD identifier.
  • the currently preferred convention for these labels is set forth in RFCs 952 and 1123 which require that labels include only the numerals 0 - 9, the letters A - Z, and the hyphen character. These characters are therefore referred to as "DNS-legal characters.”
  • no blank or space characters are permitted and no distinction is made between upper and lower case characters.
  • a domain name that includes only DNS-legal characters and satisfies any other syntax requirements of the DNS protocol is said to be a "DNS-legal name" or "fully-qualified name."
  • the definition of which characters and names are DNS-legal is flexible and expected to change as new domain naming conventions are adopted.
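
As a rough illustration of the label rules described above, the following sketch checks that every label uses only the letters A - Z, the numerals 0 - 9, and the hyphen, and is at most 63 characters long. It is a simplification: the "other syntax requirements of the DNS protocol" are not checked, and the names `is_dns_legal_name` and `same_name` are hypothetical.

```python
import string

# Character set described above: letters, numerals, and the hyphen.
DNS_LEGAL_CHARACTERS = set(string.ascii_letters + string.digits + "-")
MAX_LABEL_LENGTH = 63


def is_dns_legal_name(name: str) -> bool:
    """Return True if every period-separated label passes the simplified check."""
    labels = name.split(".")
    return all(
        0 < len(label) <= MAX_LABEL_LENGTH and set(label) <= DNS_LEGAL_CHARACTERS
        for label in labels
    )


def same_name(first: str, second: str) -> bool:
    """Compare two names without distinguishing upper and lower case."""
    return first.lower() == second.lower()


for candidate in ("whats.nu", "my host.nu", "b\u00fccher.nu"):
    print(candidate, is_dns_legal_name(candidate))
```

A name containing a space or a non-Latin letter fails the check, which is the limitation that motivates transliteration in the passages that follow.
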
  • Certain of the remaining DNS-illegal characters, such as the "unsafe” or “reserved” characters described in RFCs 1630 and 1738, are particularly troublesome when used in domain names and are sometimes referred to as "unclean" characters.
  • DNS-legal domain names are typically created by transliterating a symbolic name from another language into a DNS-legal name using only the limited group of "Latin" characters for the letters and numbers that are discussed above.
  • Because many languages do not have an accepted standard for transliteration, there can be several plausible transliterations for any non-English, symbolic host name.
  • Even when a meaningful domain name can be created through transliteration, there is no guarantee that a casual user will be able to easily remember that name, or spell the transliteration using only the English alphabet. Consequently, the requirement for using only these Latin letters and numbers can be quite burdensome, especially for inexperienced users.
  • DNS Domain Name System
  • the current DNS protocol provides for a distributed database for mapping the names of host machines to their IP addresses.
  • DNS concept is therefore sometimes referred to as a "distributed name space" since the entire database no longer resides on just a single host computer in a
  • the DNS protocol thus allows a program running on one host machine to perform the association of a symbolic host name with a numeric IP address (and/or other information) without the need for all machines to have a complete and accurate database of all names and addresses, or the need for a single machine to receive all requests for information.
  • the first software implementation of the DNS protocol was written by Paul Mockapetris.
  • BIND Berkeley Internet Name Domain
  • BIND and other implementations of the DNS protocol typically include two major components called a "name server” and a “resolver.”
  • a server is a computer or program which provides some service to other "client” computers or programs.
  • the connection between client and server is normally by means of message passing, often over a network, and uses some protocol to encode the client's requests and the server's responses.
  • the server may run continuously as a “daemon,” waiting for requests to arrive, or it may be invoked by some higher level daemon which controls a number of specific servers (inetd on Unix).
  • the term "daemon" generally refers to a program that is not invoked explicitly, but lies dormant waiting for some condition(s) to occur. The idea is that the perpetrator of the condition need not be aware that a daemon is lurking, though often a program will commit an action only because it knows that it will implicitly invoke a daemon.
  • Unix-based systems typically run many daemons, chiefly to handle requests for services from other hosts on a network. Most of these are now started as required by a single real daemon, "inetd,” rather than running continuously.
  • This particular Berkeley daemon program, also known as "netd," listens for connection requests or messages for certain ports and starts server programs to perform the services associated with those ports. "Daemon" and "demon" are often used interchangeably.
  • the most common example hardware server is a file server which has a local disk and services requests from remote clients to read and write files on that disk, often using Sun's Network File System (NFS) protocol or Novell Netware on IBM PCs.
  • the name server receives DNS protocol queries (i.e., requests for information about a host or other "resource” on the Internet) and returns DNS protocol replies that either contain the answer to the query or a referral to another name server that is more likely to have the desired information.
  • the name server also stores complete information about some portion of the domain name space for which it is authoritative, called a "zone,” including the locations of any name servers for which it has delegated authority for a "subzone.”
  • Resolvers, on the other hand, merely obtain resource records from name servers. Normally they do so at the behest of an application, like a browser, but they may also do so as part of their own operation.
  • the resolver is typically located on the same machine as the program that requests the resolver's services. However, the resolver can often consult name servers that are running on other host machines.
  • Information about the resources in a particular zone is stored on the name server in the form of "resource records" in a "zone data file.”
  • Each record in the zone data file is typically represented by one "line," or row, that contains several "fields," or columns. Certain fields may also be designated as "key fields" which are then indexed to speed the lookup of unique identifiers, or "keys," for each record.
  • the set of keys for all records in the database forms an "index.” Multiple indexes may also be built for the zone database.
  • the first column in each resource record contains the "owner" domain name where that resource is found. Other columns contain information concerning the record type, class, and/or other information as set forth in STD 13 and others.
  • a type "A" record would contain a name-to-address mapping with four columns, such as "whats.nu IN A 209.124.64.1", where "whats.nu" is the name of the owner of the Internet ("IN") host at the indicated numeric IP address ("A").
  • the master file containing these textual records is then highly encoded before being stored on the name server in its encoded form. All of this data can then be transferred between name servers by simply copying the resource records to another name server.
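
The four-column textual record and the keyed index described above can be sketched as follows. This is only an illustration of the idea, not the on-disk format of any particular name server: the names `ResourceRecord`, `parse_record`, and `build_index` are hypothetical, and using the owner name plus record type as the key is an assumption.

```python
from typing import Dict, List, NamedTuple, Tuple


class ResourceRecord(NamedTuple):
    owner: str     # first column: the owner domain name
    rr_class: str  # e.g. "IN" for Internet
    rr_type: str   # e.g. "A" for a name-to-address mapping
    data: str      # e.g. the numeric IP address


def parse_record(line: str) -> ResourceRecord:
    """Split one whitespace-separated zone-file line into its four columns."""
    owner, rr_class, rr_type, data = line.split(None, 3)
    return ResourceRecord(owner.lower(), rr_class.upper(), rr_type.upper(), data.strip())


def build_index(lines: List[str]) -> Dict[Tuple[str, str], List[ResourceRecord]]:
    """Index records by (owner, type) so that a lookup is a key-value retrieval."""
    index: Dict[Tuple[str, str], List[ResourceRecord]] = {}
    for line in lines:
        record = parse_record(line)
        index.setdefault((record.owner, record.rr_type), []).append(record)
    return index


index = build_index(["whats.nu IN A 209.124.64.1"])
print(index[("whats.nu", "A")][0].data)
```
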
  • When a user program, such as a web browser, issues a request for a resource record, the resolver formulates a "query" to the local name server. If that name server has fielded a request for the same information within a certain period of time (to prevent passing old information), it will locate, or "lookup," the information in its own memory (if possible) and send a reply.
  • the lookup is typically a key value retrieval operation; however, it may also be completed using a variety of other methods such as on-the-fly computation, hashing and/or conversion algorithms, and various other indexing techniques.
  • the resolver will then attempt to "solve" the problem by asking the second server for the same information. If that does not work, the resolver will ask yet another server until it finds one that knows the answer to its query, or exceeds a time limit for fulfilling the request and issues an error message.
i. Wildcard Resource Records
  • wildcard resource records that control the response when the server is unable to answer certain kinds of queries.
  • wildcard records can be thought of as instructions for synthesizing a new resource record under certain conditions. When those conditions are met, the name server creates a resource record with an owner name equal to the query name and with contents taken from the wildcard record.
  • Wildcard records are typically designated in the master zone file by owner names starting with an asterisk (*). This facility is most often used to create a zone that will be used to forward mail from the Internet to some other mail system. The general idea is that any name in a certain portion of the zone that is not already in the zone files will be presumed to exist nonetheless. For example, adding a wildcard resource record such as "*.whats.nu IN MX mail.nic.nu" will cause mail for the whats.nu domain to be forwarded to the mail server at the network information center for the .nu ccTLD, unless other resource records for the whats.nu subdomain are available in the zone files.
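
The synthesis of a record from a wildcard entry can be sketched roughly as follows. This is a deliberately simplified model, not the full matching rules of any DNS implementation: the zone is a plain dictionary, the function name `lookup` and the host "www.whats.nu" with its data are hypothetical, and a wildcard is applied only when no record at all exists for the queried name.

```python
from typing import Dict, Optional, Tuple

Zone = Dict[str, Dict[str, str]]  # owner name -> {record type: data}


def lookup(zone: Zone, name: str, rr_type: str) -> Optional[Tuple[str, str, str]]:
    """Return (owner, type, data), synthesizing from a wildcard when needed."""
    name = name.lower()
    if name in zone:
        # The name already exists in the zone, so no wildcard applies.
        data = zone[name].get(rr_type)
        return (name, rr_type, data) if data is not None else None
    labels = name.split(".")
    for i in range(1, len(labels)):
        wildcard = "*." + ".".join(labels[i:])
        if wildcard in zone and rr_type in zone[wildcard]:
            # Owner name equals the query name; contents come from the wildcard.
            return (name, rr_type, zone[wildcard][rr_type])
    return None


zone: Zone = {
    "*.whats.nu": {"MX": "mail.nic.nu"},
    "www.whats.nu": {"A": "209.124.64.1"},  # hypothetical explicit record
}
print(lookup(zone, "anything.whats.nu", "MX"))
print(lookup(zone, "www.whats.nu", "MX"))
```

In this sketch, mail for an unlisted name under whats.nu is directed to mail.nic.nu, while a name that already has its own records is left alone.
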
  • HTTP Hyper Text Transfer Protocol
  • After receiving and interpreting an HTTP request message from a web browser, the HTTP server will typically respond with a "full-response message" in which the first line is referred to as the "status-line."
  • the status-line contains a three-digit "status-code” element and a short textual description of the code. For example, if the action that was requested in the query was successfully completed, then the response message includes a "2XX-series" status code. If the server has not found anything that matches the request, then the response will include a 4XX-series "client error” status code (such as a "404" status code) and a descriptive error message such as "file not found.”
  • the response message might include a "302" status code with the new address.
  • the requesting client will then redirect itself to the new address.
  • the redirection may be automatic or it may require the user to manually click on a hyperlink before receiving information from the temporary HTTP server.
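
The structure of the status-line described above can be sketched as a small parser. This is an illustration only: the function names `parse_status_line` and `classify` are hypothetical, and only the status-code series mentioned in the text (2XX, 4XX) plus the 3XX series to which "302" belongs are labeled.

```python
from typing import Tuple


def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Split a status-line into version, three-digit status code, and description."""
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason


def classify(code: int) -> str:
    """Name the series that a status code belongs to."""
    series = {2: "success", 3: "redirection", 4: "client error"}
    return series.get(code // 100, "other")


for raw in ("HTTP/1.0 200 OK\r\n",
            "HTTP/1.0 302 Moved Temporarily\r\n",
            "HTTP/1.0 404 Not Found\r\n"):
    version, code, reason = parse_status_line(raw)
    print(code, classify(code), reason)
```
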
  • HTTP also allows for the identification of certain types of what it refers to as "character sets" by case-insensitive tokens.
  • the complete set of tokens is defined by the IANA Character Set registry. However, because that registry does not define a single, consistent token for each character set, RFC 1945 defines "preferred names" for those character sets most likely to be used with HTTP entities. These character sets include those registered by RFC 1521 (the US-ASCII and ISO-8859 character sets) and other names specifically recommended for use within MIME charset parameters.
  • the "charset" parameter in the HTTP Protocol is used with some media types to define the character set of the data.
  • media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.
  • Data in character sets other than "ISO-8859-1" or its subsets must be labeled with an appropriate charset value in order to be consistently interpreted by the recipient.
  • RFC 1945 points out that many current HTTP servers provide data using charsets other than "ISO-8859-1" without proper labeling. This situation reduces interoperability and is not recommended. To compensate for this, some HTTP user agents provide a configuration option to allow the user to change the default interpretation of the media type character set when no charset parameter is given.
  • HTTP also provides for "product tokens" that are used to allow communicating applications to identify themselves with an optional slash and version designator. Most fields using product tokens also allow subproducts which form a significant part of the application to be listed, separated by whitespace. By convention, the products are listed in order of their significance for identifying the application.
  • the Simple Mail Transfer Protocol (“SMTP,” STD 10) is based on a model of communication that is somewhat similar to HTTP in that it provides for the transport of mail across networks in what is referred to as "SMTP mail relaying.”
  • SMTP Simple Mail Transfer Protocol
  • mail can be transferred on the same network or to some other network via a relay or gateway that is accessible to both networks.
  • This transmission normally occurs directly from the sending user's host to the receiving user's host, or via one or more relay SMTP servers.
  • an intermediate host that acts as either an SMTP relay or as a gateway into some other transmission environment that is usually selected through the use of the Mail exchanger ("MX") mechanism in DNS that is discussed above with regard to wildcard resource records.
  • MX Mail exchanger
  • a mail message can pass through a number of intermediate relay or gateway hosts on its path from sender to ultimate recipient.
  • Before mapping a host name to its numeric IP address, a name server must first decipher the binary code representing the resolver query. Part of this query will include a binary representation of a string of characters that make up the symbolic host name. In order to describe this character decoding process, this document follows the character encoding model set forth in the "Unicode Technical Report #17" available from the Unicode Consortium in Mountain View, California (and at www.unicode.org), which is hereby incorporated by reference into this document.
  • a "character” can be any member of a set of elements that is used for organization, control, or representation of data.
  • a character is usually thought of, however, as the smallest component of a written language that has semantic value.
  • Each character comes in many "forms" which can be distinguished by width, height, size, position, rotation, case, font, italicization, underlining, or other similar typographical nuance.
  • the collection of these symbols for a particular language, or languages, is often referred to as a "script.” Characters are typically defined in a script by specifying the names of characters and a sample presentation of the characters in visible form referred to as a "glyph.”
  • When used to express host names, these characters usually take the form of a printable symbol having phonetic or pictographic meaning that may also form part of a word of text, depict a numeral, and/or express grammatical punctuation.
  • Internet host names are conventionally formed by a string of characters that are selected from the "DNS-legal" character set which is limited to a portion of the Latin script including the letters of the alphabet used in the U.S., the numerals in the decimal number system, and certain special symbols such as the hyphen ("-").
  • the coded character set ("CCS") mapping is typically defined by a table providing one-to-one correspondence between values and characters arranged in "code positions" inside the table.
  • the code positions are then defined by a numerical index, called a "code point” or “scalar value,” that may also implicitly define the code value.
  • Many coded character sets also have code positions that are designated for "control functions” other than displaying text. Some code positions may also be reserved for future characters and/or control functions.
  • Various aspects of coded character sets are sometimes loosely referred to as "character encodings,” “coded character repertoires,” “character set definitions,” “code pages,” “character sets,” “charsets,” or “code sets.”
  • ISO 10646, US-ASCII, and ISO-8859 are generally accepted standards that define coded character sets.
  • For example, in the ISO 10646 coded character set, the equivalent decimal code values for "a," "!," and "ä" are 97, 33, and 228, respectively. Further information about various character names, or "mnemonics," and character sets is available in RFC 1345.
  • the "character encoding form,” or “CEF,” defines the size of the "code unit” and the number of code units that are used to represent each character.
  • the encoding form thus defines how the values from the CCS are converted into sequences of a base datatype. Since most character encoding forms use a single 7-bit (“septet”) or 8-bit (“octet”) code unit for each character, the CEF is often implicitly understood. However, the use of multiple code units and/or variable length code units for each character is becoming more common.
  • the "character encoding scheme,” or “CES,” is a mapping of code units into serialized byte sequences that are dictated by the computer architecture being used. Such “serialization schemes” define the byte-order for multiple code unit CEFs and any switching between different CCSs.
  • the UTF-8 encoding scheme (discussed below) applies only to the ISO 10646 coded character set while the ISO 2022 encoding scheme can be applied to a variety of coded character sets.
  • the character encoding form maps code points to code units, while the character encoding scheme maps code units to bytes.
  • The complete mapping of a character string to a sequence of bytes is referred to here as a "character map" or "CM."
  • a simple character map thus implicitly includes a CCS mapping from characters to code values, a CEF mapping defining the width and number of code units for each character, and a CES mapping from code units into a series of bytes.
  • the use of such character maps is also referred to here as character mapping, character map encoding, "character encoding,” or simply “encoding.”
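
The three layers described above can be made concrete with one character. In this sketch, the character "ä" (decimal code value 228 in ISO 10646) is mapped to bytes under several encodings using Python's built-in codecs; the helper name `character_map` is hypothetical and simply wraps `str.encode`. The coded character set supplies the code value, the encoding form decides how many code units of what width are used, and the encoding scheme fixes the byte order.

```python
def character_map(text: str, encoding: str) -> bytes:
    """Apply a complete character map: characters in, serialized bytes out."""
    return text.encode(encoding)


ch = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS
print(ord(ch))  # the code value assigned by the coded character set

# Same code value, different encoding forms and schemes:
#   latin-1    one 8-bit code unit
#   utf-8      two 8-bit code units
#   utf-16-be  one 16-bit code unit, most significant byte first
#   utf-16-le  one 16-bit code unit, least significant byte first
for name in ("latin-1", "utf-8", "utf-16-be", "utf-16-le"):
    print(name, character_map(ch, name).hex())
```
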
  • CEF is implicitly defined to be 8-bits long (as in "Requirements of Internationalized Domain Names” by James Seng discussed above)
  • a combination of one or more CCSs with a CES results in a "charset” character map for converting a sequence of octets into a sequence of characters.
  • the names of various such charsets are registered with the Internet Assigned Numbers Authority (“IANA") using the procedures set forth in RFC 2278.
  • IANA Character Set Register is available on the Internet (at http://www.isi.edu/in-notes/iana/assignments/character-setsauthority).
  • Morse Code is based on combinations of two possible values, either a dot or a dash, for each character in the set defined by the letters in the English alphabet and certain punctuation marks. However, unlike the character encoding form used by many modern binary computers where the length of a code unit is typically fixed at seven or eight bits, the number of dots and/or dashes representing each character in Morse Code can vary from one to six.
  • the character encoding scheme calls for each dash to be encoded as a signal which is three times as long as the signal for a dot. The individual dots and dashes of a character are then separated by a time interval equivalent to one dot, while the individual characters of a word are separated by an interval equivalent to three dots, and the words in a message are separated by an interval equivalent to six dots.
  • Messages using Baudot's code were printed on narrow paper tapes by operators using a special five-key keypad. Unlike the variable-length encoding form of the Morse Code, every character in the Baudot Code was represented by a unique group of five binary digits. Since there were insufficient combinations of fixed-length, 5-bit code units for all of the letters of the Latin alphabet, Arabic numerals, and punctuation marks, Baudot also added a "locking-shift" encoding scheme (similar to the shift key on a manual typewriter) to essentially double the number of characters that could be transmitted. These latter "control characters" were encoded as marks or spaces on the tape representing a "current on" or "current off" condition in the transmitter.
  • ANSI American National Standards Institute (formerly the American Standards Association, "ASA")
  • ASCII American Standard Code for Information Interchange
  • ASCII Code was ultimately adopted by all U.S. computer manufacturers. Since U.S. vendors dominated the world market for computers at the time, ASCII Code also became the de facto international standard. It therefore became necessary to further modify the ASCII character set for use with other languages. Since there are now many national variants of ASCII, the original version of the ASCII coded character set is often referred to as "US-ASCII," or by the name of its formal specification, ANSI X3.4-1986, which is incorporated herein by reference.
  • ISO 646 basically called for the ASCII character set and character encoding scheme to be used except for ten character positions which were left open for "national variant” characters. The default characters for those ten positions were then specified in a version of the recommendation known as the International Reference Version, or "IRV.” US-ASCII was also used as the basis for creating various other 7-bit character maps for languages that did not employ the Latin alphabet, such as Arabic, Greek, and Japanese. At least 180 character codes based on similar extensions of ASCII have now been registered with the ISO.
  • ISO 8859 is a multi-part specification using an 8-bit encoding form that was designed for the data processing needs of Western and Eastern Europe.
  • ISO 8859 "family" of character sets extends the ASCII character set in different ways, with different special characters for various languages and cultures.
  • ISO 8859-1 (so called "Latin-1") contains the ASCII character set and a collection of additional characters needed for the languages of Western and Northern Europe, while ISO 8859-2 (“Latin-2”) is constructed for languages of Central and Eastern Europe.
  • ISO 8859 is similar to ASCII in that code positions 0 - 127 contain the same characters as in ASCII, while positions 128 - 159 are reserved for control characters, and positions 160 - 255 are used differently in each part of the ISO 8859 family.
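
The layout of the ISO 8859 family described above can be checked with Python's built-in codecs. This is a small sketch: the byte values are chosen only for illustration, and `iso8859_1` and `iso8859_2` are the standard-library codec names for the Latin-1 and Latin-2 parts.

```python
import unicodedata

raw = bytes([0x41, 0xA3])  # position 65 (ASCII range) and position 163 (upper range)

latin1 = raw.decode("iso8859_1")
latin2 = raw.decode("iso8859_2")

# Position 65 is the same character in both parts, as in ASCII.
print(latin1[0] == latin2[0])

# Position 163 is used differently in each part of the family.
print(unicodedata.name(latin1[1]))
print(unicodedata.name(latin2[1]))

# Positions 128 - 159 decode to control characters (Unicode category "Cc").
control = bytes([0x85]).decode("iso8859_1")
print(unicodedata.category(control))
```
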
  • ISO 10646 is one of the latest attempts to establish a standard multilingual character map and is often referred to as a Universal Character Set ("UCS").
  • UCS Universal Character Set
  • "Unicode” is a particular UCS standard specified by the Unicode Consortium in Mountain View, California (and at www.unicode.org) to define a character set that is compatible with ISO 10646.
  • the Unicode Standard corresponds to the Basic Multilingual Plane, or "BMP,” of ISO 10646, or "ISO-10646-1.”
  • BMP Basic Multilingual Plane
  • the Unicode Standard provides for text elements to be encoded as composite character sequences which, when presented, are rendered together. For example, "ä" may be encoded as a "composite character" by rendering "a" and "¨" together.
  • Such "composed character sequences” are typically made up of a base letter, which occupies a single space, with one or more formatting "marks.”
  • a combining character whose positioning in presentation depends upon its base character is referred to as a "nonspacing mark" while all other combining characters are referred to as "spacing marks."
  • Certain characters may also be encoded as "precomposed characters" represented by a single code value rather than two or more code values which are combined during rendering.
  • the character "ü" can be encoded either as the single code value U+00FC "ü" or as the base character U+0075 "u" followed by the non-spacing character U+0308 "¨".
  • the Unicode Standard offers such precomposed characters as an alternative to composed character sequences so as to retain compatibility with, and correspondence to, established standards, such as Latin 1, that include many precomposed characters such as "ü" and "ñ."
  • the precomposed characters that are defined by Unicode are therefore sometimes referred to as "compatibility characters.”
  • One possible encoding for this character is the precomposed LATIN CAPITAL LETTER A WITH RING ABOVE (Unicode Code Value U+00C5, in hexadecimal notation).
  • a second encoding is the decomposed LATIN CAPITAL LETTER A (U+0041) followed by the COMBINING RING ABOVE (U+030A) while the third alternate encoding for this character is the ANGSTROM SIGN (U+212B).
  • the equivalence between the first and third encodings is a singleton equivalence while the equivalence between the first and second is a precomposed/decomposed equivalence.
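  • The three encodings listed above can be checked with Python's standard `unicodedata` module. This is only an illustrative sketch (the variable names are not from the source): it shows that the three code point sequences differ as raw strings but collapse to a single form once normalized.

```python
import unicodedata

precomposed = "\u00C5"       # LATIN CAPITAL LETTER A WITH RING ABOVE
decomposed = "\u0041\u030A"  # LATIN CAPITAL LETTER A + COMBINING RING ABOVE
angstrom = "\u212B"          # ANGSTROM SIGN (singleton equivalent of U+00C5)

encodings = (precomposed, decomposed, angstrom)

# As raw code point sequences the three strings are all different.
raw_distinct = len(set(encodings))

# After normalization every encoding maps to the same sequence.
nfd_forms = {unicodedata.normalize("NFD", s) for s in encodings}
nfc_forms = {unicodedata.normalize("NFC", s) for s in encodings}

print(raw_distinct, nfd_forms, nfc_forms)
```

  • A plain binary comparison of the three strings would therefore fail even though a reader could not tell them apart when rendered.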
  • the Unicode Standard more specifically defines two types of equivalencies between characters: "canonical” equivalence and "compatibility” equivalence.
  • Canonical equivalence is the fundamental equivalency between characters or sequences of characters that are indistinguishable to users when correctly rendered in text.
  • singleton equivalence is one type of canonical equivalence.
  • Canonical equivalence should also not be confused with the "aliasing" of canonical host names that is provided in many versions of BIND where, when a name server finds a CNAME record, it simply replaces the alias with a canonical name (in a process that is unrelated to Unicode canonical mapping) before looking up the appropriate resource record.
  • canonical equivalence is actually a subset of "compatibility equivalence.”
  • the Unicode standard also provides numerous "compatibility characters" that are taken from other standards, but are really just nominal Unicode characters that are displayed in a different format.
  • a compatibility character may be equivalent to a nominal Unicode character which is displayed in a certain font. Consequently, the visual representation of these compatibility characters is only a subset of the many possible visual representations of the Unicode nominal character.
  • Compatibility equivalence then occurs when a character is a visually distinguishable variant of a nominal character such as a font variant, superscript, or subscript.
  • the nominal canonical mappings are essentially a subset of the compatibility mappings.
  • replacing a character by its compatibility equivalent may result in the loss of certain information, such as formatting information, about its textual representation. Consequently, compatibility mappings generally provide the correct equivalence for only searching and sorting, rather than transcoding.
  • Normalization Form D, so-called "canonical decomposition" or "decomposed normalization," is the process of taking a string, recursively replacing composite characters using the Unicode canonical decomposition mappings, and then putting the result in "canonical order."
  • a string is put into canonical order by repeatedly replacing any exchangeable pair by the pair in reversed order. When there are no remaining exchangeable pairs, then the string is in canonical order.
  • a decomposition that results from recursively applying the "canonical mappings" found in the Unicode Standard until no character can be further decomposed (and any nonspacing marks have been reordered) is referred to in the Unicode Standard as a "canonical decomposition.”
  • so-called “canonical equivalence” is the fundamental equivalency between characters, or sequences of characters, in the Unicode standard.
  • NFKD Normalization Form KD
  • Unicode encodes only "plain text" without any formatting information
  • performing a "compatibility" decomposition on a compatibility character can remove any formatting information and thus prevent the character from being re-composed, or "round-trip converted,” in a reversal of the decomposition process. Therefore, canonical decomposition is sometimes considered to be a subset of compatibility decomposition because it does not remove formatting information.
  • Normalization Forms D and KD are normalizations to decomposed characters which retain canonical or compatibility equivalence, respectively, with the original unnormalized text.
  • Normalization Forms C ("NFC") and KC ("NFKC"), on the other hand, provide normalization to composite characters and are a bit more complicated because they further require canonical composition. More specifically, NFC uses canonical decomposition followed by canonical composition while NFKC uses compatibility decomposition followed by canonical composition.
  • canonical composition is the composing of the previously decomposed string according to the Unicode canonical mappings by successively composing each unblocked character with the last "starter.”
  • a character is a starter if it is defined in Unicode with a combining class of zero, meaning that it acts as a base letter for determining how it will interact typographically with other combining characters.
  • a character is blocked from the starter if, and only if, there is another starter, or another character with the same class, between the starter and the character.
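  • The four normalization forms and the canonical reordering step described above can be compared side by side. The following is a small sketch using Python's standard `unicodedata` module; the sample strings are chosen here for illustration and do not come from the source.

```python
import unicodedata

# U+FB01 LATIN SMALL LIGATURE FI is a compatibility character;
# U+00E9 LATIN SMALL LETTER E WITH ACUTE is a precomposed character.
text = "\uFB01\u00E9"

forms = {name: unicodedata.normalize(name, text)
         for name in ("NFD", "NFKD", "NFC", "NFKC")}

# Canonical ordering: U+0307 (dot above, combining class 230) and
# U+0323 (dot below, combining class 220) are an exchangeable pair,
# so decomposition reorders them by ascending combining class.
unordered = "q\u0307\u0323"
ordered = unicodedata.normalize("NFD", unordered)

for name, value in forms.items():
    print(name, [f"U+{ord(ch):04X}" for ch in value])
print([unicodedata.combining(ch) for ch in ordered])
```

  • Only the "K" forms replace the ligature with the plain letters "f" and "i", which is the loss of formatting information that makes compatibility decomposition irreversible.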
  • this document advises that Internet protocols should specify 1) that comparison should be carried out purely binary (after it has been made sure, where necessary, that the texts to be compared are in the same character encoding); 2) that any kind of text, and in particular identifier-like protocol elements, should be sent normalized to Normalization Form C; 3) that in case comparison fails due to a difference in text normalization, the originator of the non-normalized text is responsible for the failure; 4) that in case implementers are aware of the fact that their underlying infrastructure produces non-normalized text, they should take care to do the necessary tests and if necessary the actual normalization by themselves; and 5) that in the case of creation of identifiers, and in particular if this creation is comparatively infrequent (e.g. newsgroup names, domain names), and happens in a rather centralized manner, explicit checks for normalization should be required by the protocol specification.
  • Character identification is also influenced by case.
  • case is derived from use of moveable type during the Middle Ages when the letters for each font were stored in a box with two sections (or “cases") and where the "uppercase” was for the capital letters and the “lowercase” was for the small letters.
  • Unicode Technical Report #21 entitled “Case Mappings” (available at http://www.unicode.org/unicode/reports/tr21/) and incorporated by reference here, discusses various case operations such as case conversion, case detection, and caseless matching.
  • downcasing is used here to refer to converting each character in a string to its lowercase form.
  • Case-folding is the process of mapping strings to a normalized form where case differences are erased. Case-folding allows for fast caseless matches in lookups.
  • caseless matching itself is only an approximation to the language-specific rules governing the strength of comparisons discussed in Unicode Technical Standard #10, entitled “Unicode Collation Algorithm,” also incorporated herein by reference (and available at http://www.unicode.org/unicode/reports/tr10/). This latter Report describes how to compare two Unicode strings while remaining conformant to the requirements of the Unicode Standard.
  • strings which have received canonical and/or compatibility decomposition, and have been downcased, are referred to in this document as being "canonicalized."
  • An expression containing only such canonicalized strings is essentially in the simplest and most significant form to which the expression may be reduced without loss of generality.
  • Two canonicalized strings may therefore be compared with a very high degree of specificity as generally discussed in Dürst, "Requirements for String Identity Matching and String Indexing," published by the World Wide Web Consortium on July 10, 1998 (available at http://www.w3.org/tr/wd-charreq) and incorporated herein by reference.
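  • Canonicalization in the sense used here (decomposition followed by downcasing) might be sketched as follows. The function name `canonicalize` and its `compatibility` flag are hypothetical, and Python's built-in `str.lower()` is used as a stand-in for the case mappings of Unicode Technical Report #21; this is an assumption, not the method of the source.

```python
import unicodedata

def canonicalize(text: str, compatibility: bool = True) -> str:
    """Decompose a string (NFKD or NFD) and then downcase it."""
    form = "NFKD" if compatibility else "NFD"
    return unicodedata.normalize(form, text).lower()

# Differently cased and differently composed spellings of one name.
a = canonicalize("Z\u00DCRICH")   # precomposed U WITH DIAERESIS, uppercase
b = canonicalize("zu\u0308rich")  # base letter + COMBINING DIAERESIS, lowercase

print(a == b)
```

  • Once both strings are canonicalized, a simple binary comparison is sufficient to decide whether they identify the same name.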
  • character maps define not only the identity of each character in a character set and its corresponding numeric value, but also how this value is mapped, or "encoded,” into bits.
  • the Unicode Standard endorses at least two different character encoding schemes for use with the ISO 10646 character set. These so-called “transformation formats” are referred to as “UTF-8” and “UTF-16.” In essence, these character encoding schemes are algorithms for turning code points, or "scalar values,” into the actual bits that are used by the computer.
  • UTF-8 uses an 8-bit encoding form that is serialized to a sequence of from one to four bytes while UTF-16 uses a 16-bit encoding form that is sequenced as a series of two bytes.
  • UTF-8 has certain advantages in that the characters in Unicode which correspond to an ASCII character have the same code values as in ASCII. Consequently, Unicode characters that are encoded under the UTF-8 character encoding scheme can be used with most existing software.
  • a proposal entitled "Using the UTF-8 Character Set in the Domain Name System," was published by Stuart Kwan and James Gilroy in July 2000 (at http://search.ietf.org/internet-drafts/draft-skwan-utf8-dns-04.txt) and is incorporated herein by reference.
  • UTF-8 is essentially a transformation algorithm that accepts an integer that may range from zero up to 2,147,483,647 (2^31 - 1) and outputs a string of octets that represents that integer.
  • a decoder accepts a string generated by a UTF-8 encoding and outputs the integer that was encoded by the string. The encoder and decoder are typically iterated in order to transform strings of characters.
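  • The encoder and decoder described above can be sketched for a single integer. The 31-bit range corresponds to the original definition of UTF-8, in which a sequence may be up to six octets long, whereas current Unicode limits sequences to four octets. The function names below are hypothetical, and the decoder is deliberately simplified (for instance, it does not reject overlong forms).

```python
def utf8_encode_int(value: int) -> bytes:
    """Encode one integer in the range 0 .. 2**31 - 1 as UTF-8 octets."""
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError("value outside the 31-bit range")
    if value < 0x80:
        return bytes([value])  # values up to 127 are a single octet
    # (exclusive upper limit, number of octets, lead-octet prefix)
    for limit, length, prefix in ((0x800, 2, 0xC0), (0x10000, 3, 0xE0),
                                  (0x200000, 4, 0xF0), (0x4000000, 5, 0xF8),
                                  (0x80000000, 6, 0xFC)):
        if value < limit:
            break
    out = bytearray(length)
    for i in range(length - 1, 0, -1):
        out[i] = 0x80 | (value & 0x3F)  # continuation octet: 10xxxxxx
        value >>= 6
    out[0] = prefix | value             # remaining high bits go in the lead octet
    return bytes(out)


def utf8_decode_int(octets: bytes) -> int:
    """Decode the octets of exactly one encoded integer."""
    if not octets:
        raise ValueError("empty input")
    first = octets[0]
    if first < 0x80:
        length, value = 1, first
    else:
        for length, shift in ((2, 5), (3, 4), (4, 3), (5, 2), (6, 1)):
            # A lead octet for `length` octets has `length` one-bits then a zero.
            if first >> shift == (0xFF >> shift) - 1:
                value = first & ((1 << shift) - 1)
                break
        else:
            raise ValueError("invalid lead octet")
    if len(octets) != length:
        raise ValueError("wrong number of octets")
    for octet in octets[1:]:
        if octet >> 6 != 0b10:
            raise ValueError("invalid continuation octet")
        value = (value << 6) | (octet & 0x3F)
    return value


print(utf8_encode_int(0xFC).hex(), utf8_decode_int(b"\xc3\xbc"))
```

  • Transforming a whole string amounts to iterating the encoder over each code value and concatenating the results, with the decoder consuming one sequence at a time in the reverse direction.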
  • UTF-8 has the characteristic that any integer at or below decimal 127
  • DNS protocol or any other protocol in the Internet suite.
  • most DNS implementations (including conventional BIND) follow the "preferred name syntax" in RFC 1034 where domain names are written in a small subset of the 7-bit US-ASCII character set that includes the letters A-Z, digits 0-9, and the dash.
  • domain names can be stored with arbitrary case, but domain name comparisons must be done in a "case-insensitive" manner.
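  • As a rough illustration of these two conventions, the sketch below checks that a label uses only letters, digits, and the dash, and compares two names while ignoring ASCII case only. The helper names are hypothetical, and the check covers the character repertoire only, not the positional rules of the preferred name syntax.

```python
import string

# Letters, digits, and the dash ("LDH").
LDH = set(string.ascii_letters + string.digits + "-")

# Translation table that lowers A-Z only and leaves every other character alone.
ASCII_DOWNCASE = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def is_ldh_label(label: str) -> bool:
    """Return True when a non-empty label uses only letters, digits, and the dash."""
    return bool(label) and all(ch in LDH for ch in label)


def same_domain_name(a: str, b: str) -> bool:
    """Compare two domain names while ignoring ASCII case."""
    return a.translate(ASCII_DOWNCASE) == b.translate(ASCII_DOWNCASE)


print(is_ldh_label("Example-1"), is_ldh_label("z\u00fcrich"))
print(same_domain_name("WWW.Example.COM", "www.example.com"))
```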
  • RFC 1958 similarly states that DNS names and protocol elements that are transmitted in text format should be expressed in "case-independent ASCII.” More recently, RFC 2277 has been adopted as the best current practice, "BCP 18," on character sets and languages and states that new protocols should be able to use the UTF-8 "charset" which consists of the ISO 10646 character set combined with the UTF-8 character encoding scheme. In addition, BCP 18 addresses the use of other character encoding schemes for ISO 10646, such as UTF-16. However, since BCP 18 is merely a suggested practice, and not a requirement, various name servers and other host computers are likely to continue to use incompatible character maps.
  • the recode library contains most code and tables from the portable iconv library written by Bruno Haible and described at http://clisp.cons.org/~haible/packages-libiconv.html.
  • the iconv library provides an iconv() implementation, for use on systems which don't have one, or whose implementation cannot convert from/to Unicode. It can convert from any of the listed encodings to any other, through Unicode conversion. It also has some limited support for transliteration. For example, when a character cannot be represented in the target character set, it can be approximated through one or several similarly looking characters. Distribution of the iconv library is available on the Internet (at ftp://ftp.ilog.fr/pub/Users/haible/gnu/libiconv-1.3.tar.gz).
  • Martin Dürst made a proposal to the Internet International Ad Hoc Committee (at www.iahc.org) entitled "Internationalization of Domain Names," published on June 10, 1996 (at http://www.iahc.org/contrib/draft-Duerst-dns-i18n-00.txt) which suggests a naming scheme that uses DNS-illegal characters and then adds a suffix ("J") to the encoding so as to indicate that the encoded name falls under an entirely new gTLD.
  • J suffix
  • the scheme requires domain names to be appended with the ".idns.apng.org" subdomain name so that a mapping of the name to an IP address will be performed only by the organization's proprietary servers.
  • iBIND i-DNS.net International Inc., of Palo Alto, California (at www.i-dns.net) suggests sending DNS-illegal domain name queries to one of nine "iDNS-compatible" servers where the queried domain name is converted to a DNS-legal domain name using UTF-5 before the appropriate resource records are looked up.
  • iClient the transformation is allegedly performed by the client before the query is sent to the server.
  • i-DNS.Net Inc. has informed the IETF that it has applied for one or more patents on related technology including WO00/50966 which is incorporated herein by reference.
  • U.S. Patent No. 6,182,148 B1 issued to Walid Tout on January 30, 2001 for an application filed July 21, 1999 also discloses a method and system for internationalized domain names which uses the UTF-5 transformation and is also incorporated by reference here.
  • the domain name is converted to a standard format, such as Unicode, and then transformed to "an RFC1035 compliant" format.
  • Redirector information is then appended to the string which identifies the delegation of authoritative root servers and/or domain name servers responsible for the domain name.
  • some form of exact string identity matching might be used to match the character string in the domain name query to a character string in a resource record as discussed in "Requirements for String Identity Matching and String Indexing" by Martin J.
  • Legacy information is indexed by keys that are encoded in a single script and then merged or joined with additional information indexed by keys encoded in multiple additional scripts.
  • the system and method include a domain name system that allows the creation and operation of domain names in a plurality of national encodings and further includes methods for resolving ambiguous encodings.
  • RACE Row-based ASCII Compatible Encoding for IDN
  • ACE ASCII-Compatible Encoding
  • IDNE "Internationalized domain names using EDNS (IDNE)," by Paul Hoffman and Marc Blanchet, dated July 11, 2000, describes an extension mechanism based on EDNS which enables the use of IDN without causing harm to the current DNS. IDNE allegedly enables IDN host names with as many characters as current ASCII-only host names. It also claims to fully support UTF-8 and conforms to the IDN requirements.
  • the DNSII Multilingual Domain Name Protocol by Edmon Chung and David Leung, dated August 25, 2000, describes an extension of the DNS into a multilingual- and symbols-based system with adjustments made on both the client side and the server side.
  • the DNSII protocol is intended to preserve the interoperability, consistency, and simplicity of the original DNS, while being expandable and flexible for the handling of any character or symbol used for the naming of an Internet domain.
  • SACE Simple ASCII Compatible Encoding
  • DNSII Transitional Reflexive ASCII Compatible Encoding (TRACE), by Edmon Chung and David Leung, dated September 2000, discusses a reflexive CNAME process where non-ASCII incoming queries will be automatically CNAMEd to their ASCII counterpart without requiring an actual lookup.
  • the REflexive CNAME (“RENAME”) process is a mechanism that attaches an incoming multilingual name to its ACE counterpart as it enters a name server.
  • BRACE Bi-mode Row-based ASCII-Compatible Encoding for IDN
  • ASCII letters, digits, and hyphens
  • Non-LDH codes in the Unicode string are then encoded using a base-32 mode in which each character of the encoded string represents five bits.
  • Single hyphens are used in the encoded string to indicate mode changes.
  • VDN Virtually Internationalized Domain Names
  • IPTR Internationalized PTR Resource Record
  • Japanese Characters in Multilingual Domain Name Label by Yoshiro Yoneya and Yasuhiro Morishita, dated November 17, 2000, discusses Japanese characters and their canonicalization rules for multilingual domain name labels.
  • UTF-6 - Yet Another ASCII-Compatible Encoding for IDN discusses a transformation method which is an extension of the UTF-5 encoding that is currently deployed as part of the WALID multilingual domain name system implementation.
  • WALID Inc. has informed the IETF that it has applied for one or more patents on related technology including WO/0056035 published on September 21, 2000 which is incorporated herein by reference.
  • a system, method, and logic for managing data including a database for implementing a key value operation with a key having a predetermined encoding, and means, such as an iterative converter, for iteratively converting the key from each of a plurality of encodings to the predetermined encoding before performing the key value operation with each converted key.
  • the key value operation may be a key value insertion operation or a key value retrieval operation and preferably accommodates at least one wildcard in the database and/or the key.
  • the system may further include means for verifying that a syntax of the converted key is valid and means for normalizing the converted key.
  • the encodings may be character encodings associated with one or more languages and the predetermined character encoding is preferably a universal character encoding, such as Unicode.
  • the system may further include means for providing image data corresponding to characters resulting from these key value operations.
  • the database may include name data, such as domain name data, and/or location data, such as IP address data and may be in the form of DNS resource records.
  • a data server such as a conversion server, and method and logic for implementing a data service, including means for receiving a request including an encoded portion, means for converting the encoded portion (such as a string of characters representing a domain name) of the request from each of a plurality of encodings to a predetermined encoding, and means for responding to the request based upon at least one of the converted portions having the predetermined encoding.
  • the plurality of encodings may be chosen to correspond with a character set token or product token in the request, with a language designation by a client, or with character encodings that are directly identified by the client or user.
  • the server may further include means for verifying a syntax of each of the converted portions and means for normalizing each of the converted portions.
  • the response may include one or more converted strings in the preferred encoding, image data corresponding to the characters in one or more of the converted strings, and/or character names corresponding to the characters in the converted strings.
  • the server may be a daemon and subsumed in a NAMED portion of the Berkeley Internet Name Domain software.
  • the server may also be configured as a file server, registration server, Network File System server, Network Information Service server, Domain Name System server, WHOIS server, File Transfer Protocol server, Hyper Text Transfer Protocol server, Simple Mail Transfer Protocol server, or a Lightweight Directory Access Protocol server.
  • an implementation of the Domain Name System protocol will include a name server for receiving a query including an encoded domain name expression, means for iteratively converting the encoded domain name expression from each of a plurality of character encodings to a predetermined character encoding, and means for providing a response to the query based upon at least one of the converted domain name expressions having the predetermined character encoding.
  • the server response may also include data representing a second domain name expression, such as a fully-qualified domain name expression, image data, an IP address, or an HTTP response with a redirection status code.
  • Each of the plurality of character encodings may be associated with one or more languages, such as the languages typically used in a particular domain or geographic region.
  • the plurality of encodings may also be chosen to correspond to the character set and/or products tokens in an HTTP message.
  • the Domain Name System may also include means for providing the query to the name server, such as a second name server having a wildcard resource record for directing the query to the first name server.
  • the system may further include means for verifying a syntax of each converted domain name expression and means for normalizing each converted domain name expression.
  • the system may include wrapper code operating in conjunction with the Berkeley Internet Name Domain (“BIND") implementation of the DNS protocol.
  • BIND Berkeley Internet Name Domain
  • the DNS system may also include a first module, such as a Referral Domain Name Service ("RDNS"), for determining whether one of the queried domain name expressions contains 7-bit DNS-legal character strings, 8-bit DNS-legal character strings, or another type of character strings.
  • RDNS may consider the character set token and/or product token in an HTTP message.
  • the Referral Domain Name Service also determines whether the one queried domain name expression contains special character strings.
  • Eight-bit DNS-legal character strings are referred to a second module, or Unicode Validation and Canonicalization Engine (“UVCE”), for determining whether the 8-bit DNS-legal expression has been encoded with the Unicode character map. Prior to mapping, the Unicode Validation and Canonicalization Engine also validates, downcases, and decomposes the 8-bit, DNS-legal, Unicode expression.
  • UVCE Unicode Validation and Canonicalization Engine
  • the DNS system may also include a third module, or Legacy Unicodification Trial Engine (“LUTE”), for converting one of the queried domain name expressions from one character map to a universal character map, such as Unicode (preferably using the UTF-8 transformation format), prior to attempting a look-up (or other type of mapping) of the resource records for the converted expression. If the look-up attempt is unsuccessful, then the LUTE converts the queried domain name expression from another different encoding to the universal character map prior to another look-up attempt until either a successful look-up is achieved or all available conversions from various character maps to the universal character map have been attempted.
  • LUTE Legacy Unicodification Trial Engine
  • the system relates to an enterprise system such as a Network Information Center including a registration web server, a relational database management system, and a system for implementing the Domain Name System (DNS) protocol in distributed name space with a name server for mapping resource records to queried domain name expressions that are encoded with different character maps.
  • DNS Domain Name System
  • a virtual internationalized domain name system including a URI forwarding agent, such as a URL forwarding agent, for attempting a mapping of a queried domain name expression that is encoded with an initially-undetermined character map to a corresponding DNS-legal domain name expression.
  • the initially-undetermined character map may be a non-ASCII character map, use a binary code unit that is longer than seven bits, and/or include at least one DNS-illegal character.
  • the system may also include a name server with a wildcard resource record for providing an IP address for referring the query to the URL forwarding agent.
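  • The effect of such a wildcard resource record can be illustrated with a toy lookup table. This is a deliberately simplified sketch rather than the full wildcard semantics of the DNS: the zone contents, the `resolve` function, and the addresses (taken from the 192.0.2.0/24 documentation range) are all hypothetical.

```python
# Toy zone data: one ordinary record plus a wildcard record whose address
# stands for the (hypothetical) URL forwarding agent.
ZONE = {
    "www.example.": "192.0.2.1",
    "*.example.": "192.0.2.53",
}


def resolve(name: str):
    """Return the address for an exact match, else for the nearest wildcard."""
    if name in ZONE:
        return ZONE[name]
    labels = name.split(".")
    # Replace progressively more leading labels with "*" and retry.
    for i in range(1, len(labels)):
        candidate = "*." + ".".join(labels[i:])
        if candidate in ZONE:
            return ZONE[candidate]
    return None


print(resolve("www.example."))           # ordinary record
print(resolve("z\u00fcrich.example."))   # no exact record, wildcard applies
```

  • Any name without its own record, including one containing DNS-illegal characters, thus receives the address of the forwarding agent, which can then attempt the mapping itself.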
  • the URL forwarding agent includes a first module for converting the queried domain name expression to a preferred character map prior to the attempted mapping.
  • the preferred character map may be a universal character map, such as Unicode.
  • the first module may also verify that the queried domain expression is encoded with a Unicode/UTF-8 character map and canonicalize the verified expression prior to the attempted mapping.
  • the URL forwarding agent may also include a second module for iteratively converting the queried domain name expression from various character maps to a preferred character map prior to said attempted mapping.
  • the preferred character map may be a universal character map, such as Unicode.
  • the first module may also verify and canonicalize the encoding of the converted expression prior to the attempted mapping. When all attempted mappings are unsuccessful, the URL forwarding agent will map the queried domain name expression to a predetermined domain name expression.
  • a method of implementing a virtual internationalized domain name system includes the steps of receiving a query with a domain name expression that is encoded with an initially-undetermined character map, and attempting a mapping of the queried domain name expression to another domain name expression which is preferably DNS-legal.
  • the initially-undetermined character map may be a non-ASCII character map, use a binary code unit that is longer than seven bits, and/or include at least one DNS-illegal character.
  • the queried domain name expression is preferably received from a client that has been provided with an IP address from a participating name server in response to finding a wildcard resource record in a zone file of the name server.
  • the method may also include the step of verifying whether the queried domain expression is encoded with a Unicode/UTF-8 character map and then canonicalizing the verified expression prior to the attempted mapping.
  • the method may also include converting the queried domain name expression to a universal character map before the attempted mapping step.
  • the universal character map may be Unicode, in which case, the converted domain name expression may also be verified, and the verified expression canonicalized prior to said attempted mapping.
  • the queried domain name expression is converted from another character map to a preferred character map before the next attempted mapping.
  • the preferred or predetermined character map may be a universal character map, such as Unicode.
  • the first module may also verify and canonicalize the encoding of the converted expression prior to the attempted mapping.
  • the URL forwarding agent will map the queried domain name expression to a predetermined domain name expression.
  • a virtual internationalized domain name system including a name server with a wildcard resource record for referring a queried domain name expression that is encoded with an initially-undetermined character map.
  • the participating name server may include a wildcard resource record with an IP address of a URI forwarding agent.
  • the initially-undetermined character map may be a non-ASCII character map, use a binary code unit that is longer than seven bits, and/or include at least one DNS-illegal character.
  • Another method of implementing a virtual internationalized domain name system includes the steps of receiving a query with a domain name expression that is encoded with an initially-undetermined character map, and referring the query to a forwarding agent for mapping the queried domain name expression to another domain name expression.
  • the initially-undetermined character map may be a non-ASCII character map, use a binary code unit that is longer than seven bits, and/or include at least one DNS-illegal character.
  • the other domain name expression is preferably DNS-legal.
  • the URI forwarding agent is arranged to map a queried domain name expression that is encoded with an initially-undetermined character map to a corresponding DNS-legal domain name expression.
  • the queried domain name expression may include at least one DNS-illegal character.
  • the initially-undetermined character map may be a non-ASCII character map and/or use a binary code unit that is longer than seven bits.
  • the URI forwarding agent may further include one or more modules for making multiple mapping attempts.
  • a first module verifies that the queried domain expression is encoded with a Unicode/UTF-8 character map and canonicalizes the verified expression prior to a first attempt at said mapping.
  • a second module converts the queried domain name expression to a preferred character map prior to a second attempt at mapping.
  • the preferred character map may be a universal character map, such as Unicode with the UTF-8 transformation format.
  • the URI forwarding agent maps the queried domain name expression to a predetermined domain name expression where, for example, information for registering the domain name may be presented.
  • the general system can be broadly described with regard to four main components.
  • the first component is a database for implementing a key-value retrieval using a pre-determined character encoding that is preferably a universal character encoding such as Unicode.
  • the second component is a key validator for determining whether the key follows an acceptable pattern in the character encoding.
  • the third component is an encoding converter that transforms text to and/or from the predetermined character encoding, preferably with integral validation that the input text is actually a valid source character encoding.
  • the fourth component is an iterator that performs the conversion, validation, and database lookup components in an iterative fashion.
  • Optional components of the system include a key normalization mechanism which may be combined with the key validation component.
  • a pattern matching mechanism (which is an extension to the database component) by which multiple distinct keys are made to correspond to the same value data may also be included.
  • a resolution mechanism may be provided for using interactive dialogue to resolve ambiguous or failed identifications of the character encoding of a key.
  • An image conversion mechanism may be provided for converting text in some character encoding to an image in some graphical format, and a constraining mechanism may be provided for constraining the set of character encodings that are under consideration by specifying the language of the text.
  • This latter system generally operates as follows. A key is received and passed to the key validator. If the key is determined to be valid, then a database lookup is attempted, and a reply is generated. The reply will contain either data from the database, or a failure message when no data was found. If the key is not valid, then control passes to the iterator.
  • the iterator has an encoding pointer that is initialized to the first character encoding in a prioritized list of encoding conversions that are to be attempted.
  • a conversion of the key is then attempted from the current character encoding (i.e., the character encoding currently identified by the pointer) to the first encoding in a prioritized list. If the conversion succeeds, then the resulting converted key is validated and a database lookup is attempted with that new key.
  • a conversion of the data to the current encoding is also attempted. If the conversion of the data from the first character encoding to the current encoding succeeds, then a reply is generated containing the conversion of the data that has been found in the database, and the process completes. If any of these steps fails, then the iterator encoding pointer is incremented to the next encoding in the list, and the process is repeated with the attempted conversion from the next encoding in the list of prioritized encodings. The process is repeated until there is a successful conversion, or until the encoding list is exhausted, in which case a reply containing a failure message is generated.
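  • The validate, convert, look up, and iterate flow described above might be sketched as follows. Everything named here is an assumption made for illustration: the in-memory `DATABASE`, the prioritized `ENCODINGS` list, the choice of UTF-8 as the predetermined encoding, and a `canonicalize` helper that combines validation with normalization.

```python
import unicodedata


def canonicalize(text):
    """Validate and normalize a key; return None for an unacceptable pattern."""
    key = unicodedata.normalize("NFKD", text).lower()
    for ch in key:
        if ch in "-." or ch.isalnum() or unicodedata.combining(ch):
            continue
        return None
    return key or None


# Hypothetical database keyed on canonicalized Unicode strings.
DATABASE = {canonicalize(k): v for k, v in {
    "z\u00fcrich.example": "192.0.2.10",
    "пример.example": "192.0.2.20",
}.items()}

PREDETERMINED = "utf-8"
ENCODINGS = ["iso-8859-1", "koi8_r"]  # hypothetical prioritized list


def lookup(raw_key: bytes):
    """Return (encoding, reply octets) on success, or None as the failure reply."""
    # A key that is already valid in the predetermined encoding is looked up
    # directly, and the reply is generated whether or not data was found.
    try:
        key = canonicalize(raw_key.decode(PREDETERMINED))
    except UnicodeDecodeError:
        key = None
    if key is not None:
        value = DATABASE.get(key)
        return None if value is None else (PREDETERMINED, value.encode(PREDETERMINED))
    # Otherwise the iterator walks the prioritized list of encodings.
    for encoding in ENCODINGS:
        try:
            key = canonicalize(raw_key.decode(encoding))  # convert and validate
            value = DATABASE[key]                         # database lookup
            reply = value.encode(encoding)                # convert data back
        except (UnicodeError, KeyError):
            continue  # any failed step advances the encoding pointer
        return encoding, reply
    return None


print(lookup("пример.example".encode("koi8_r")))
```

  • In this sketch the KOI8-R key also decodes without error as ISO-8859-1, but the resulting string is not found in the database, so the iterator moves on to the next encoding in the list; it is the failed lookup rather than the failed conversion that advances the pointer in that case.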
  • the general system may be subsumed in an otherwise conventional DNS service whereby DNS records can be keyed on, and can contain, characters which are not part of the US-ASCII character map.
  • the DNS server's key-value lookup table is used as the database and a simple test is performed to determine if the query consists entirely of only valid ASCII character patterns before the query reaches the principal key validator. If valid ASCII character patterns are found, then the query is immediately looked up without reaching the principal validator.
  • A key normalization system that is tightly integrated with the principal key validator may also be included, because normalization is necessary for all queries that reach the principal validator.
  • the server's built-in lookup table system ignores ASCII case. Since case is the only character attribute in ASCII that is affected by normalization, it is not necessary to perform key normalization for queries that consist entirely of valid ASCII character patterns.
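
A rough sketch of the ASCII pre-test follows. It assumes that "valid ASCII character patterns" means dot-separated labels of letters, digits, and hyphens; the names `is_plain_ascii` and `fast_lookup` are hypothetical.

```python
import re

# Assumption: an all-ASCII query is one made of dot-separated labels that
# contain only letters, digits, and hyphens, with an optional trailing dot.
_ASCII_NAME = re.compile(rb"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.?")


def is_plain_ascii(query: bytes) -> bool:
    return _ASCII_NAME.fullmatch(query) is not None


def fast_lookup(query: bytes, table: dict):
    """Return (handled, record). handled is False when the query must go on
    to the principal validator instead of the fast path."""
    if not is_plain_ascii(query):
        return (False, None)
    # Case is ignored by the table, so downcasing stands in for normalization.
    return (True, table.get(query.lower()))


table = {b"example.nu": "192.0.2.1"}
print(fast_lookup(b"Example.NU", table))
print(fast_lookup("é.nu".encode("utf-8"), table))
```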
  • A pattern matching mechanism may also be tightly integrated with the server's built-in table lookup system.
  • FIG. 1 is a schematic diagram of a data managing system.
  • FIG. 2 is a schematic diagram of a communication system illustrating various implementations of the data managing system of FIG. 1.
  • FIG. 3 is a flow diagram illustrating the architecture, operation, and/or functionality of one of a number of possible embodiments of the data management facility of FIG. 1.
  • FIG. 4 is a flow diagram illustrating the architecture, operation, and/or functionality of another possible embodiment of the data management facility of FIG. 1.
  • FIG. 5 is a flow diagram illustrating the architecture, operation, and/or functionality of yet another possible embodiment of the data management facility of FIG. 1.
  • FIG. 6 is a flow diagram illustrating the architecture, operation, and/or functionality of yet another possible embodiment of the data management facility of FIG. 1.
  • FIG. 7 is a flow diagram illustrating the architecture, operation, and/or functionality of yet another possible embodiment of the data management facility of FIG. 1.
  • FIG. 8 is a screen shot of a user interface device.
  • FIG. 9 is a flow diagram illustrating the architecture, operation, and/or functionality of one of a number of possible embodiments of the data management facility of FIG. 1.
  • FIG. 10 is a group of resource records for use by the participating server shown in FIG. 11.
  • FIGS. 11 and 12 are schematic diagrams illustrating the interaction between the devices shown in these FIGs.
  • FIG. 13 is a group of resource records for use in the URL forwarding agent shown in FIG. 11.
  • FIG. 14 is a flow diagram illustrating the architecture, operation, and/or functionality of yet another possible embodiment of the data management facility of FIG. 1.
  • FIGS. 15 and 16 are related flow diagrams illustrating the architecture, operation, and/or functionality of yet another possible embodiment of the data management facility of FIG. 1.
  • FIG. 1 is a schematic diagram of certain components in a data managing system 100.
  • The data managing system 100 may be implemented in a wide variety of electrical, electronic, computer, mechanical, and/or manual configurations. However, in a preferred embodiment, the system 100 is at least partially computerized with various aspects of the system being implemented by software, firmware, hardware, or a combination thereof.
  • The preferred data managing system 100 includes a processor 110, memory 120, and one or more input and/or output ("I/O") devices 130.
  • the processor 110, memory 120, and I/O devices 130 are communicatively coupled via a local interface 140.
  • the local interface 140 may include one or more buses, or other wired or wireless connections, as is known in the art.
  • The interface 140 may have other communication elements, such as controllers, buffers (caches), drivers, repeaters, and/or receivers.
  • Various address, control, and/or data connections may also be provided with the local interface 140 for enabling communications among the various components of the system 100.
  • the input/output devices 130 may include network connections, such as Internet gateways and/or routers.
  • The memory 120 may have volatile memory elements (e.g., random access memory, or "RAM," such as DRAM, SRAM, etc.), nonvolatile memory elements (e.g., hard drive, tape, read only memory, or "ROM," CDROM, etc.), or any combination thereof.
  • the memory 120 may also incorporate electronic, magnetic, optical, and/or other types of storage devices.
  • A distributed memory architecture, where various memory components are situated remote from one another, may also be used.
  • the processor 110 is preferably a hardware device for implementing software that is stored in the memory 120.
  • The processor 110 can be any custom-made or commercially available processor, including semiconductor-based microprocessors (in the form of a microchip) and/or macroprocessors.
  • the processor 110 may be a central processing unit ("CPU") or an auxiliary processor among several processors associated with the computer 100.
  • suitable commercially-available microprocessors include, but are not limited to, the PA-RISC series of microprocessors from Hewlett-Packard Company, U.S.A., the 80x86 and Pentium series of microprocessors from Intel Corporation, U.S.A., PowerPC microprocessors from IBM, U.S.A., Sparc microprocessors from Sun Microsystems, Inc, and the 68xxx series of microprocessors from Motorola Corporation, U.S.A.
  • the memory 120 stores software in the form of instructions and/or data for use by the processor 110.
  • the instructions will generally include one or more separate programs, or modules, each of which comprises an ordered listing of executable instructions for implementing one or more logical functions.
  • the software contained in the memory 120 includes a suitable operating system ("O/S") 150, along with a database 160 and a data management facility 170 including one or more modules as described in more detail below.
  • the operating system 150 implements the execution of other computer programs and provides scheduling, input-output control, file and data management, memory management, communication control, and other related services.
  • Various commercially-available operating systems 150 may be used, including, but not limited to, the Windows operating system from Microsoft
  • The database 160 will include one or more structured sets of persistent data along with associated software to update and query the data. For example, a simple database could be arranged as a single file containing many records, each of which contains the same set of fields where each field may be a certain fixed width.
  • the database 160 will also include various conventional database management programs that support query languages and report writers that allow users to interactively interrogate the database and analyze its data.
  • the database may be local, remote, or distributed in arrangement.
  • The database may also be deductive, hierarchical, functional, object-oriented, or relational in configuration.
  • Records in the database 160 are preferably retrieved, inserted, or otherwise operated on or manipulated using a key value.
  • the key may be one of the fields, e.g. a column if the database is considered as a table with records being rows.
  • the key may be obtained by applying some function, e.g. a hash function, to one or more of the fields.
  • the set of keys for all records forms an index. Multiple indexes may be built for one database depending on how it is to be searched.
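
As a small illustration of the two kinds of key described above, the sketch below builds one index keyed directly on a field and another keyed on a hash of two fields. The record layout, the field names, and the use of SHA-256 are assumptions for the example only.

```python
import hashlib

# Hypothetical records; each row has the same set of fields.
records = [
    {"name": "example.nu", "type": "A", "data": "192.0.2.1"},
    {"name": "example.nu", "type": "MX", "data": "mail.example.nu"},
]


def field_key(record: dict) -> str:
    """Key taken directly from one field (a column of the table)."""
    return record["name"]


def hashed_key(record: dict) -> str:
    """Key obtained by applying a hash function to two of the fields."""
    joined = f'{record["name"]}|{record["type"]}'
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def build_index(rows: list, key_fn) -> dict:
    """Map each key to the list of records that share it."""
    index = {}
    for row in rows:
        index.setdefault(key_fn(row), []).append(row)
    return index


by_name = build_index(records, field_key)    # one index per search pattern
by_hash = build_index(records, hashed_key)
print(len(by_name["example.nu"]), len(by_hash))
```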
  • the data in the database 160 may contain name data, such as domain name data, and/or location data, such as IP address data, that may be arranged in the form of DNS resource records.
  • the database 160 may also be configured as a separate, remote or local, hardware component of the system 100.
  • The data management facility 170 may be a source program (or "source code"), executable program ("object code"), script, or any other entity comprising a set of instructions to be performed as described in more detail below.
  • any such source code will typically be translated into object code via a conventional compiler, assembler, interpreter, or the like, which may (or may not) be included within the memory 120.
  • The various modules of the data management facility may be written using an object oriented programming language having classes of data and methods, and/or a procedural programming language, having routines, subroutines, and/or functions.
  • Suitable programming languages include, but are not limited to, C, C++, Pascal, Basic, Fortran, Cobol, Perl, Java, and Ada.
  • A "computer readable medium" includes any electronic, magnetic, optical, or other physical device or means that can contain or store a computer program for use by, or in connection with, a computer-related system or method.
  • The computer-related system may be any instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and then execute those instructions. Therefore, in the context of this document, a computer-readable medium can be any means that will store, communicate, propagate, or transport the program for use by, or in connection with, the instruction execution system, apparatus, or device.
  • the computer readable medium may take a variety of forms including, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples of a computer-readable medium include an electrical connection (electronic) having one or more wires, a portable computer diskette (magnetic), a random access memory (“RAM”) (electronic), a read-only memory (“ROM”) (electronic), an erasable programmable read-only memory (“EPROM,” “EEPROM,” or Flash memory) (electronic), an optical fiber (optical), and a portable compact disc read-only memory (“CDROM”) (optical).
  • the computer readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, for instance via optical sensing or scanning of the paper, and then compiled, interpreted or otherwise processed in a suitable manner before being stored in the memory 120.
  • The system may be implemented using a variety of technologies including, but not limited to, discrete logic circuit(s) having logic gates for implementing logic functions upon data signals, application specific integrated circuit(s) ("ASIC") having appropriate combinational logic gates, programmable gate array(s) ("PGA"), and/or field programmable gate array(s) ("FPGA").
  • the processor 110 will be configured to execute instructions in the operating system 150 that is stored within the memory 120.
  • The processor 110 will also receive and execute further instructions in the data management facility 170 so as to generally operate the system 100 pursuant to the instructions and data contained in the software and/or hardware as described below.
  • the data management facility 170 is configured with three modules. However, the facility may also be provided in other configurations, and with any number of modules.
  • The referral service module 172 directs (or "traffics") certain queries for further processing without completing the query for that key value.
  • The iterative encoding conversion module 174 includes a converter and an iterator for iteratively converting a key in a query from each of a plurality of encodings to a (predetermined) preferred encoding before performing the key value operation with each converted key. The converter may also normalize the converted key value.
  • the encodings are preferably character encodings with the preferred encoding being Unicode. However, a variety of other encodings may also be used.
  • the optional key validation and normalization module 176 verifies that the syntax of the converted key is valid in the preferred encoding and/or normalizes the key according to the requirements of the preferred encoding.
  • The modules 172, 174, and 176 shown in FIG. 1 may be referred to as the Referral Domain Name Service ("RDNS"), Legacy Unicodification Trial Engine ("LUTE"), and Unicode Verification and Canonicalization Engine ("UVCE"), respectively.
  • FIG. 2 illustrates a communication system 200 in which various embodiments of data managing system 100 may be implemented.
  • the communication system 200 may include client devices 212, service providers 214, root servers 216, web servers (such as DNS servers, URL forwarding agents, and conversion servers) 218, mail servers 220, WHOIS servers 222, and a communications network 210.
  • Service providers 214 may facilitate communication between client devices 212 and root servers 216, web servers 218, mail servers 220, WHOIS servers 222, and registration servers 222 via the communications network 210.
  • the communications network 210 may be any type of communication network employing any network topology, transmission medium, or network protocol.
  • The communications network 210 may be a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), any public or private packet-switched or other data network, including the Internet, circuit-switched networks, such as the public switched telephone network (PSTN), wireless networks, or any other desired communications infrastructure.
  • As will be understood by one of ordinary skill in the art, the precise configuration of client devices 212, service providers 214, root servers 216, web servers 218, mail servers 220, WHOIS servers 222, and registration servers 222 is not critical. The important aspect is that the various embodiments of data managing system 100 may be implemented by, or in connection with, client devices 212, service providers 214, root servers 216, web servers 218, mail servers 220, WHOIS servers 222, and registration servers 222.
  • FIG. 3 is a flow diagram for one embodiment of the data management facility 170 shown in FIG. 1. More specifically, FIG. 3 shows the general architecture, functionality, and operation of a software system 370 for implementing the referral service module 172, iterative encoding conversion module 174, and key validation module 176. However, as noted above, a variety of other computer, electrical, electronic, mechanical, and/or manual systems may be similarly configured.
  • Each block in FIG. 3 represents an activity, step, module, segment, or portion of computer code that will typically comprise one or more executable instructions for implementing the specific logical function(s). It should also be noted that, in various alternative implementations, the functions noted in the blocks may occur out of the order noted in the FIGs. For example, multiple functions in different blocks may be executed substantially concurrently, in a different order, incompletely, and/or over an extended period of time, depending on the functionality involved. Various steps may also be skipped or completed manually.
  • the data management facility 370 starts with a key 302 provided as part of a database query or other key value operation associated with the database 160.
  • the key will have an initially-undetermined encoding, such as a US-ASCII, Unicode, or EUC-TW (Taiwanese) character encoding.
  • the optional referral service module 172 performs gross analysis at step 304 and "traffics" queries containing certain keys for further processing at step 306. For example, certain keys may not be suitable for use with the database 160 without further processing.
  • the string is then optionally downcased, if necessary, at step 308 and a database lookup is attempted at step 310. If the lookup is deemed successful at step 312, then a check will be made as to whether any other lookup attempts (discussed below) were also successful at step 314.
  • the retrieved data or other output 316 from the database 160 is provided.
  • a mechanism may be provided for resolving the ambiguity associated with multiple successful lookups at step 318.
  • Multiple database retrievals may be provided to the client and/or displayed to the user for selection of the appropriate data.
  • the key 302 is sent for further processing by the key validation module 176 and iterative encoding module 174. However, on subsequent passes, if it is determined that the key 302 has been previously looked up at step 320, then further processing by the key validation module 176 may be skipped and the key 302 sent directly to the iterative encoding conversion module 174 as illustrated in FIG. 3.
  • An analysis is performed to determine whether the encoding of the key 302 in a preferred encoding is valid. For example, the analysis may consider whether the encoding of the key 302 follows an acceptable syntax in the preferred encoding format. For keys containing encoded characters, this will preferably include a determination as to whether the key follows a valid Unicode syntax, such as by examining the validity of the Unicode code points. Of course, other encodings (such as compression encodings, and security encodings or encryptions), or portions of encodings (such as just a character encoding form), may be considered instead of an entire Unicode character mapping. If the syntax is acceptable, then the key is assumed to have the preferred encoding and is optionally normalized at step 324 before being sent back for another lookup. The normalization step 324 may alternatively be performed before validity checking step 322.
  • the module 174 includes an encoding converter that converts the key 302 from one of a plurality of encodings to the preferred encoding.
  • the key 302 (with an unknown character encoding at this point in the system 370) may be converted from EUC-TW (Taiwanese) to Unicode at step 328 before another lookup is attempted at step 310.
  • a reverse conversion (not shown) may also be provided where, for example, the Unicode key is converted back to EUC-TW (Taiwanese) and compared with the original key as a further check on the conversion process.
  • the module 174 also includes an iterator (or iterater) for choosing another encoding from which to convert the key to Unicode before sending the key back for another lookup.
  • The key 302 could be converted from EUC-JP (Japanese) to Unicode before being sent for another attempted lookup.
  • Any number of character (or other) encodings in addition to EUC-TW and EUC-JP will also be attempted until it has been determined that all conversions have been attempted at step 326 and the process stops at step 330. If there are no successful lookup attempts, then an error message (not shown) may be provided.
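
The reverse-conversion check and the possibility of more than one successful lookup can be illustrated with the sketch below. It is an assumption-laden example rather than the system of FIG. 3: the helper names `candidates` and `all_hits`, the dictionary standing in for the database 160, and the sample encodings are all hypothetical.

```python
def candidates(key: bytes, encodings: list):
    """Yield (encoding, text) for every encoding that converts the key and
    survives the reverse conversion back to the original octets."""
    for enc in encodings:
        try:
            text = key.decode(enc)
            round_trips = text.encode(enc) == key
        except UnicodeError:
            continue                      # conversion failed: next encoding
        if round_trips:
            yield enc, text


def all_hits(key: bytes, table: dict, encodings: list) -> list:
    """Collect every successful lookup so ambiguity can be resolved later."""
    return [(enc, table[text])
            for enc, text in candidates(key, encodings)
            if text in table]


key = "え".encode("euc_jp")
# Two records whose keys are the same octets read under different encodings.
table = {key.decode("euc_jp"): "record-jp", key.decode("latin-1"): "record-latin"}
print(all_hits(key, table, ["euc_jp", "latin-1", "ascii"]))
```

When the returned list has more than one entry, the choice between them would be left to whatever ambiguity-resolution mechanism is provided, for example presenting the alternatives to the user.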
  • the encodings that are used for these conversions may be explicitly or implicitly identified by the client or user submitting the key 302.
  • The key may be submitted as part of an HTTP protocol message including a character set token and/or product token from which the appropriate character encodings can be deduced.
  • Various other protocols could also be defined to include similar character encoding and/or product identifiers.
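
One way a character set token might influence the order of attempted conversions is sketched below. The example assumes the token arrives in an `Accept-Charset` style header value; the function name `prioritized` and the default ordering are hypothetical, and no product-token handling is shown.

```python
def prioritized(default_order: list, accept_charset: str) -> list:
    """Put charsets named in the header first, highest quality first,
    followed by the remaining encodings in their default order."""
    hinted = []
    for item in accept_charset.split(","):
        name, _, param = item.strip().partition(";")
        name, param = name.strip().lower(), param.strip().lower()
        quality = 1.0                       # no q-value means full preference
        if param.startswith("q="):
            try:
                quality = float(param[2:])
            except ValueError:
                quality = 0.0               # unparsable q-value: ignore the entry
        if name and name != "*" and quality > 0:
            hinted.append((quality, name))
    hinted.sort(key=lambda pair: pair[0], reverse=True)
    order = [name for _, name in hinted]
    return order + [enc for enc in default_order if enc not in order]


print(prioritized(["utf-8", "big5", "euc-jp"], "EUC-JP;q=0.8, Shift_JIS"))
```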
  • FIGs. 4 and 5 illustrate alternative embodiments 470 and 570 of the system 370 that is shown in FIG. 3.
  • Each converted key is sent from the iterative encoding conversion module 174 to the key validation module 176 before being normalized at step 324 and sent for lookup at step 310.
  • If the conversion step 328 results in a converted key with an invalid syntax for the resulting preferred encoding, then no lookup is required for the invalid key and the next conversion will be attempted until all conversions have been attempted at step 326.
  • an additional step 340 is added to determine if the key originated from the iterative encoding conversion module 174 in order to bypass the normalization step 324 under those circumstances.
  • FIG. 6 illustrates another system 670 for implementing the data management facility 170 (FIG. 1) that is particularly useful for mapping names to locations, and, in particular, domain names to resource records including IP addresses.
  • This embodiment is particularly useful with the BIND version 8.2.2-P5 software which is distributed by the Internet Software Consortium.
  • BIND includes a database with resource records for implementing key value retrievals for keys including only DNS-legal characters with the US-ASCII encoding.
  • RDNS is optional wrapper code which resides inside, or is called by, the main lookup function of BIND.
  • This wrapper code acts as an interface between BIND and the UVCE and LUTE modules. More particularly, RDNS performs gross analysis and "traffics" queries for further processing depending upon the results of a string analysis of the query.
  • A queried domain name expression, username expression, hostname expression, or other expression 602 including a domain name that is encoded with an "initially-undetermined" character map is received by RDNS from the resolver.
  • the character map which was used to encode this queried domain name expression is initially-undetermined because it is not specifically identified by the client making that request. Consequently, it is unknown or unspecified to the system at this point, but will eventually be determined.
  • RDNS acts as a query filter which preferably classifies the queried expression into one of four groups: 1) a special string, 2) a 7-bit DNS-legal string, such as one encoded with US-ASCII, 3) an 8-bit DNS-legal string, such as one encoded with ISO 8859, or 4) an illegal string that might include DNS-illegal characters and/or be encoded with Unicode.
  • a special string is one that is identified for special processing such as immediate delegation to another server or to another module on the same server. For example, the special string is identified at step 604 and referred to an external module for further processing at step 606.
  • If the special string is problematic (such as one with unclean characters) and therefore delegated to another module (not shown) on the same server, it could receive some additional preliminary processing before being returned to RDNS. Alternatively, the other module may simply cause an error or warning message to be issued.
  • the string may also be delegated to another server by RDNS if it falls within a subzone for which authority has been delegated to another name server in the same domain. This latter configuration allows groups of resource records for a particular zone to be divided and conveniently stored on different machines for ease of administration and possibly faster lookups.
  • Legacy expressions 610 include seven-bit, DNS-legal strings and are sent for mapping to the appropriate resource record using conventional, or "legacy," lookup technology provided by existing implementations of the DNS protocol such as existing BIND software.
  • the eight-bit DNS-legal strings are passed to the UVCE for further analysis.
  • The 8-bit DNS-legal strings may also be further grouped by the RDNS into ISO-8859-1 and/or Unicode encodings and flagged accordingly for further processing by the UVCE or LUTE modules.
  • The remaining "other" strings are likely to be "DNS-illegal" strings and are passed to LUTE, if enabled, or delegated to a LUTE-enabled server upon synthesis of an appropriate name server ("NS") record.
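
The four-way classification performed by RDNS might be approximated as in the sketch below. The exact tests are not spelled out in the text, so the rules here are assumptions: a "special" string is one ending in a configured suffix, a 7-bit legal string contains only ASCII letters, digits, hyphens, and dots, and an 8-bit legal string additionally allows octets of 0x80 and above. The names `QueryClass` and `classify` are hypothetical.

```python
from enum import Enum


class QueryClass(Enum):
    SPECIAL = "special"          # referred elsewhere for special processing
    LEGACY_7BIT = "legacy"       # conventional lookup
    EIGHT_BIT = "eight-bit"      # passed to the key validator
    OTHER = "other"              # passed to the iterative converter

# Assumed set of DNS-legal 7-bit octets.
_LEGAL_ASCII = frozenset(
    b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-."
)


def classify(query: bytes, special_suffixes: tuple = ()) -> QueryClass:
    if any(query.endswith(suffix) for suffix in special_suffixes):
        return QueryClass.SPECIAL
    if query and all(octet in _LEGAL_ASCII for octet in query):
        return QueryClass.LEGACY_7BIT
    if query and all(octet in _LEGAL_ASCII or octet >= 0x80 for octet in query):
        return QueryClass.EIGHT_BIT
    return QueryClass.OTHER


for sample in (b"example.nu", "é.nu".encode("latin-1"), b"bad name.nu"):
    print(sample, classify(sample).value)
```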
  • UVCE is a key validator for determining whether the key follows an acceptable pattern for a universal character encoding, such as
  • UVCE first checks whether the string is a valid Unicode/UTF-8 encoding at step 612 by confirming the integrity of the UTF-8 encoding and the validity of all Unicode code points that appear in the input string.
  • other character map encodings besides Unicode, including other universal character sets and/or character encoding schemes, may also be used.
  • the universal character encoding may also be constructed by combining a set of distinct character encodings and then using tagging, or another scheme, to identify the encoding that is used in each portion of text.
  • UVCE also confirms at step 612 that only "clean" characters appear in the input, i.e., there are no "unsafe" or "reserved" characters that are particularly troublesome when used in domain names as is generally described in RFCs 1738 and 1630. Most punctuation and all whitespace are rejected when DNS restrictions are flagged.
  • the validation stage also computes the size of the output string so that if conversion (downcasing and/or decomposition) is necessary, the output string can be allocated with a single call to the memory allocator.
  • any combination of Unicode characters will be validated at step 612 except for control characters, whitespace, and unassigned, private, and surrogate code points. Letters, letter modifiers, and decimal and alphabetic numbers, are always permitted. A mid-dot is permitted in non-label boundary positions. Non-spacing marks and spacing-combining marks are permitted anywhere except at the start of a label. Punctuation is prohibited, except that dash and period are permitted as provided for in conventional DNS implementations. Symbols, fractions, control characters, separators, whitespace, and surrogate, private, and unassigned code points, are also prohibited.
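
The character rules listed above can be approximated with the Unicode general categories exposed by Python's standard `unicodedata` module. The sketch below is an interpretation, not the disclosed validator: the function names are hypothetical, and treating "other number" characters as acceptable only when their numeric value is a whole number (so that superscript digits pass while fractions are rejected) is an assumption.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"


def valid_label(label: str) -> bool:
    """Apply the per-character rules to one label (text between periods)."""
    if not label:
        return False
    last = len(label) - 1
    for i, ch in enumerate(label):
        cat = unicodedata.category(ch)
        if cat.startswith("L") or cat in ("Nd", "Nl") or ch == "-":
            continue                                  # letters, numbers, dash
        if cat == "No":
            value = unicodedata.numeric(ch, None)
            if value is not None and value == int(value):
                continue                              # e.g. superscript two
            return False                              # e.g. one half
        if ch == MIDDLE_DOT and 0 < i < last:
            continue                                  # not at a label boundary
        if cat in ("Mn", "Mc") and i > 0:
            continue                                  # marks, but not first
        return False        # punctuation, symbols, control, space, Cn/Co/Cs
    return True


def valid_name(raw: bytes) -> bool:
    """Check UTF-8 integrity, then every label of the decoded name."""
    try:
        text = raw.decode("utf-8")    # strict: malformed input is rejected
    except UnicodeDecodeError:
        return False
    if text.endswith("."):
        text = text[:-1]              # tolerate one trailing root period
    return all(valid_label(label) for label in text.split("."))


print(valid_name("emc².nu".encode("utf-8")), valid_name(b"bad name.nu"))
```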
  • Valid hostname labels are limited to 63 octets with a maximum domain text length of 255 octets and maximum packet size of 512 octets in order to conform with legacy DNS systems.
  • Because Unicode is a variable length character map, with many scripts being encoded by two or three octets per character (or four per surrogate pair), it is expected that these label lengths may be expanded and could be easily accommodated by modifications to the system disclosed here.
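
Because the limits are counted in octets rather than characters, a label in a multi-octet script reaches the limit sooner than an ASCII label. A minimal sketch of the check, assuming UTF-8 as the encoding and using hypothetical names:

```python
MAX_LABEL_OCTETS = 63     # per label
MAX_NAME_OCTETS = 255     # whole domain text


def within_limits(name: str) -> bool:
    """Check the octet limits against the UTF-8 form of the name."""
    if len(name.encode("utf-8")) > MAX_NAME_OCTETS:
        return False
    return all(len(label.encode("utf-8")) <= MAX_LABEL_OCTETS
               for label in name.split("."))


# 'え' occupies three octets in UTF-8, so 21 of them fill a label exactly.
print(within_limits("え" * 21 + ".nu"), within_limits("え" * 22 + ".nu"))
```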
  • UVCE also normalizes the key by, for example, downcasing any uppercase characters in the string to lower case and/or decomposing the characters into their constituent elements.
  • Once the string has been validated as a proper Unicode/UTF-8 encoding, it is "canonicalized" (i.e. downcased and fully decomposed) at step 614 using compatibility decomposition and/or canonical decomposition.
  • Canonicalization is preferably performed by recursively downcasing and performing a single stage of decomposition/normalization mapping, in that order, until further recursion has no effect, or performing any operation with the same result.
  • Normalization Form KD (compatibility decomposition).
  • Normalization Form KC can then be computed from Form KD by performing canonical composition, i.e. recombining character sequences that can be represented with generic combined forms
  • Compatibility decomposition (Form KD) is preferred because, in contrast to canonical decomposition, all characters which differ only in typographic nuance, are treated as equivalents. This is precisely the transformation that is preferred for the DNS name space. For example, superscript and subscript forms are simply converted to their ordinary forms using Form KD. Thus, once canonicalized in this manner, "emc2.nu" will refer to the same domain, regardless of whether the client submits the '2' character in plain or superscript form.
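
The "emc2.nu" example can be reproduced with Python's standard `unicodedata.normalize`, which performs the full compatibility decomposition in one call rather than a single stage at a time. The sketch below is therefore an equivalent-result illustration, not the staged procedure itself, and the function name `canonicalize` is hypothetical.

```python
import unicodedata


def canonicalize(name: str) -> str:
    """Downcase and apply Normalization Form KD until nothing changes."""
    while True:
        step = unicodedata.normalize("NFKD", name.lower())
        if step == name:
            return name       # fixed point: further passes have no effect
        name = step


plain = canonicalize("emc2.nu")
superscript = canonicalize("EMC\u00b2.nu")   # U+00B2 is superscript two
print(plain == superscript, superscript)
```

Both spellings reduce to the same key, so a table keyed on the canonical form resolves them to the same domain.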
  • the resulting canonicalized, or canonical, expression 616 is then sent for lookup at step 618. If the lookup is successful at step 620, then a resource record 622 is identified and/or returned at step 622. Resource records may be returned using the same character map as the queried domain name expression or another character map. If an attempt to look up the canonical expression 616 fails, then an error message (not shown) may be issued indicating that no matching records have been found. However, such unsuccessful lookups of canonical expressions (or 7-bit DNS legal strings) 616 at step 618 are preferably referred to LUTE for further processing.
  • LUTE includes an encoding converter for transforming a key in one encoding to another encoding that is preferably a universal character encoding.
  • LUTE also includes an iterator for controlling the database, key validator, and encoding converter to perform in an iterative fashion using a plurality of different character encodings.
  • LUTE is the final step for query categorization and lookup. It performs an iterative process of converting a DNS-illegal string
  • each converted expression 628 may also be validated and canonicalized using the UVCE module. After any validation and/or canonicalization by UVCE, a lookup is attempted at step 618 with each converted domain name expression 628. If successful, an optional reverse conversion may be performed (not shown) on the retrieved records 622 in order to confirm that they can be converted to match the character map encoding that was used by the client.
  • next lookup or reverse conversion
  • Another conversion is performed from the next character encoding to the preferred universal encoding at step 626. If the next lookup with the next converted expression 628 is unsuccessful, then the original expression is converted from a subsequent character map encoding to Unicode (or other universal character map) and another look up is attempted. The conversion and look up process is then repeated until all encodings have been attempted at step 624 in an order that is set at runtime and can be updated at any time while the server is running. If all lookups are unsuccessful, then the client making the original query may be referred to another server at step 630, such as a registration server for registering the domain name 602.
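
Two aspects of this stage, the encoding order that can be changed while the server runs and the referral when every lookup fails, can be sketched as a small class. Everything named here is hypothetical: the class `Lute`, its methods, the dictionary of records, and the referral target `registration.example` are illustrative assumptions, and canonicalization is reduced to downcasing plus Normalization Form KD.

```python
import unicodedata


class Lute:
    """Toy iterative converter with a replaceable encoding order."""

    def __init__(self, records: dict, encodings: list,
                 referral: str = "registration.example"):
        self.records = records
        self.encodings = list(encodings)
        self.referral = referral

    def set_order(self, encodings: list) -> None:
        """Replace the trial order; takes effect on the next query."""
        self.encodings = list(encodings)

    def resolve(self, query: bytes):
        for enc in self.encodings:
            try:
                text = query.decode(enc)          # convert to Unicode
            except UnicodeDecodeError:
                continue                          # conversion failed
            canonical = unicodedata.normalize("NFKD", text.lower())
            if canonical in self.records:
                return ("answer", self.records[canonical])
        return ("referral", self.referral)        # all encodings attempted


lute = Lute({"えき.nu": "192.0.2.1"}, ["big5", "euc_jp"])
query = "えき.nu".encode("euc_jp")
print(lute.resolve(query))
lute.set_order(["big5"])
print(lute.resolve(query))
```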
  • a typical DNS resolution using the system shown in FIG. 6 with BIND can also be described as follows.
  • the resolver library receives a hostname and record type to be resolved.
  • the hostname is then validated, to assure that it contains no invalid character patterns and does not exceed any length constraints.
  • If the server is asked to resolve a hostname that contains an invalid character pattern, it returns an ns_r_nxdomain "Name error" message immediately, and does not attempt a resolution. If there are no error messages, a query packet is constructed from the hostname as it was received by the library.
  • the packet is then transmitted to the recursing name server at which the client is currently pointed according to the resolver library configuration (e.g., localhost).
  • the packet is then received and deconstructed by the recursing server.
  • the recursing server extracts and canonicalizes the queried hostname from the packet. It then attempts a lookup in the name server's table of records using the computed canonical form and the extracted target record type. If any matching records are found, a reply packet is constructed and returned to the client, which then deconstructs the reply and supplies the information in an appropriate form to the agent that invoked the resolver library.
  • the uncanonicalized form of the queried hostname is preferably used in constructing the reply packet so that the client is certain to see a bitwise match between the expected and actual key in the reply.
  • a "treewalk" is performed in order to identify another name server for answering the hostname query.
  • the identified name server is then queried for the requested data.
  • the domain text that is used for constructing query packets is preferably uncanonicalized even though canonicalization might result in no changes to the query.
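  • The lookup-by-canonical-form, reply-with-verbatim-key behavior described in the steps above can be sketched as follows. This is a minimal illustration, not the BIND implementation: the `RECORDS` table, the `canonicalize` and `resolve` helpers, and the choice of downcasing plus NFKD as the canonical form are all assumptions made for the example.

```python
import unicodedata

# Hypothetical record table, keyed on canonical hostnames (assumption).
RECORDS = {"example.nu": "192.0.2.10"}


def canonicalize(hostname: str) -> str:
    """Downcase, then apply NFKD decomposition (an assumed canonical form)."""
    return unicodedata.normalize("NFKD", hostname.lower())


def resolve(hostname: str):
    """Look up by canonical key, but echo the hostname exactly as queried."""
    address = RECORDS.get(canonicalize(hostname))
    if address is None:
        # A real server would continue with a treewalk at this point.
        return None
    # The reply carries the uncanonicalized name so that the client sees a
    # bitwise match between the key it sent and the key in the reply.
    return {"name": hostname, "address": address}
```

  • Under these assumptions, a query for "Example.NU" is matched against the record stored under "example.nu", while the reply still names "Example.NU".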
  • the system is preferably based on BIND version 8.2.2-p5 running under Solaris with several additional components, including a validating UTF-8 coder/decoder.
  • a set of utility library routines is provided, such as a Unicode character table loader, a set of character categorizers and other attribute tests, replacements for the C library str[n]casecmp( ) calls, and various other support routines.
  • a Unicode/UTF-8 UVCE generating Normalization Form KD and a Unicode character table generator utility program are also provided in the preferred configuration.
  • the preferred configuration will also include alterations to various resolver libraries in conventional BIND.
  • "Res_hnok( )" is made to operate with UVCE, and "resolv.conf" is enhanced to add a switch for enabling the process.
  • a character table path specification for use when the table is not in the usual location is also added.
  • Alterations to the name server are also made in connection with "nlookup( )," "ns_req( )," "ns_resp( )," and others, while various utility client alterations are made to cause verbatim, rather than escaped, output of the UTF-8 transformed octets.
  • the behavior of the system is preferably set at runtime in the startup configuration file ("named.conf" for BIND ver. 8) on a general, or zone by zone, basis.
  • all RDNS name daemons will typically have RDNS enabled, but only some will have LUTE enabled.
  • front-line root name servers would preferably have only the UVCE enabled, with the RDNS referring LUTE queries to a LUTE-enabled server via synthesis of, and reply with, appropriate name server ("NS") resource records.
  • LUTE preferably takes place within the caching server, rather than in the authoritative server.
  • the list of candidate character maps to be used when a name server acts as a cache, and its order of precedence, can (and usually will) be tailored by trimming out those character maps that are not used by clients of the caching server, and by ordering those that remain so that more-used encodings precede less-used encodings in the LUTE conversion process.
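  • The tailoring of the candidate list described above can be sketched as follows. The usage counts, the function names, and the use of Python's built-in codecs as stand-ins for the LUTE conversions are assumptions made for illustration only.

```python
def order_candidates(usage_counts):
    """Trim encodings that no client uses and sort the rest, most-used first."""
    used = {enc: n for enc, n in usage_counts.items() if n > 0}
    # Ties are broken alphabetically so that the order is repeatable.
    return sorted(used, key=lambda enc: (-used[enc], enc))


def iterative_lookup(raw: bytes, candidates, table):
    """Try each candidate encoding in order until a converted key is found."""
    for enc in candidates:
        try:
            key = raw.decode(enc)  # strict decoding rejects invalid input
        except UnicodeDecodeError:
            continue  # not a valid instance of this encoding; try the next
        if key in table:
            return enc, table[key]
    return None  # every candidate was attempted without a match
```

  • With hypothetical counts such as `{"shift_jis": 120, "euc_jp": 40, "big5": 0}`, the unused Big5 map is dropped and Shift JIS is attempted before EUC-JP.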
  • the name server can also be configured so that, when a domain name query resolved through recursion does not unambiguously identify a suitable character encoding for use in the reply, a predetermined character encoding will be used in replies whenever conversion is possible.
  • This system allows for scalability since new character map conversions can be added to LUTE as they are developed.
  • the system also allows the most common and computationally fastest encodings to be placed earlier in the
  • the system also allows both the DNS query and reply to contain DNS-illegal characters from a universal character set such as Unicode. It "deterministically" accommodates the universal character map encoding in a predictable and repeatable fashion. It also "heuristically" accommodates other character maps in a manner which does not necessarily produce the expected and desired result. Moreover, the system deterministically ignores variations of character form in a queried domain name expression so that all such forms can be treated equally.
  • the system may be operated as part of an enterprise system, such as a domain name registry business operated by an NIC.
  • the system will include a registration web server, relational database management system, and a middleware "glue" scripting system such as Cold Fusion by Allaire. General information about middleware is available in RFC 2768.
  • the NIC will use these components to maintain a master database for, among other things, implementing key-value insertion and retrieval operations using a universal character encoding such as Unicode.
  • Each record in the registration database is typically stored in three forms.
  • each domain name is stored, verbatim, as it was submitted in the application for registration.
  • the character encoding is then copied verbatim from the HTTP submission portion of the application, and/or optionally confirmed from a dialogue with the customer, before it is stored in a second column in the database.
  • This form is used solely for reference purposes, for example, when a user reports a problem indicating that the name was improperly encoded.
  • the domain name is recorded with any compatibility characters, and any upper and lower features, in what is sometimes referred to as "colloquial Unicode," or the ISO 10646-1 "presentation form.” This presentation form is used for the zone and name daemon configuration files used by BIND.
  • the information in the third column is then downcased and fully decomposed (including any canonical and/or compatibility decomposition) and placed in a fourth column which is used to check for the availability of a domain during registration.
  • If Normalization Form KC is used, rather than Form KD, then canonicalization of conventionally presented (composed lowercase) hostname text will not result in any significant change in process size as compared to keying records on distinct canonical forms with Form KD.
  • Form KC is very attractive, despite the additional computational load it involves over Form KD.
  • the information in the fourth column is also extracted to build configuration files for other services that perform such look ups, such as, "WHOIS" services. Similar domain name applications that would not collide using the information in the third column, such as those having only case or decomposition differences, will collide when compared with information from the fourth column, and will be properly rejected.
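  • The availability check against the fourth column can be sketched as follows. This is a minimal illustration under stated assumptions: the `Registry` class and `lookup_key` helper are hypothetical, and Python's `unicodedata.normalize("NFKD", ...)` stands in for the full canonical and compatibility decomposition described above.

```python
import unicodedata


def lookup_key(presentation: str) -> str:
    """Downcase, then fully decompose (canonical and compatibility) via NFKD."""
    return unicodedata.normalize("NFKD", presentation.lower())


class Registry:
    """Hypothetical registry keeping a presentation form and a lookup key."""

    def __init__(self):
        self._by_key = {}  # lookup key (fourth column) -> presentation form

    def register(self, presentation: str) -> bool:
        key = lookup_key(presentation)
        if key in self._by_key:
            return False  # collides with an existing name; reject
        self._by_key[key] = presentation
        return True
```

  • In this sketch, "Café.nu" written with a precomposed "é" and "cafe.nu" written with a combining acute accent after the "e" differ in presentation form but share one lookup key, so the second application is rejected.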
  • the system can also be operated as a zone file filter by passing the domain name of the zone through the UVCE as it appears in the configuration file. Any name that is not an 8-bit, DNS-legal, valid Unicode expression is kept in verbatim form for presentation and canonical form for lookup. Then, for each record in the corresponding zone file, the record can be rejected if it is not a DNS-legal Unicode expression. For records which are not rejected, if the "left hand side" contains non-ASCII characters, they are passed through the UVCE to compute the canonicalized expression for lookup. The verbatim (non-canonicalized) version may then be preserved for presentation purposes.
  • the character encoding of the domain name in a request to a name server is generally not identified by the client making that request. Consequently, any server that incorporates the invention shown in FIG. 6 can be used to find the corresponding resource records for DNS-illegal domain names, if they exist, in the database files of that server.
  • the client making the request may not be using the appropriate character map to decode any textual portions of the server's reply. This can be particularly problematic for WHOIS servers where the encoding used for the domain name information in the reply will significantly alter how that information is expressed in textual format by the client.
  • a response from an improved WHOIS server will preferably include multiple encodings of the same domain name as shown in the screen shot illustrated in FIG. 8.
  • images of the glyphs that correspond to the characters in the domain name and/or names of the characters may also be provided. Any image information will then be presented as an image file (or link to an image file) in JPEG, PDF, and/or other image file formats. In this way, the information that is returned by the WHOIS server will be independent of the character map decoding being used by the client making the request.
  • an image may be presented to the user instructing them how to change the settings on their web browser in order to view the WHOIS server's response with the appropriate character map.
  • the WHOIS server may also be arranged to use a particular character map (such as Japanese shift JIS) in its response and/or to provide instructions for changing browser settings in a particular language (such as Japanese) depending on the location of the client making the request (such as a destination domain name including the ".jp" country code).
  • the server will preferably respond with a proposal for registering the name.
  • This registration proposal may ask the user to specify the character map with which the requested domain name registration is encoded.
  • the unregistered domain name may be provided to the user in multiple character encodings, or with character images or names, and the user will be asked to choose one encoding for registration.
  • FIG. 7 illustrates a system 700 for limiting the number of possible encodings that must be considered by the user in order to resolve this ambiguity.
  • the client or user may optionally be prompted to designate a language(s) or character encoding(s) which is received by the server at step 704.
  • the server identifies one or more character encodings corresponding to that language, such as US-ASCII and Unicode for English. Alternatively, the appropriate character encoding(s) may be explicitly or implicitly determined from a character set designation, product designation, or other designation used by the protocol being implemented by the user or client.
  • the requested domain name is converted from each of the plurality of encodings to multiple Unicode strings as described below with regard to FIG. 14.
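  • Converting one received name under several candidate encodings can be sketched as follows. The function name and the candidate list are assumptions; Python's built-in codecs stand in for the character map conversions, and strict decoding is used so that invalid byte sequences are discarded rather than guessed at.

```python
def candidate_strings(raw: bytes, encodings):
    """Return every Unicode reading of `raw` that decodes without error."""
    readings = {}
    for enc in encodings:
        try:
            readings[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            pass  # the bytes are not valid in this encoding
    return readings
```

  • A name submitted in Shift JIS is rejected by the ASCII decoder, read correctly by the Shift JIS decoder, and also accepted (as unrelated characters) by Latin-1, which is why the remaining readings still have to be presented to the user for selection.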
  • each character in the domain name string may be presented as a separate image and/or with a corresponding name for the particular glyph represented by the image.
  • the user may also be presented with options (such as a drop down box or link) for changing a particular character image to one that is phonetically, textually, contextually, positionally, or otherwise related to the first character shown in the registration proposal.
  • additional registration instructions may be provided as an image and/or text which uses the character encoding corresponding to the character encoding in the domain name being applied for.
  • when separately encoded strings are provided to the client, they will appear as shown in FIG. 8, since the client 800 will typically operate properly with only a single character map. Consequently, strings that are encoded with any other character map will appear garbled when decoded by the client 800. For example, as shown in FIG. 8, only the top domain name is not garbled because it has been provided using the same character encoding as the client.
  • all of the domain name choices shown in FIG. 8 may be displayed correctly by providing the user with image data for each character in each of the domain name choices as shown in FIG. 7 at step 710. The user can then be more easily prompted to select one of the groups of character images at step.
  • FIGS. 9-13 illustrate various aspects of another embodiment of the technology described above that does not require the zone files in every name server to include domain names with DNS-illegal characters or character maps with eight-bit code units. Instead, the master zone files for each participating name server 1120 (FIG. 11) on the Internet are only slightly modified to include a wildcard resource record as shown in FIG. 10 and discussed below.
  • a wildcard is a special character or character sequence which matches any character in a string comparison, like an ellipsis ("...") in ordinary written text.
  • In Unix filenames, '?' matches any single character and '*' matches any zero or more characters.
  • In regular expressions, '.' matches any one character and "[...]" matches any one of the enclosed characters.
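  • The two wildcard conventions just described can be tried out with Python's standard `fnmatch` and `re` modules. The helper names below are illustrative only; neither helper is the DNS wildcard mechanism itself.

```python
import fnmatch
import re


def glob_match(name: str, pattern: str) -> bool:
    """Unix-filename style: '?' is one character, '*' is zero or more."""
    return fnmatch.fnmatchcase(name, pattern)


def regex_match(name: str, pattern: str) -> bool:
    """Regular-expression style: '.' is one character, '[...]' is a set."""
    return re.fullmatch(pattern, name) is not None
```

  • For example, `glob_match("ab.nu", "a?.nu")` and `regex_match("bat", "[bc]at")` both succeed, while `glob_match("abc.nu", "a?.nu")` fails because '?' consumes exactly one character.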
  • Authoritative name servers that do not wish to support internationalized domain names with non-ASCII character maps and/or DNS-illegal characters can continue to operate without making any changes by simply not including the wildcard resource record. It is therefore much easier to convince innovative network administrators to implement the second embodiment of the invention than the previously discussed DNS embodiment.
  • the "$ORIGIN nu.” record is a control entry, or directive, that resets the current origin so that lower records in the database with owner names that do not end in a dot (".") are treated as if they were appended with ".nu.”
  • the second and third records are name server records indicating that there are two name servers, "ns.nic.nu” and "ns2.nic.nu,” for the zone "nunames.nu.”
  • these particular servers are being operated by the Network Information Center (“NIC") that acts as the official registrar for all top-level domain names ending in ".nu.”
  • These servers will also handle registrations for other domain names that are expressed in non-ASCII character maps, with code units that are longer than seven bits, and/or include at least one DNS-illegal character.
  • multiple registrars may also be accommodated, for example, by their implementation of the first embodiment of the invention discussed above and/or by using additional wildcard resource records in the participating name server 1120 (FIG. 11).
  • Load balancing may also be accomplished by providing different wildcard resource records for each zone.
  • the next group of records in FIG. 10 represent other subdomain name servers that might be registered with the NIC for the .nu domain.
  • authority for the "aaaa.nu" subdomain is delegated to the name server at the address ns.aaaa.se. Consequently, queries concerning hosts in the aaaa.nu subdomain will be delegated to the name server at ns.aaaa.se.
  • the last record on the last line of Fig. 10 is a wildcard address record that covers all other domain names ending in ".nu.” This record will cause any domain ending in ".nu” that does not match with any specific resource records for this origin to be forwarded to the host at IP address 206.33.200.73. This includes any domain name queries that use 7-bit DNS-illegal characters, characters with 8-bit code units, and/or any other non-ASCII character map. Therefore, such atypical domain names do not have to be added to the zone files in the conventional name servers that are currently operating on the Internet.
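  • The effect of the wildcard address record can be sketched with a toy zone table. This is a deliberately simplified model: the `ZONE` dictionary and `answer` function are hypothetical, only a one-label wildcard is handled, and real name servers apply more detailed wildcard matching rules. The delegation and the address are the ones named in the text above.

```python
# Toy zone: one explicit delegation plus the wildcard address record.
ZONE = {
    "aaaa.nu.": ("NS", "ns.aaaa.se."),
    "*.nu.": ("A", "206.33.200.73"),
}


def answer(qname: str):
    """Prefer a specific record; otherwise fall back to the wildcard."""
    if qname in ZONE:
        return ZONE[qname]
    parts = qname.split(".", 1)
    if len(parts) == 2:
        # Replace the leftmost label with '*' and try again.
        return ZONE.get("*." + parts[1])
    return None
```

  • In this model, a query for "aaaa.nu." returns the delegation, while any other name ending in ".nu.", including one written with non-ASCII characters, receives the address of the forwarding host.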
  • the host at IP address 206.33.200.73 is the URI forwarding agent 1130 shown in FIG. 11.
  • forwarding agents for other types of URIs, besides URLs, could also be used.
  • the forwarding agent 1130 could also be located at a different IP address if the wildcard record in the referring server were modified accordingly.
  • FIGs. 11 and 12 are schematic diagrams illustrating the interaction between the devices shown in the FIGs.
  • a client device 1110 first sends an HTTP request to the participating name server 1120 that has been provided with the appropriate wildcard resource record shown in FIG. 10.
  • the reply from name server 1120 includes a DNS response with the IP address of the URL forwarding agent 1130.
  • the client device 1110 then sends another request to the URL forwarding agent 1130.
  • the URL forwarding agent 1130 illustrated in FIG. 11 is a server that accepts an HTTP request and, based upon the hostname component of the header in the request, replies with a redirect message identifying the location at which the desired content is available.
  • the forwarding agent 1130 includes a data managing system 170 that is configured to operate according to the sequence shown in FIG. 9.
  • the forwarding agent 1130 receives queries having a header with domain names that are encoded with a character map that is initially-unknown to the forwarding agent.
  • This initially-undetermined character map may be a non-ASCII character map and/or include DNS-illegal characters.
  • the forwarding agent 1130 responds with URLs that are encoded in the currently-preferred DNS-legal character map.
  • other character maps may also be used for the URL in the response depending upon the most current lingua franca of the Internet or the character map that is used in the query.
  • Several hypothetical records 1300 from at least a portion of the database 160 (FIG. 1) associated with the forwarding agent 1130 are shown in FIG. 13.
  • the first column of each record contains a domain name that may include a non-ASCII character and/or DNS-illegal character.
  • the domain names in the first column are preferably encoded in the same universal character map, such as the ISO-10646-1 or Unicode character map.
  • Although each of the domain names illustrated in the first column of this example has been registered by the same ccTLD, registrations from other (current and future) top level domains may also be provided in the database.
  • each top level domain on the Internet may operate its own URL forwarding agent or they may simply delegate the authority for one or more forwarding agents to service various subzones.
  • the second column of the records 1300 in FIG. 13 contains a corresponding DNS-legal domain name expression for each of the names in the first column.
  • the names in the second column are preferably all encoded with a standard "preferred character map," such as the seven-bit US-ASCII character map, or another character map that is supported by most hosts on the Internet.
  • different character maps corresponding to a preferred character map for the top level domain of each name in the second column, or other groupings of domain names may also be used.
  • the forwarding agent 1130 first checks its database for a corresponding DNS-legal domain name and, if it finds one, returns a message containing information in the second column of FIG. 13.
  • the client device 1110 is redirected to the new URL by an HTTP full-response that includes a 302 status code in the status line, and the DNS-legal domain name of the redirected destination at which the information may be found.
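  • The redirect reply can be sketched as a plain function from hostname to response text. The forwarding table entry, the function name, and the 404 fallback are assumptions for illustration; the hostname is assumed to have already been converted to Unicode, and the "Moved Temporarily" reason phrase follows RFC 1945.

```python
# Hypothetical two-column table: internationalized name -> DNS-legal name.
FORWARDING = {"日本語.nu": "nihongo.example"}


def redirect_response(hostname: str) -> str:
    """Build a 302 reply whose Location header carries the DNS-legal name."""
    target = FORWARDING.get(hostname)
    if target is None:
        # Assumed fallback; the agent could instead redirect to a
        # registration page or return static content.
        return "HTTP/1.0 404 Not Found\r\n\r\n"
    return (
        "HTTP/1.0 302 Moved Temporarily\r\n"
        f"Location: http://{target}/\r\n"
        "\r\n"
    )
```

  • A browser that receives the 302 reply follows the `Location` header automatically, so the user reaches the DNS-legal host without retyping the name.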
  • the mechanics of such HTTP requests and responses are generally set forth in RFCs 1945 and 2068.
  • various HTTP responses allow client devices 1110 to be automatically redirected to a new location, or provided with a clickable link to the new address.
  • the forwarding agent 1130 may return actual content such as a static web page.
  • This web page could include information on how to register the name at issue.
  • Besides HTTP service requests, other types of services, such as mail services, may be implemented in a similar fashion.
  • FIG. 9 illustrates one embodiment of a URL forwarding agent system 970 for use with the URL forwarding agent 1130 (FIG. 11) and generally corresponding to the system 370 in FIG. 3. Similar URL forwarding systems may be configured to correspond to systems 470, 570, and 670 in FIGs. 4-6.
  • a conventional name server (not shown) with the appropriate wildcard resource record has directed the client 1100 to the URL forwarding agent 1130.
  • the agent 1130 may also receive queries from other clients, including direct queries from those clients.
  • the query 902 is initially assumed to contain a DNS-legal domain name expression (that is optionally downcased) and sent for lookup at step 310 using conventional technology.
  • the database for the forwarding agent system 970 includes records such as those shown in FIG. 13. If a match is found, then an HTTP redirect reply with a 302 status code is formulated and returned at step 916 to the client 1100 with the matching DNS-legal domain name from the database associated with the forwarding agent 1130. (DNS-illegal domain names may also be provided, and the process repeated.) If there is no match, then the query is next assumed to be encoded with a Unicode character map and sent to the UVCE module. If the unsuccessful lookup was attempted for a previously converted expression from LUTE at step 320, then the process returns to LUTE for another conversion and lookup until all conversions have been attempted or a match is found.
  • the validity of the assumed Unicode encoding is checked, and, if valid, the domain name portion of the queried expression is canonicalized at steps 322 and 324 and a second lookup is attempted. If a match is found in the database 12 on the second lookup attempt, then the client 1130 will receive an "HTTP redirect" reply message 916 that includes a "302" status code, and/or other appropriate codes, with the appropriate redirect message including a DNS-legal domain name expression for the conventional server 1140.
  • the domain name portion of the query is processed by the LUTE module as discussed in more detail above with regard to FIG. 3.
  • LUTE performs a conversion from a third assumed character map to Unicode and either sends the conversion for a third attempted lookup (Fig. 3) or passes the result back to UVCE (FIG. 4) for validation and canonicalization before a third lookup is attempted.
  • the validity of the third conversion may be checked without canonicalization (Fig. 5).
  • the forwarding agent 1130 (FIG. 11) then issues either an error message or a redirect reply message 930 to a predetermined location where, for example, information concerning how to register the domain name may be provided.
  • the forwarding agent 1130 may respond with a proposal for registering the name. Since the character map being used by the client is still unknown at this point, the registration proposal will ask the user to specify the character map with which the requested domain name registration is encoded. Alternatively, or in addition to asking the user to specify a character map, the proposal may contain one or more encodings of the domain name and/or images of the domain name as it would appear using different character maps as discussed above. The user may then simply choose one of the displayed encodings and/or images for registration.
  • each character in the string to be registered may be presented as a separate image and/or with a corresponding name for the particular glyph represented by the image.
  • the user may also be presented with options (such as a drop down box or link) for changing a particular character image to one that is phonetically, textually, contextually, positionally, or otherwise related to the first character shown in the registration proposal.
  • additional registration instructions may be provided as an image and/or text which uses a language and/or character map corresponding to the character map encoding in the domain name being applied for.
  • the registration proposal that is returned by the forwarding agent 1130 will be independent of the character map decoding being used by the client 1110 making the request.
  • an image may be presented to the user instructing them how to change the settings on their web browser in order to view the registration proposal with the appropriate character map.
  • the proposal may even use a particular character map (such as Japanese Shift JIS) in its response and/or provide registration information and/or instructions for changing browser settings in a particular language (such as Japanese) depending on the location of the client making the request (such as a destination domain name including the ".jp" country code) or any tokens or other designations in the request.
  • the encoding may be obtained from a character set token or product token in the HTTP header in the original request by the client 1110.
  • the inventions described above are not limited to the mapping of host names to numeric IP addresses or other host names. They can also provide other information about internet resources that can be used with virtually all types of internetworking software including electronic mail ("e-mail"), remote terminal programs such as “Telnet,” file transfer programs such as “ftp,” and “web browsers” such as Netscape Navigator and Microsoft Internet Explorer. Consequently, the inventions described above may also be applied to WHOIS servers, mail hubs, web servers with virtual host features, WHOIS services, authentication and authorization systems, and other devices that work with host names within the bounds of the DNS, HTTP, and/or other protocols. For example, they may be used by domain registrars, corporate networks, certificate users, internet service providers, and network administrators.
  • FIGs. 15 and 16 illustrate a schematic flowchart of other embodiments of the invention which includes pattern matching attempts by a pattern resolution engine (“PRE") when the table lookups discussed above are unsuccessful.
  • the wildcard resource records discussed above may be supplemented by pattern matching wildcard resource records such as:
  • FIGs. 15 and 16 may be broadly described as a system for accommodating multiple character encodings in keyed database retrieval or insertion operations, without advance knowledge of the particular character encoding used in each key. This allows the system to interoperate with a wide variety of legacy systems that themselves use various mutually incompatible character encodings.
  • the invention is immediately applicable in all circumstances in which heuristic recognition of the character encodings of various text is useful. This is particularly relevant to software systems on the global Internet, but is not constrained thereto.
  • the basic system includes four components: a database proper implementing key-value retrievals in a single distinguished character encoding (usually a universal character encoding), a key validator that determines whether a key follows permitted patterns in the distinguished character encoding, an encoding converter that transforms text from and to the distinguished character encoding (with integral validation that the input text is actually a valid instantiation of the source character encoding), and an encoding iterator that applies the conversion, validation, and database components.
  • Optional components of the system are: a key normalization mechanism
  • the basic system operates as follows, where the numbers in parentheses correspond to the numerals shown in the "Intercoding Name Server Logical Flow Diagram" illustrated in FIGs. 15 and 16.
  • a key is received (2) and passed to the key validator (4). If the key is determined to be valid (5), a database lookup (18) is attempted, and a reply is generated, either containing the found data (17), or containing a failure message if no data was found (26).
  • the iterator's encoding pointer is initialized to the first character encoding in a prioritized list of encodings to be attempted.
  • a conversion of the key is attempted (8) from the current encoding (the character encoding currently identified by the iterator's encoding pointer). If the conversion to the distinguished character encoding succeeds (9), the resulting converted key is validated (10).
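  • The flow of the basic system can be sketched as follows, with the parenthesized numerals from the text noted in comments. Several points are assumptions: UTF-8 is used as the distinguished encoding, the validator's "permitted patterns" are reduced to "non-empty and printable", and the iterator is assumed to advance to the next encoding when a converted key is invalid or not found, since the text above does not spell out that branch.

```python
def validate(text: str) -> bool:
    """Stand-in key validator: non-empty, printable text (assumption)."""
    return bool(text) and text.isprintable()


def handle_query(key: bytes, encodings, database):
    """Validate and look up a key, iterating over encodings if needed."""
    try:
        text = key.decode("utf-8")  # key validator (4)
    except UnicodeDecodeError:
        text = None
    if text is not None and validate(text):  # key is valid (5)
        # Database lookup (18): found data (17) or failure (26).
        return database.get(text, "NOT FOUND")
    for enc in encodings:  # prioritized list of encodings
        try:
            converted = key.decode(enc)  # conversion attempt (8)
        except UnicodeDecodeError:
            continue  # conversion failed; advance the pointer
        if validate(converted) and converted in database:  # (9), (10)
            return database[converted]
    return "NOT FOUND"
```

  • A key that is already valid in the distinguished encoding never reaches the iterator, so the common case costs a single validation and a single lookup.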
  • the technology illustrated in FIGs. 15 and 16 is subsumed within a common, freely available server (e.g. Internet Software Consortium (ISC) "named", a portion of ISC Berkeley Internet Name Domain (BIND)) for the Domain Name Service (DNS).
  • the server's key-value lookup table is used as the database proper.
  • an additional simple test (3) is performed to determine if the query consists entirely of valid ASCII character patterns. If so, the query never reaches the principal validator (4), but instead is looked up directly (18, 19).
  • This embodiment includes a key normalization system tightly integrated with the principal key validator (4 and 10), and normalization is necessary for all queries except those that never reach the principal validator.
  • a pattern matching mechanism (20, 21, 22, 23, 12, 24, 25) is tightly integrated with the server's built in table lookup system.
  • the first embodiment can be subsumed within any of a variety of directory servers, including other servers for DNS, and servers for Lightweight Directory Access Protocol (LDAP) and Network Information System (NIS).
  • the inventive technology operates to adapt the operation of a recursive directory server, such as a DNS server, to a multiple encoding environment.
  • the query key sent by a client to the caching server can be in any of a variety of character encodings, but the recursive server converts the query key to a distinguished character encoding for retrieval of the requested information from elsewhere on the network (that is, for recursion).
  • the character encoding of the recursive server's response matches the character encoding used by the client in the query key.
  • the first embodiment normally, but not necessarily, accompanies the second embodiment.
  • the second embodiment operates specifically as follows.
  • a query with a particular key is received from a client.
  • the procedure from the first embodiment is optionally performed at this point, except that pattern matching (20, 21, 22, 23, 12, 24, 25) is not performed and error response (26) is deferred until the completion of the procedure of the second embodiment.
  • the second embodiment proceeds as in the basic system, with the following qualifications. If the first embodiment is not included, each database lookup is performed by querying a directory server on a remote host, as identified by delegation data known in some fashion by the second embodiment directory server.
  • the local database in which the first embodiment lookups are performed is used by the second embodiment as follows: each query to a directory server on a remote host is prefaced by a lookup in the local database (which may obviate the remote query), delegation data is stored in and retrieved from the local database, and data contained in replies from remote hosts (including notification that a record does not exist) are entered into the local database.
  • a data caching system results.
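  • The caching behavior can be sketched as follows. The `CachingResolver` class and the `remote_query` callable are hypothetical; delegation data and record lifetimes are omitted, and a `None` reply stands for "record does not exist".

```python
class CachingResolver:
    """Check the local database before asking a remote directory server."""

    def __init__(self, remote_query):
        self.local = {}  # local database, including negative entries
        self.remote_query = remote_query  # callable: key -> value or None

    def lookup(self, key):
        if key in self.local:
            # A local hit, positive or negative, obviates the remote query.
            return self.local[key]
        value = self.remote_query(key)
        # Store the reply, including a "does not exist" notification.
        self.local[key] = value
        return value
```

  • Because negative replies are stored as well, a repeated query for a nonexistent name is answered locally instead of being sent to the remote host again.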
  • In a virtual hosting web server, the local lookup and retrieval operation is affected by the name by which the client addresses the server (this information is supplied by the client to the server in the request message header). This allows a single web server at a single numeric network address to take the place of many separate web servers.
  • the virtual hosting web server can (and often does) act as a Uniform Resource Locator (URL) forwarding agent, efficiently and quickly redirecting clients to other web servers based on the name by which the client addressed the server.
  • the operation of embodiment [D] is as in the basic system, with the addition of the key normalization mechanism and the pattern matching mechanism.
  • a generic key-value database system is used as the database proper.
  • the inventive technology is subsumed within a WHOIS server, a server whose purpose is to provide technical and biographical information on Internet networks and domains and those responsible for them.
  • This embodiment uses the same generic key-value database system used in the fourth embodiment.
  • the fifth embodiment permits the client to explicitly specify the character encoding used in a query, and the character encoding that should be used in the reply, thereby overriding the algorithm of the basic system.
  • the inventive technology is subsumed within a conversion server - a server whose dedicated purpose is to perform character encoding validations, transformations, and categorizations.
  • the operation is as in the basic system, with the addition of a normalization mechanism and facilities that permit the use of interactive dialogue to resolve ambiguous or failed character encoding identifications, that convert text to image formats, that permit constraints on the character encodings by specifying the language of the text at issue, and that allow various other adjustments and extensions of the basic system.
  • the conversion server is itself subsumed by a complete registration system, which orchestrates the actual interactive dialogue by which ambiguous or failed character encoding identifications are positively resolved.
  • the inventive technology is subsumed within a mail server.
  • a mail server honors Simple Mail Transfer Protocol (SMTP) requests, forwarding messages to other mail servers or passing them to local handlers as dictated by the active mailer configuration (including various databases).
  • the invention is used as in the fourth embodiment.
  • Email addresses are the keys, and database lookup is resolution of the delivery address by the highly configurable address resolution subsystem.
  • the inventive technology is subsumed within the query interface of a database search facility such as a web search engine.
  • the procedure is as in the basic system, with the search expression submitted by the client acting as the key.
  • the client can explicitly specify the encoding used in the query, and the encoding desired in the reply.
  • ISO-10646-1 is a universal character encoding, equivalent to Unicode 3.0.
  • the embodiments can be readily adapted to use any other universal character encoding.
  • the universal character encoding can be constructed by combining a set of distinct character encodings, and using a tagging scheme to identify the encoding in use in distinctly encoded segments of text.
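
The caching behaviour described above for the second embodiment (convert the client's key to a distinguished encoding, consult the local database before any remote query, store replies including "does not exist" notifications, and answer in the client's own encoding) can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the class name `CachingResolver`, the `remote_lookup` callable, and the choice of UTF-8 as the distinguished encoding are all assumptions, delegation data is omitted, and the client's encoding is taken as given rather than identified by the basic system's algorithm.

```python
class CachingResolver:
    """Toy model of a caching, encoding-converting server (all names hypothetical)."""

    DISTINGUISHED = "utf-8"  # assumption: stands in for the distinguished encoding
    NOT_FOUND = object()     # sentinel for a cached "record does not exist" reply

    def __init__(self, remote_lookup):
        # remote_lookup: callable taking a key in the distinguished encoding (bytes)
        # and returning a str value, or None when the record does not exist.
        self.remote_lookup = remote_lookup
        self.cache = {}
        self.remote_calls = 0

    def query(self, raw_key: bytes, client_encoding: str):
        # Convert the client's key to the distinguished encoding for recursion.
        key = raw_key.decode(client_encoding).encode(self.DISTINGUISHED)
        # The local lookup comes first and may obviate the remote query.
        if key not in self.cache:
            self.remote_calls += 1
            value = self.remote_lookup(key)
            # Negative replies are entered into the local database too.
            self.cache[key] = self.NOT_FOUND if value is None else value
        cached = self.cache[key]
        if cached is self.NOT_FOUND:
            return None
        # The response uses the same encoding the client used in the query key.
        return cached.encode(client_encoding)
```

Keying the cache on the distinguished encoding means that two clients asking for the same name in different encodings share a single cached entry, so only the first of them triggers a remote query.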

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to a system, method, and logic for managing data, comprising a database for performing a key-value operation, such as a DNS resource record lookup, with a key in a predetermined encoding, such as Unicode. The invention also relates to an iterative converter that iteratively converts the key from each encoding into the predetermined encoding before the key-value operation is performed with each converted key. The system may also include a validator that verifies that the syntax of each converted key is valid, and a normalizer that normalizes each converted key.
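
As a rough illustration of the iterative conversion summarized in the abstract, the sketch below tries a list of candidate encodings in turn, converts the key to Unicode, validates and normalizes it, and performs the key-value lookup with each converted key until one succeeds. The candidate list, the syntax rule in `is_valid_key`, and the use of NFKC plus case folding in `normalize_key` are assumptions made for this example; the source does not specify them here.

```python
import unicodedata

# Hypothetical candidate encodings, tried in order (not taken from the source).
CANDIDATE_ENCODINGS = ["utf-8", "shift_jis", "big5", "iso-8859-1"]


def is_valid_key(text: str) -> bool:
    """Toy validator (assumption): non-empty dot-separated labels of letters, digits, hyphens."""
    return all(
        label and all(ch.isalnum() or ch == "-" for ch in label)
        for label in text.split(".")
    )


def normalize_key(text: str) -> str:
    """Toy normalizer (assumption): Unicode NFKC followed by case folding."""
    return unicodedata.normalize("NFKC", text).casefold()


def iterative_lookup(raw_key: bytes, database: dict):
    """Return (encoding, value) for the first candidate encoding that yields a hit, else None."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            text = raw_key.decode(encoding)  # convert to the predetermined encoding
        except UnicodeDecodeError:
            continue  # the bytes are not well-formed in this encoding
        if not is_valid_key(text):
            continue  # the converted key fails the syntax check
        key = normalize_key(text)
        if key in database:  # the key-value operation itself
            return encoding, database[key]
    return None
```

The database keys are assumed to be stored already normalized. Returning the matching encoding alongside the value lets a caller encode its reply the same way the query arrived.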
EP01937676A 2000-05-22 2001-05-22 Systeme de noms de domaine international a conversion iterative Withdrawn EP1305740A2 (fr)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US57575300A 2000-05-22 2000-05-22
US575753 2000-05-22
US27979901P 2001-03-29 2001-03-29
US279799P 2001-03-29
PCT/US2001/016706 WO2001090955A2 (fr) 2000-05-22 2001-05-22 Systeme de noms de domaine international a conversion iterative

Publications (1)

Publication Number Publication Date
EP1305740A2 (fr) 2003-05-02

Family

ID=26959896

Family Applications (1)

Application Number Title Priority Date Filing Date
EP01937676A Withdrawn EP1305740A2 (fr) 2000-05-22 2001-05-22 Systeme de noms de domaine international a conversion iterative

Country Status (3)

Country Link
EP (1) EP1305740A2 (fr)
AU (1) AU2001263389A1 (fr)
WO (1) WO2001090955A2 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7210125B2 (en) * 2003-07-17 2007-04-24 International Business Machines Corporation Method and system for application installation and management using an application-based naming system including aliases
US11552997B2 (en) * 2018-02-06 2023-01-10 Akamai Technologies, Inc. Secure request authentication for a threat protection service
CN114492312B (zh) * 2021-12-22 2022-09-20 深圳市小溪流科技有限公司 一种ip国家映射信息的编解码方法及系统

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5729689A (en) * 1995-04-25 1998-03-17 Microsoft Corporation Network naming services proxy agent
US5793381A (en) * 1995-09-13 1998-08-11 Apple Computer, Inc. Unicode converter
IL123129A (en) * 1998-01-30 2010-12-30 Aviv Refuah Www addressing
US6560596B1 (en) * 1998-08-31 2003-05-06 Multilingual Domains Llc Multiscript database system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0190955A2 *

Also Published As

Publication number Publication date
WO2001090955A3 (fr) 2003-03-06
WO2001090955A2 (fr) 2001-11-29
AU2001263389A1 (en) 2001-12-03

Similar Documents

Publication Publication Date Title
US20040044791A1 (en) Internationalized domain name system with iterative conversion
Faltstrom et al. Internationalizing domain names in applications (IDNA)
KR100751622B1 (ko) 네트워크 어드레스 서버, 도메인 명칭 분석 방법, 및 컴퓨터 판독 가능 기록 매체
US6314469B1 (en) Multi-language domain name service
JP4518529B2 (ja) ドメイン名国際化用の方法とシステム
Berners-Lee et al. RFC 3986: Uniform resource identifier (uri): Generic syntax
Berners-Lee Universal resource identifiers in WWW: a unifying syntax for the expression of names and addresses of objects on the network as used in the World-Wide web
Berners-Lee Universal resource identifiers in www
WO2002031702A1 (fr) Enregistrement et utilisation de noms de domaines multilingues
JP2002502073A (ja) Wwwアドレス指定
US20030115040A1 (en) International (multiple language/non-english) domain name and email user account ID services system
EP1234434B1 (fr) Serveur d'adresses reseau
Newton et al. Registration data access protocol (rdap) query format
Berners-Lee RFC1630: Universal Resource Identifiers in WWW: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web
KR100706702B1 (ko) 도메인네임서버를 이용한 한글 인터넷컨텐츠주소 서비스 방법 및 시스템
EP1305740A2 (fr) Systeme de noms de domaine international a conversion iterative
KR20020081049A (ko) 단일 어드레스 스트링을 이용하여 다양한 통신 응용들을통하여 통신하기 위한 시스템 및 방법
JP2001217871A (ja) 2バイトドメインネームサーバシステム
Hollenbeck et al. RFC 9082: Registration Data Access Protocol (RDAP) Query Format
Hollenbeck et al. RFC 9083: JSON Responses for the Registration Data Access Protocol (RDAP)
Newton et al. RFC 7482: Registration Data Access Protocol (RDAP) Query Format
Costello IMC & VPNC
CA2392619C (fr) Serveur d'adresses reseau
Costello Internet Draft Patrik Faltstrom draft-ietf-idn-idna-06.txt Cisco January 7, 2002 Paul Hoffman Expires in six months IMC & VPNC
Costello Internet Draft Patrik Faltstrom draft-ietf-idn-idna-10.txt Cisco June 24, 2002 Paul Hoffman Expires in six months IMC & VPNC

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Extension state: AL LT LV MK RO SI

17P Request for examination filed

Effective date: 20030903

17Q First examination report despatched

Effective date: 20051025

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20080109