CN1383517A - Method and system of intelligent information processing in network - Google Patents


Info

Publication number
CN1383517A
Authority
CN
China
Prior art keywords
internet
speech
ikepl
result
input
Prior art date
Legal status
Granted
Application number
CN 01801846
Other languages
Chinese (zh)
Other versions
CN100422987C (en)
Inventor
Zhou Hongyi (周鸿祎)
Current Assignee
Fly Upward Management Co Ltd
Original Assignee
Inter China Network Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Inter China Network Software Co Ltd
Publication of CN1383517A
Application granted
Publication of CN100422987C
Anticipated expiration
Expired - Lifetime

Landscapes

  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and system for intelligent information processing on the Internet first determine whether an input is a URL address, English words, native-language characters, or native-language pronunciation notation.

- If the input is a regular URL, the system sends the query through the Internet to the corresponding server and obtains the query result directly from it.
- If the input includes native-language pronunciation notation, the system parses the input against at least one phonetic-spelling word list to find the corresponding Internet keyword, then fetches the corresponding query result.
- If the input includes native-language characters, the system processes it as natural-language input against a natural-language table to obtain the desired Internet keyword, then fetches the corresponding website URL as the query result.

Description

Method and system for intelligent information processing in a network
Field of the invention
The present invention relates to a method and system for intelligent information processing in a wide area network, such as the Internet, using a natural language such as Chinese. More particularly, it relates to a method and system for intelligent Chinese-language retrieval on the Internet.
Background of the invention
A network is a distributed communication system of computers. The computers are interconnected by electronic communication links and computer software protocols.

A WAN (wide area network) is a geographically distributed communication network. The term distinguishes a broader telecommunications structure from a local area network (LAN). A wide area network may be privately owned or leased, but the term usually refers to public (shared-user) networks.

One well-known WAN is the global information infrastructure commonly called the Internet. Its electronic resources include, but are not limited to:

- text in various formats;
- graphics files;
- web pages in HTML (HyperText Markup Language) or its extensions, such as XML;
- files of any binary format;
- e-mail addresses.

As in many other networks, each electronic resource on the Internet is identified by an "electronic address". This address uniquely specifies the resource's location in the network and on the computer where it resides.
On the Internet, this electronic address is called a Uniform Resource Locator, or URL. A URL is formed by concatenating several specially formatted pieces of information:

- the protocol type needed to access the resource;
- the network host domain name, which identifies the computer on which the resource resides;
- the port number;
- the directory path of the resource in that computer's file system;
- the resource's file name.

Internet URLs and similar resource identifiers are very inconvenient for users. URLs often exceed 50 characters, and the information they contain is both tedious and meaningless to the information seeker. Some work has therefore been done to make network addresses expressed as URLs more meaningful to information seekers. The aim is that users need not remember exact URLs and can use ordinary words or terms instead.
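As an illustration of the URL components listed above, the following Python sketch extracts them from a URL. It uses the standard library's `urllib.parse.urlsplit`. The example URL is hypothetical. Splitting the path into a directory and a file name assumes the last path segment names the resource file.

```python
import posixpath
from urllib.parse import urlsplit


def url_parts(url: str) -> dict:
    """Split a URL into protocol, host, port, directory path and file name."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,   # protocol needed to access the resource
        "host": parts.hostname,     # domain name of the computer holding it
        "port": parts.port,         # None when the URL omits the port
        "directory": directory,     # directory path in the host's file system
        "filename": filename,       # file name of the resource itself
    }


print(url_parts("http://www.example.com:8080/products/pc/index.html"))
```

Even this short example URL packs five separate fields into one string, which is the inconvenience described above.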
U.S. Patent No. 5,764,906 describes a system that provides and maintains short aliases for information resources and their providers. These aliases can be translated into useful electronic addresses, such as URLs, fax and voice telephone numbers, and e-mail addresses, which are then used to access the resources.

Similarly, PCT application WO 99/39275, published on August 5, 1999, discloses a natural-language-based Internet navigation method. It navigates to resources that are stored in a network and identified by location identifiers.

Several commercial software products also help users access Internet resources by natural-language names.
Many such services exist today:

- RealNames (http://www.realnames.com) replaces complex IP addresses or URLs with short "keywords". It provides this service through Microsoft's Internet Explorer and the MSN portal, and Microsoft has announced that it will include RealNames in its web browser software.
- The RealNames service is equivalent to America Online's keyword system, which lets AOL members type common phrases to find specific content channels.
- Netword Agent software (http://www.netword.com) lets users type Internet keywords instead of URLs.
- The Internet Engineering Task Force (IETF) is developing an Internet keyword standard. It has formed a working group to design a "URI namespace resolution protocol", a standard way of implementing Web keywords.
However, Internet keyword products such as those of RealNames or Netword are either built into the browser or implemented as browser plug-ins. Whenever a new browser appears, the plug-in must also be updated.
In addition, these Internet keyword products and keyword searches are neither suited nor convenient for handling the natural languages of certain countries. This includes Asian languages, particularly Chinese, Japanese and Korean, and any other ideographic language. An individual character may have no definite meaning on its own, and it can take on several meanings when combined with other characters. Consequently, conventional keyword search techniques cannot retrieve the desired electronic address quickly and accurately.
It is therefore an object of the present invention to provide a method of information retrieval using a natural language such as Chinese.
Another object of the present invention is to provide a system for information retrieval using a natural language such as Chinese.
A further object of the present invention is to provide a method and system for intelligent Chinese retrieval on the Internet based on Chinese characters or Chinese Pinyin (the pronunciation of words).
A further object of the present invention is to provide a method and system for intelligent Chinese retrieval on the Internet that obtain correct results automatically even when Pinyin with a southern accent is entered.
Summary of the invention
According to the present invention, the method and system for intelligent Internet retrieval first determine whether an input is a URL address, natural-language characters, or phonetic notation of the natural language.

If the input is an ordinary URL, the text is queried in a domain name server and the query result is returned to the browser.

If the input contains natural-language text, it is processed as natural-language input. The retrieval query is sent to a remote or local engine that performs intelligent retrieval based on that text. The retrieval result, indicating the desired URL or web address, is returned to the browser.
If the input is natural-language phonetic notation, i.e., Pinyin spelling, the system further determines whether it is a complete phonetic spelling or an abbreviation formed from Pinyin prefixes.

- If it is a complete phonetic spelling, the query is processed against the Pinyin retrieval table to obtain the desired URL or web address. The results are returned to the browser for the user to choose from.
- Otherwise, the query is processed as a Pinyin-prefix abbreviation key of natural-language text. The resulting URLs or web addresses are returned to the browser for the user to choose from.
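The input-type decision described above could be sketched in Python as follows. The patterns and the order of the checks are assumptions, not the patented rules:

- the URL and Pinyin-syllable regular expressions are simplified approximations;
- the "no vowels" test for abbreviations is a hypothetical heuristic that misses abbreviations with vowel initials.

Note that some English words, such as "china", also parse as valid Pinyin. A real system would need extra lexicon checks to resolve such cases.

```python
import re

# Rough URL shape: optional scheme, dotted host name, optional port and path.
URL_RE = re.compile(r"^(https?://)?([a-z0-9-]+\.)+[a-z]{2,}(:\d+)?(/\S*)?$", re.IGNORECASE)
# CJK Unified Ideographs block: enough to spot Chinese characters in the input.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")
# Approximate toneless Pinyin syllable: optional initial followed by a final.
SYLLABLE = (r"(?:zh|ch|sh|[bpmfdtnlgkhjqxrzcsyw])?"
            r"(?:iang|iong|uang|ang|eng|ing|ong|ian|iao|uai|uan|ai|ei|ao|ou|an|en|in|un"
            r"|ia|ie|iu|ua|uo|ui|ue|er|a|o|e|i|u|v)")
FULL_PINYIN_RE = re.compile(rf"(?:{SYLLABLE})+")
VOWELS = set("aeiouv")


def classify_input(text: str) -> str:
    """Return 'url', 'native', 'pinyin', 'pinyin_abbrev' or 'english'."""
    s = text.strip()
    if URL_RE.match(s):
        return "url"
    if CJK_RE.search(s):
        return "native"                      # contains Chinese characters
    lowered = s.lower()
    if FULL_PINYIN_RE.fullmatch(lowered):
        return "pinyin"                      # splits entirely into Pinyin syllables
    if lowered.isascii() and lowered.isalpha() and not VOWELS & set(lowered):
        return "pinyin_abbrev"               # e.g. "zg", "rj", "dn"
    return "english"


print(classify_input("zhongguo"), classify_input("zg"), classify_input("ibm"))
```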
Intelligent retrieval according to the present invention also determines whether the query exactly matches a particular website, web address, or web page. If there is no exact match, a list of possible retrieval results is offered to the user to choose from.
Chinese character input is difficult for many users. However, if the user's computer has Chinese input software installed, Chinese characters can be entered as the retrieval query, which triggers intelligent Chinese retrieval. To give users more options, some embodiments accept "Pinyin" or "Pinyin prefixes" and return a table of possible retrieval results:

- "Pinyin" is the full phonetic spelling of the words to be queried;
- "Pinyin prefixes" form an acronym made of the initial letters of each word's pronunciation.
The system and method can also handle telephone-number input and retrieve the website associated with a registered telephone number. If a name (Chinese or English) is entered, the person's online business card can be obtained from a remote business-card server, such as the one at http://www.letscard.com, or any similar server. These aspects of the present invention are the subject of other patent applications by the applicant.
Brief description of the drawings
The accompanying drawings illustrate specific embodiments of the present invention, which will be better understood from the following detailed description together with the drawings.
Fig. 1 shows an example of a networked computer system that can be used to implement embodiments of the present invention;
Fig. 2 shows an embodiment of the present invention;
Fig. 3 shows the process of controlling the browser's URL input window;
Fig. 4 shows a browser screenshot with Chinese natural-language access and navigation service;
Figs. 5A, 5B and 5C show the three basic data structures for intelligent information processing in a wide area network according to the present invention;
Fig. 6 shows a Chinese natural-language processing procedure;
Fig. 7 shows another Chinese natural-language processing procedure;
Fig. 8 shows the method of processing Chinese characters and/or English according to the present invention;
Fig. 9 shows the method of processing full Chinese Pinyin spellings according to the present invention;
Fig. 10 shows the method of processing Chinese Pinyin abbreviations according to the present invention;
Fig. 11 shows the process of determining the type of a query input before information processing according to the present invention;
Figs. 12A and 12B show, respectively, the method of retrieving Pinyin homophones and the method of retrieving Pinyin words misspelled because of dialect according to the present invention.
Detailed description of the invention
As one of ordinary skill in the art will appreciate, the present invention may be embodied as a method, a data processing system, or a program product. Software written according to the present invention may be stored on a computer-readable medium, such as memory or a CD-ROM, or transmitted over a network, and executed by a processor. The general principles of the present invention are described below in terms of the network intelligent information processing method and system.
Fig. 1 shows a system according to the present invention. A user computer 101 is connected through Internet connections 108 and 109 to a web server 102 and to Internet resource locator servers 103 and 104, such as the server at http://www.3721.com.

The user computer 101 may be any type of computer running the Microsoft Windows operating system. This includes PCs, Macintosh computers, and Internet appliances such as WebTV and wireless Internet browsing devices. It may connect to the Internet through any of the following:

- a dial-up modem;
- a DSL line;
- a cable modem;
- a leased line, such as T1 or T3;
- a fiber-optic connection.

As those of ordinary skill in the art will recognize, the invention is not limited to any particular type of user computer or form of connection to the Internet.

The Internet resource locator servers 103 and 104 include a browser pattern database 105, URL patterns 106, and other patterns 107.
Fig. 2 shows a user computer 203 connected through an Internet connection 202 to an Internet resource locator server 201. This may be the 3721 server or any other server running the server software of the present invention.

A browser window is shown running on the user computer 203. A small client program is also running on it (see the small icon at the bottom of the screen). The client software intercepts the text message (msg) entered in the browser's address box. The message is then either sent to the Internet resource locator server 201 for processing or processed locally by the client software.
Fig. 3 shows how the client software of the present invention operates. The client software uses the Win32 hook technique to inject itself into all running processes.

A hook is a point in the Microsoft Windows message-handling mechanism. At this point, an application can install a subroutine or separate module to monitor message traffic in the system and process certain types of messages. A hook procedure can be global, monitoring messages in all threads of the system, or thread-specific, monitoring the messages of a single thread. Some hooks can only be set system-wide (e.g., WH_SYSMSGFILTER), but most can have either system-wide or thread-specific scope.

Technical information about Win32 hooks is available on the Microsoft website (http://www.microsoft.com).
Every running process is checked to determine whether it is a target to be intercepted. If it is, information about the process is used to locate the browser's edit control, in which the user types URLs. The same information can be used to search the browser pattern database to determine the browser version running on the user computer. This database can be updated automatically.
Once the edit control is found, it is subclassed. A message to this edit control can be either a selection from a combo box or drop-down list, or keyboard input.

- Keyboard input is checked against a rule base of URL patterns to determine whether it is a URL address.
- A selection from a combo box or drop-down list is processed as shown in Fig. 3.
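The URL check against a rule base could look roughly like the following Python sketch. The rule names and patterns in `URL_RULES` are hypothetical stand-ins for the URL patterns 106 held on the server, whose content the description does not specify. The point is only that the first matching rule decides.

```python
import re

# Hypothetical rule base of (rule name, pattern); the real URL patterns 106
# would be maintained and updated on the resource locator server.
URL_RULES = [
    ("scheme", re.compile(r"^[a-z][a-z0-9+.-]*://\S+$", re.IGNORECASE)),
    ("ipv4", re.compile(r"^(\d{1,3}\.){3}\d{1,3}(:\d+)?(/\S*)?$")),
    ("www", re.compile(r"^www\.\S+$", re.IGNORECASE)),
    ("domain", re.compile(
        r"^[a-z0-9-]+(\.[a-z0-9-]+)*\.(com|net|org|edu|gov|cn)(:\d+)?(/\S*)?$",
        re.IGNORECASE)),
]


def match_url_rule(text: str):
    """Return the name of the first rule the input satisfies, or None if it is not a URL."""
    candidate = text.strip()
    for name, pattern in URL_RULES:
        if pattern.match(candidate):
            return name
    return None


print(match_url_rule("www.legend.com.cn"), match_url_rule("联想电脑"))
```

Input that matches no rule (here, the Chinese text) would be handed on to the keyword processing described below.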
Fig. 4 shows a Chinese-edition browser interacting with the client software of the present invention. When the user types the Chinese word for "computer" in the browser's address box, a table of Chinese addresses related to that word is produced.
Today, however, websites are searched not only by English URLs or keywords but also in other natural languages, such as Chinese. This calls for a processing method or system that can perform such network information retrieval effectively and accurately in that language.
Retrieval is usually performed through a database containing specially designed keys that facilitate various retrieval tasks, and Internet retrieval of Chinese information is no exception. For the purposes of the present invention, the Internet resource locator server should include at least three retrieval index tables:

- one keyed by Chinese characters;
- one keyed by Pinyin spellings (phonetic);
- one keyed by Pinyin initial-letter abbreviations (phonetic prefixes).
Normally, when a keyword query is entered, the phrase is decomposed into several meaningful words, which are matched against predefined keys. The retrieval results for the individual words are then combined to determine the final query result.

For some natural languages such as Chinese, however, the query may consist of Chinese characters. Each character may or may not have a definite meaning, and combining it with other characters can produce words with different meanings. Simple decomposition of a Chinese character string therefore cannot guarantee accurate query results. Accordingly, the present invention decomposes the phrase or query word entered by the user into every meaningful Chinese word that can be formed from it.
A conventional method might simply combine the first character with the second and/or third character to obtain a meaningful word. However, the first character may also form other meaningful words with any of the characters that follow. In the present invention, the first character is combined with every character of the input to form all possible meaningful words for the query. Because the results then cover every meaningful word that can be formed, the query result is assured to be correct.
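A minimal Python sketch of this decomposition is shown below. It rests on these assumptions:

- "combinations" means contiguous runs of characters checked against a word list (the text could also be read as allowing non-adjacent combinations, which this sketch does not cover);
- the `lexicon` argument and the `max_len` cap are illustrative.

```python
def candidate_words(query, lexicon, max_len=8):
    """Return every lexicon word that occurs as a contiguous substring of the query."""
    found = []
    for start in range(len(query)):
        # Try every word that starts at this character, up to max_len characters long.
        for end in range(start + 1, min(len(query), start + max_len) + 1):
            word = query[start:end]
            if word in lexicon and word not in found:
                found.append(word)
    return found


LEXICON = {"中国", "国", "软件", "中国软件"}   # hypothetical word list
print(candidate_words("中国软件", LEXICON))   # ['中国', '中国软件', '国', '软件']
```

Keeping overlapping candidates such as "中国" and "中国软件" means every word that could be intended reaches the index lookup, rather than committing early to one segmentation.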
Queries to Chinese websites may take several forms:

- Chinese-character input;
- URL input;
- Pinyin input, which covers full Pinyin spelling, Pinyin-prefix abbreviations, homophone Pinyin, and Pinyin with a southern accent.

Before each of these inputs is described in detail, Chinese input technology is briefly discussed to help in understanding the invention.
The main Chinese encoding systems are Big5 and GB (Guobiao, i.e., the national standard). Big5 is generally used for Traditional Chinese characters and GB for Simplified Chinese characters.

Take the character "天" (sky) as an example:

- In the Big5 encoding used in Hong Kong and Taiwan, its two bytes are 10100100 11010001 (0xA4 0xD1).
- In GB, its two bytes are 11001100 11101100 (0xCC 0xEC).

Both codes begin with a 1 bit, whereas the ASCII code of the letter "A" begins with a 0 bit. In general, the lead byte of every Chinese character code begins with 1, and every ASCII code begins with 0. In a file containing both Chinese and English text, the system can therefore tell whether a character is English or Chinese by examining the high bit of its lead byte.
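The high-bit test can be sketched in Python as follows. The sketch assumes well-formed input in a double-byte encoding such as GB2312 or Big5, where each Chinese character occupies two bytes and its lead byte has the high bit set. The function name `split_runs` is hypothetical.

```python
def split_runs(data: bytes, encoding: str = "gb2312") -> list:
    """Group a mixed byte string into ('en', text) and ('zh', text) runs."""
    runs = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:          # high bit 0: single-byte ASCII character
            kind, width = "en", 1
        else:                       # high bit 1: lead byte of a two-byte Chinese code
            kind, width = "zh", 2
        chunk = data[i:i + width].decode(encoding)
        if runs and runs[-1][0] == kind:
            runs[-1] = (kind, runs[-1][1] + chunk)   # extend the current run
        else:
            runs.append((kind, chunk))               # start a new run
        i += width
    return runs


print(split_runs("ibm天".encode("gb2312")))   # [('en', 'ibm'), ('zh', '天')]
```

Only the lead byte is inspected. This matters for Big5, whose trail bytes can fall in the ASCII range.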
Entering and processing Chinese text on a computer is a very difficult problem, as the sheer number of Chinese characters shows. In the Chinese writing system, 3,000 to 6,000 characters are in common use, and more than 10,000 exist once rarer ones are included.

Further problems also hinder efficient computer processing of Chinese text:

- the standardization of Chinese text;
- the many homophones;
- rare characters;
- word-boundary (segmentation) questions.

Despite decades of research and hundreds of different methods, Chinese input and processing remain a major obstacle to computer use in China, particularly for text processing.
Computer systems currently available for entering and processing Chinese text fall into three categories. The first category decomposes Chinese characters into basic graphic elements. Because each method decomposes characters differently, these methods are quite difficult to learn.
The second and third categories are based on pronunciation, e.g., phonetic spelling, and both run into the "homophone problem" of Chinese processing.

The second category is phonetic input, such as Pinyin in mainland China or Zhuyin (BPMF) in Taiwan. It is the most common method for everyone except professional typists. However, the nature of the Chinese writing system poses both conceptual and practical obstacles to this method.
Although there are thousands of characters, there are only about 1,300 distinct syllables, so a single syllable can correspond to many different characters. For example, the Mandarin syllable "yi" corresponds to more than 100 characters. This creates ambiguity when an input syllable is converted into the corresponding Chinese character.
To deal with this "homophone problem", most phonetic input systems use a multiple-choice method. Examples include:

- German Patent No. 3,142,138 to J. Heinzl et al., May 5, 1938;
- U.S. Patent No. 5,047,932 to K. C. Hsieh, September 10, 1991;
- Chinese Patent Application No. 1064957 to Tan Shanguang, March 8, 1991.

After a syllable is typed, the computer displays all characters with the same pronunciation. Sometimes the screen lacks space to show them all, and the user must scroll up and down. These single-syllable methods are therefore very slow.
UK Patent Application No. 2,248,328 to R. W. Sproat (April 1, 1992) discloses an improvement on this multiple-choice method based on the probability of adjacent Chinese characters. The probability method can be further combined with grammatical rules; see, for example, K. T. Lua et al., Computer Processing of Chinese and Oriental Languages, Vol. 6, No. 1, p. 85 (1992). However, the sound-to-character conversion accuracy of these methods is generally only about 80%.
The third category combines phonetic input with additional non-phonetic letters, which artificially distinguish characters that share a pronunciation. Examples include:

- Pinyin with radical marks as used in traditional Chinese dictionaries (British Patent No. 2,158,776 to C. C. Chen, November 20, 1985);
- Pinyin with stroke counts (Chinese Patent Application No. 1066518 to G. Xie, November 25, 1992).

These methods require memorizing arbitrary rules or counting strokes, which in practice reduces input speed.
Other Chinese character input methods exist as well. U.S. Patent No. 6,073,146 discloses a system using a keyboard with additional symbol keys (with corresponding ASCII codes). These keys let the user mark the syllable of each word of phonetic text with a distinctive symbol representing its tone.

The method works as follows:

1. A syllable is considered entered when a distinctive-symbol (or delimiter) keystroke is detected.
2. Each entered syllable is compared with a table of acceptable syllables and abbreviations.
3. If the syllable is in the table, its correctly spelled and accented form is stored in memory and displayed in the phonetic portion of the graphical display.
4. Subsequent syllables are processed the same way until a delimiter is entered.
5. The string between two delimiters is then analyzed using morphological and syntactic processing and/or a statistical language model. This determines unambiguously the appropriate Chinese characters for the words in the string.
6. The resulting unique Chinese translation is stored in memory and displayed in the Chinese character portion of the graphical interface.
Figs. 5A, 5B and 5C show the retrieval index data structures that the present invention uses for Internet keyword queries. The invention uses three index tables of similar structure. High-speed intelligent retrieval of Internet keywords requires efficient data structures suited to searching large volumes of data. The three index tables are:

1. a table for recognizing common Chinese words and English words or phrases;
2. a table of full Chinese Pinyin spellings;
3. a table of Chinese Pinyin abbreviations.
Referring to Fig. 5 A, concordance list is Chinese and English vocabulary, comprises all Sino-British clictions, for example " China ", " software ", " computer ", " ibm " etc.In Chinese or English table, each speech all is connected to the Internet key word node tabulation.Each node in this table is represented certain pointer, points to the physical memory space of the Internet keyword that comprises this word.Therefore, it can retrieve all the Internet keywords that comprise this Chinese or English word from being linked to key word entrance, the Internet tabulation of each speech.
Referring to Fig. 5 B, data structure is similar to Fig. 5 A's.Just the left side Chinese word is phonetic form, i.e. Chinese phonetic spelling.For example, the Chinese of last predicate be now " zhongguo ", " ruanijan ", " diannao ", etc.Key word entrance, the Internet tabulation of link is the tabulation that comprises the Internet key word of this speech Chinese phonetic alphabet form.
Fig. 5 C has the data structure similar to Fig. 5 A.Difference is that in the vocabulary of left side, each speech all is forms of Chinese Pin Yin initial abbreviation, as " zg ", " rj ", " dn " etc.Like this, relevant key word entrance, the Internet tabulation comprises that this speech is corresponding with the phonetic alphabet abbreviation of these inquiries.By this three figure as can be known, three kinds of basic intelligent search methods have similar data structure, and still, speech is with China and British cliction, phonetic spelling (phonetic), or the multi-form storage of phonetic alphabet abbreviations (Chinese phonetic alphabet prefix).Therefore, the internal algorithm that is appreciated that these three kinds of retrievals is identical.Key is that these speech are how to divide into groups or selection, have the term of connotation with composition in inquiry.As mentioned above, query string be broken down into the significant speech that might be combined out, guaranteeing that each possible term points to the Internet key word in the tabulation, and guarantee that how inquiry is judged as is Chinese character input or english input, input of phonetic spelling or phonetic prefix abbreviation input.Correlation technique of the present invention below is discussed.
Although simpler methods have been developed, Chinese character input remains very difficult. This is especially true on handheld devices, such as a personal digital assistant or a mobile phone with a wireless Internet connection.

One aspect of the present invention provides a simpler way to enter Chinese. It is particularly suited to entering web addresses, natural-language keywords, or website (web page) names. Fig. 6 shows one embodiment:

- At 501, the user types the Pinyin prefixes of a Chinese word.
- At 502, the Pinyin prefixes are used to query the database, and a table of possible URLs is listed as the result.
- At 503, the table may be ordered by statistical information, for example by listing the most frequently queried URLs first.
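Steps 501 to 503 can be sketched minimally in Python as follows. The sketch assumes a table of (abbreviation, URL, query count) rows. The rows are hypothetical, apart from the legend.com.cn address mentioned later in this description. Matching on a prefix of the stored abbreviation is also an assumption about how partially typed input is handled.

```python
# Hypothetical table rows: (Pinyin initial-letter abbreviation, URL, query count).
URL_TABLE = [
    ("lxdn", "http://www.legend.com.cn", 120),
    ("lx", "http://lianxiang.example.com", 45),
    ("lxr", "http://contacts.example.com", 80),
]


def suggest(prefix, table=URL_TABLE, limit=10):
    """List URLs whose abbreviation starts with the typed prefix, most-queried first."""
    typed = prefix.lower()
    hits = [(url, count) for abbrev, url, count in table if abbrev.startswith(typed)]
    hits.sort(key=lambda hit: hit[1], reverse=True)   # step 503: order by query frequency
    return [url for url, _ in hits[:limit]]


print(suggest("lx"))
```

With the rows above, typing "lx" lists the legend.com.cn address first because it has the highest query count.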
Fig. 7 shows another embodiment of the present invention:

- At 601, the Pinyin spelling of a Chinese word is entered.
- At 602, the spelling is checked to determine whether it is a common misspelling. Common misspellings arise from accents: in southern China, many people misspell Chinese Pinyin because of their southern accent.
- At 605, if such a misspelling has occurred, the system of the present invention corrects it automatically.
- At 603, once the query string is free of misspellings, whether originally or after correction, the relevant URL database is searched.
- At 604, the output is displayed.
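The correction at 602 and 605 could be approached as in the Python sketch below. This is an assumed approach, not the patented procedure. It swaps sound pairs commonly confused by southern speakers (zh/z, ch/c, sh/s, n/l, and the -n/-ng finals) in each syllable, then accepts the first variant found in a list of known spellings. The pre-syllabified input and the `lexicon` set are hypothetical.

```python
from itertools import product

# Initials that southern speakers commonly confuse (an assumed, simplified list).
INITIAL_SWAPS = {"zh": "z", "ch": "c", "sh": "s", "z": "zh", "c": "ch", "s": "sh",
                 "n": "l", "l": "n"}


def initial_variants(syllable):
    variants = {syllable}
    for initial in ("zh", "ch", "sh", "z", "c", "s", "n", "l"):  # two-letter initials first
        if syllable.startswith(initial):
            variants.add(INITIAL_SWAPS[initial] + syllable[len(initial):])
            break
    return variants


def final_variants(syllable):
    if syllable.endswith(("ang", "eng", "ing")):
        return {syllable, syllable[:-1]}      # -ng heard as -n
    if syllable.endswith(("an", "en", "in")):
        return {syllable, syllable + "g"}     # -n heard as -ng
    return {syllable}


def syllable_variants(syllable):
    return {f for i in initial_variants(syllable) for f in final_variants(i)}


def correct_southern(syllables, lexicon):
    """Return a known spelling reachable by swapping confusable sounds, or None."""
    typed = "".join(syllables)
    if typed in lexicon:
        return typed                          # no misspelling detected
    choices = [sorted(syllable_variants(s)) for s in syllables]
    for combo in product(*choices):
        candidate = "".join(combo)
        if candidate in lexicon:
            return candidate
    return None


print(correct_southern(["zong", "guo"], {"zhongguo"}))   # zhongguo
```

The number of combinations grows with query length, so a production version would probably prune candidates syllable by syllable.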
An example embodiment is a small client program backed by intelligent retrieval and a database on the server side. This software can be downloaded from http://www.3721.com.

The user need not know or type a long, complicated URL string. Instead, the user simply types a familiar brand or product name in Chinese characters in the address box and is taken to the desired website or related web page. For example, the user can type "Legend computer" in Chinese to reach the intended website without typing http://www.legend.com.cn.
The main features of the present invention are described next. Fig. 8 shows the basic flowchart of Chinese and/or English retrieval according to the present invention:

- At 801, a query string A in Chinese and/or English is entered.
- At 802, the system parses A against the Chinese-English word list (CEWL) and decomposes it into one or more words: $W = \{W_1, W_2, W_3, \ldots, W_n\}$.
- At 803, for each word $W_x$ in $W$, the system looks up $W_x$ in the CEWL to find its linked Internet keyword entry point list, $\mathrm{IKEPL}_x$. Each node in $\mathrm{IKEPL}_x$ points to an Internet keyword (IK) that contains $W_x$.
At 804, the system merges $\mathrm{IKEPL}_1, \mathrm{IKEPL}_2, \ldots, \mathrm{IKEPL}_n$ to obtain the result $R = \mathrm{IKEPL}_1 \cup \mathrm{IKEPL}_2 \cup \cdots \cup \mathrm{IKEPL}_n$. Because each node of $\mathrm{IKEPL}_x$ points to an IK containing $W_x$, every IK in $R$ contains at least one word of $W$. At 805, during the merge, the system computes a weight for each IK in $R$ according to specific rules, for example:
(1) Word-count weight: the number of words of $W$ that the IK contains.
(2) Length weight: the total length of the words of $W$ that the IK contains.
Finally, on the basis of the above rules, the system calculates a composite weight for each IK. After the calculation, at 806, the system sorts the result $R$ by IK weight so that the closest results appear at the head of the table, and the system can limit the number of results in $R$. Then, at 807, the final IK table $R$ is produced.
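Steps 804 through 807 can be sketched as follows. The patent names the two weighting rules but not how they are combined into a composite weight, so this sketch assumes a lexicographic comparison (word count first, total length as tie-breaker); the `merge_and_rank` name and the default `limit` are also assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def merge_and_rank(words: Sequence[str], ikepls: Dict[str, List[str]],
                   limit: int = 10) -> List[str]:
    """Merge IKEPL_1..IKEPL_n into R, weight each IK, sort, and truncate."""
    # 804: R is the union of the lists; dict.fromkeys drops duplicates, keeps first-seen order.
    r = dict.fromkeys(ik for w in words for ik in ikepls.get(w, []))
    distinct = set(words)

    def weight(ik: str) -> Tuple[int, int]:
        contained = [w for w in distinct if w in ik]
        # 805: rule (1) word count, then rule (2) total length of the contained words.
        return (len(contained), sum(len(w) for w in contained))

    # 806/807: highest weight first (the sort is stable, so ties keep merge order),
    # then cap the number of results.
    return sorted(r, key=weight, reverse=True)[:limit]
```

For a query split into 联想 and 电脑, a keyword containing both words outranks keywords containing only one of them.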
Similarly, referring to Fig. 9, at 901 the input query string A is in pinyin-spelling form. At 902, after A is input, the system analyzes it against the Chinese pinyin spelling word list (FCPWL) and decomposes it into one or more Chinese pinyin words: $W = \{W_1, W_2, W_3, \ldots, W_n\}$. At 903, for each word $W_x$ in $W$, the system searches FCPWL to find its attached Internet keyword entry point list $\mathrm{IKEPL}_x$; each node of $\mathrm{IKEPL}_x$ points to an Internet keyword (IK) whose pinyin contains $W_x$. Then, at 904, the system merges $\mathrm{IKEPL}_1, \mathrm{IKEPL}_2, \ldots, \mathrm{IKEPL}_n$ to obtain the result $R = \mathrm{IKEPL}_1 \cup \mathrm{IKEPL}_2 \cup \cdots \cup \mathrm{IKEPL}_n$, so that the pinyin of each IK in $R$ contains at least one word of $W$. The following steps 906-907 are essentially the same as steps 805-807: the weight of each IK in $R$ is calculated according to specific rules; the result table $R$ is sorted by IK weight so that the closest results are placed at the head of the table; and the number of results in $R$ is limited, yielding the final IK result table $R$.
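For the pinyin case, the only change is that words are matched against each keyword's pinyin rather than its characters. In the sketch below, `KEYWORD_PINYIN` and `FCPWL` are hypothetical sample data, and longest-match segmentation is again an assumption rather than the patent's stated method.

```python
from typing import Dict, List, Set

# Hypothetical data: Internet keywords with their full pinyin, and FCPWL entries.
KEYWORD_PINYIN: Dict[str, str] = {
    "联想电脑": "lianxiangdiannao",
    "联想手机": "lianxiangshouji",
    "电脑报": "diannaobao",
}
FCPWL: Set[str] = {"lianxiang", "diannao", "shouji", "bao"}


def segment_pinyin(query: str) -> List[str]:
    """902: split a pinyin string into FCPWL words, longest match first (assumed)."""
    q = query.lower().replace(" ", "")
    max_len = max(map(len, FCPWL))
    words, i = [], 0
    while i < len(q):
        for j in range(min(len(q), i + max_len), i, -1):
            if q[i:j] in FCPWL:
                words.append(q[i:j])
                i = j
                break
        else:
            i += 1  # unrecognized letter; skip it
    return words


def pinyin_ikepl(word: str) -> List[str]:
    """903: IKEPL_x, the Internet keywords whose pinyin contains the pinyin word."""
    return [ik for ik, pinyin in KEYWORD_PINYIN.items() if word in pinyin]
```

The lists returned by `pinyin_ikepl` can then be merged and ranked with the same steps as Fig. 8. A plain substring test can match across syllable boundaries (for example, "xian" occurs inside "lianxiang"), so a production index would more likely compare syllable sequences.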
Similarly, referring to Fig. 10: at 11, the user inputs a Chinese pinyin abbreviation string A. At 12, the system analyzes A against the Chinese pinyin abbreviation word list (ACPWL) and decomposes it into one or more pinyin abbreviations: $W = \{W_1, W_2, W_3, \ldots, W_n\}$. Then, at 13, for each word $W_x$ in $W$, the system looks the word up in ACPWL to find its attached Internet keyword entry point list $\mathrm{IKEPL}_x$; each node of $\mathrm{IKEPL}_x$ points to an Internet keyword (IK) whose pinyin abbreviation contains $W_x$. Then, at 14, the system merges $\mathrm{IKEPL}_1, \mathrm{IKEPL}_2, \ldots, \mathrm{IKEPL}_n$ to obtain the result $R = \mathrm{IKEPL}_1 \cup \mathrm{IKEPL}_2 \cup \cdots \cup \mathrm{IKEPL}_n$, so that the pinyin abbreviation of each IK in $R$ contains at least one word of $W$. The following steps 15-17 are essentially the same as the corresponding steps in Figs. 8 and 9: the weight of each IK in $R$ is calculated according to specific rules; the result table $R$ is sorted by IK weight so that the closest results are placed at the head of the table; and the number of results in $R$ is limited, yielding the final IK result table $R$.
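A pinyin abbreviation is commonly formed from the first letter of each syllable, and the sketch below assumes exactly that (so "zh" abbreviates to "z"); the patent does not spell out the abbreviation rule. `KEYWORD_SYLLABLES` is hypothetical sample data standing in for the keyword database.

```python
from typing import Dict, List

# Hypothetical keywords with their syllable-by-syllable pinyin.
KEYWORD_SYLLABLES: Dict[str, List[str]] = {
    "联想电脑": ["lian", "xiang", "dian", "nao"],
    "联想手机": ["lian", "xiang", "shou", "ji"],
}


def abbreviate(syllables: List[str]) -> str:
    """Assumed rule: take the first letter of each syllable."""
    return "".join(s[0] for s in syllables)


def abbreviation_ikepl(abbr_word: str) -> List[str]:
    """13: IKEPL_x, the keywords whose pinyin abbreviation contains the abbreviation word."""
    key = abbr_word.lower()
    return [ik for ik, syl in KEYWORD_SYLLABLES.items() if key in abbreviate(syl)]
```

Here "联想电脑" abbreviates to "lxdn", so both "lx" and "dn" reach it, while "sj" reaches only "联想手机".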
Building on these three intelligent retrieval modes (Chinese/English words, full pinyin spellings, and pinyin abbreviations), the method and system of the present invention for intelligent information processing in a wide area network determine whether the input query string is a Chinese/English word, a full pinyin spelling, or a pinyin abbreviation, as shown in Fig. 11. After string A is input at 110, the system determines at 111 whether A is in the form of a full pinyin spelling. If so, the system computes the result with the pinyin-spelling intelligent search method shown in Fig. 9.
If A is not a full pinyin spelling, the system determines at 112 whether A is in the form of a pinyin abbreviation. If so, the system computes the result with the pinyin-abbreviation intelligent search method shown in Fig. 10. If not, the system concludes that A is in the form of Chinese/English words and performs the computation shown in Fig. 8. There is, however, one further case: at 113, the system checks whether the result of the full-pinyin search or the pinyin-abbreviation search is empty. If it is empty, the system falls back to the Chinese/English word search shown in Fig. 8. If the result of the Fig. 9 or Fig. 10 search is not empty, that result is taken as the final result.
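The Fig. 11 decision logic, including the empty-result fallback at 113, might look like the sketch below. The three search functions and the two predicates are hypothetical callables supplied by the caller, standing in for the Fig. 8, 9, and 10 procedures; `can_segment` is one assumed way to implement the full-pinyin test, checking that the string splits completely into known syllables.

```python
from typing import Callable, List, Set

Search = Callable[[str], List[str]]


def can_segment(text: str, units: Set[str]) -> bool:
    """True if text splits completely into units (dynamic programming over prefixes)."""
    ok = [True] + [False] * len(text)
    for end in range(1, len(text) + 1):
        ok[end] = any(ok[start] and text[start:end] in units for start in range(end))
    return ok[len(text)]


def intelligent_search(query: str,
                       is_full_pinyin: Callable[[str], bool],
                       is_abbreviation: Callable[[str], bool],
                       pinyin_search: Search,
                       abbreviation_search: Search,
                       cewl_search: Search) -> List[str]:
    """Fig. 11 dispatch: try full pinyin, then abbreviation, else Chinese/English words."""
    if is_full_pinyin(query):                # 111
        result = pinyin_search(query)        # Fig. 9
    elif is_abbreviation(query):             # 112
        result = abbreviation_search(query)  # Fig. 10
    else:
        return cewl_search(query)            # Fig. 8
    # 113: an empty pinyin or abbreviation result falls back to the Fig. 8 search.
    return result if result else cewl_search(query)
```

Passing the strategies in as functions keeps the dispatch order and the fallback rule visible in one place, independent of how each search is implemented.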
Figure 12 A has represented the phonetic spelling search modes of homonym of the present invention.121, behind the input inquiry character string A, 122, systematic analysis obtains all possible homonym combination, as searchable spelling speech.123, for each spelling homonym, system carries out Chinese phonetic alphabet spelling word and search to be calculated, as shown in Figure 9.Obtaining all result for retrieval R NAfter, 124, system is with analysis result R N, and obtain the most probable result of final sum, or restriction result's quantity.
Figure 12 B illustrates and has the wrong phonetic spelling search modes of correcting function of piecing together of dialect among the present invention.For further expanding the method and system of Fig. 7,125, behind the input spelling speech character string A, 126, system of the present invention will contrast listed consonant or the vowel that may misspell because of southern accent in the table, analyze the speech of input, as " huang " and " wang ", " shi " and " si " " lu " and " l ", etc.In a word, this tabular lifted the speech that might misspell.Therefore, the inquiry string of input is split up into several pinyin word, comprises all possible pinyin word, then, 127, calculates by the method for phonetic spelling retrieval, to obtain all possible IK as a result.Subsequently,, analyze result for retrieval, to obtain the most probable result of final sum 128.
It should be understood that the above description is illustrative rather than restrictive. Many variations of the invention will be apparent to those of ordinary skill in the art upon reading the above description. The scope of the invention should therefore be determined not only with reference to the above description but also with reference to its variations and equivalents. Although the invention has been described in terms of specific embodiments, it is not intended to be limited to them; on the contrary, the invention is intended to cover variations, modifications, and equivalents that fall within its spirit and scope.

Claims (19)

1. An Internet intelligent information processing method, comprising the steps of:
A) identifying whether an input is a URL address, an English word, native-language characters, or native-language phonetic notation;
B) if the input is a regular URL, querying the input at the corresponding server through the Internet and obtaining the query result directly therefrom;
C) if the input comprises native-language phonetic notation, looking up the input in at least one phonetic word list to find the corresponding Internet keyword, and obtaining the query result directly therefrom; and
D) if the input comprises native-language characters, processing the input as natural-language input in a natural language table to obtain the desired Internet keyword, and obtaining the corresponding website URL query result.
2. The method of claim 1, further comprising determining whether the notation is a word in full pinyin form or a word in pinyin-initial form, and, if the input is a string of full-pinyin words, parsing the input string in a Chinese pinyin spelling word list containing all possible meaningful word combinations.
3. The method of claim 1, wherein, after the query string is input in pinyin-spelling form, the system analyzes the string against the Chinese pinyin spelling word list (FCPWL) and decomposes it into one or more Chinese pinyin words, $W = \{W_1, W_2, W_3, \ldots, W_n\}$; for each word $W_x$ in $W$, the system searches FCPWL to find its attached Internet keyword entry point list $\mathrm{IKEPL}_x$, each node of which points to an Internet keyword whose pinyin contains $W_x$; the system then merges $\mathrm{IKEPL}_1, \mathrm{IKEPL}_2, \ldots, \mathrm{IKEPL}_n$ to obtain the result $R = \mathrm{IKEPL}_1 \cup \mathrm{IKEPL}_2 \cup \cdots \cup \mathrm{IKEPL}_n$, so that the pinyin of each Internet keyword in $R$ contains at least one word of $W$.
4. The method of claim 3, wherein, after the attached Internet keywords are merged, the system further calculates the weight of each Internet keyword in $R$ according to specific rules, including a word-count weight (the number of words of $W$ contained in the Internet keyword) and a total-length weight (the total length of the words of $W$ contained in the Internet keyword); the result table $R$ is then sorted by Internet keyword weight so that the closest results appear at the head of the table, and the number of results in $R$ is limited, thereby obtaining the final Internet keyword table $R$.
5. The method of claim 1, further comprising determining whether the phonetic notation is a full pinyin word or a pinyin-initial abbreviation, and, if the input is a pinyin-initial abbreviation string, parsing the input string in a Chinese pinyin abbreviation word list containing all meaningful word combinations.
6. The method of claim 5, wherein, after the query input is determined to be a Chinese pinyin abbreviation, the system analyzes the query input against the Chinese pinyin abbreviation word list (ACPWL) and decomposes it into one or more Chinese pinyin abbreviations, $W = \{W_1, W_2, W_3, \ldots, W_n\}$; for each word $W_x$ in $W$, the system looks up the word in ACPWL to find its attached Internet keyword entry point list $\mathrm{IKEPL}_x$, each node of which points to an Internet keyword whose pinyin abbreviation contains $W_x$; the system then merges $\mathrm{IKEPL}_1, \mathrm{IKEPL}_2, \ldots, \mathrm{IKEPL}_n$ to obtain the result $R = \mathrm{IKEPL}_1 \cup \mathrm{IKEPL}_2 \cup \cdots \cup \mathrm{IKEPL}_n$, so that each Internet keyword in $R$ contains at least one pinyin abbreviation of $W$.
7. The method of claim 6, wherein, after the attached Internet keywords are merged, the system further calculates the weight of each Internet keyword in $R$ according to specific rules, including a word-count weight (the number of words of $W$ contained in the Internet keyword) and a total-length weight (the total length of the words of $W$ contained in the Internet keyword); the result table $R$ is then sorted by Internet keyword weight so that the closest results appear at the head of the table, and the number of results in $R$ is limited, thereby obtaining the final Internet keyword table $R$.
8. The method of claim 1, wherein the natural language table is a Chinese-English word list, and the input is parsed into all meaningful word combinations to find the attached Internet keywords.
9. The method of claim 8, wherein, after the query input is analyzed against the Chinese-English word list (CEWL), it is decomposed into one or more words, $W = \{W_1, W_2, W_3, \ldots, W_n\}$; for each word $W_x$ in $W$, $W_x$ is looked up in CEWL to find its attached Internet keyword entry point list $\mathrm{IKEPL}_x$, each node of which points to an Internet keyword containing $W_x$.
10. The method of claim 9, wherein the system merges all of $\mathrm{IKEPL}_1, \mathrm{IKEPL}_2, \ldots, \mathrm{IKEPL}_n$ to obtain the result $R = \mathrm{IKEPL}_1 \cup \mathrm{IKEPL}_2 \cup \cdots \cup \mathrm{IKEPL}_n$, each node of each $\mathrm{IKEPL}_x$ pointing to an Internet keyword that contains at least the word $W_x$; and, after the merge, the system calculates the weight of each Internet keyword in $R$ according to specific rules, comprising:
(1) a word-count weight: the number of words of $W$ contained in the Internet keyword;
(2) a total-length weight: the total length of the words of $W$ contained in the Internet keyword.
11. The method of claim 10, wherein the system calculates a composite weight for each Internet keyword according to the above rules and, after the calculation, sorts the result table $R$ by Internet keyword weight so that the closest results are placed at the head of the result table, and the system limits the number of results in $R$ to obtain the final Internet keyword table.
12. An intelligent information processing method for pinyin homonyms, comprising the steps of: after a pinyin query string is input, analyzing all possible homonyms and treating all of them as searchable Chinese pinyin spelling terms; for each pinyin homonym, performing a Chinese pinyin spelling word search against the Chinese pinyin spelling word list; and merging all of the resulting search results and analyzing them to obtain the final, most probable result.
13. The method of claim 12, wherein the Chinese pinyin spelling search is performed by analyzing the query string against the Chinese pinyin spelling word list (FCPWL) and decomposing it into one or more Chinese pinyin words, $W = \{W_1, W_2, W_3, \ldots, W_n\}$; for each word $W_x$ in $W$, the system searches FCPWL to find its attached Internet keyword entry point list $\mathrm{IKEPL}_x$, each node of which points to an Internet keyword whose pinyin contains $W_x$; the system then merges $\mathrm{IKEPL}_1, \mathrm{IKEPL}_2, \ldots, \mathrm{IKEPL}_n$ to obtain the result $R = \mathrm{IKEPL}_1 \cup \mathrm{IKEPL}_2 \cup \cdots \cup \mathrm{IKEPL}_n$, so that the pinyin of each Internet keyword in $R$ contains at least one pinyin word of $W$.
14. The method of claim 13, wherein, after the attached Internet keywords are merged, the system further calculates the weight of each Internet keyword in $R$ according to specific rules, including a word-count weight (the number of words of $W$ contained in the Internet keyword) and a total-length weight (the total length of the words of $W$ contained in the Internet keyword); the result table $R$ is then sorted by Internet keyword weight so that the closest results are placed at the head of the table, and the number of results in $R$ is limited, thereby obtaining the final Internet keyword table $R$.
15. An intelligent information processing method for pinyin spellings misspelled because of a southern accent, comprising the steps of: after a pinyin query string is input, analyzing the input word against a word list of the consonants or vowels, with their corresponding Chinese characters, that southern speakers may misspell; exhaustively listing all misspelled words in the list; decomposing the query string into several pinyin words so as to include all possible pinyin words; performing a pinyin spelling word search to obtain all possible Internet keywords as search results; and analyzing the search results to obtain the final, most probable result.
16. The method of claim 15, wherein, after the pinyin spelling of the query is determined to be correct, the system parses the query string against the Chinese pinyin spelling word list (FCPWL) and divides it into one or more Chinese pinyin words, $W = \{W_1, W_2, W_3, \ldots, W_n\}$; for each word $W_x$ in $W$, the system searches FCPWL to find its attached Internet keyword entry point list $\mathrm{IKEPL}_x$, each node of which points to an Internet keyword whose pinyin contains $W_x$; the system then merges $\mathrm{IKEPL}_1, \mathrm{IKEPL}_2, \ldots, \mathrm{IKEPL}_n$ to obtain the result $R = \mathrm{IKEPL}_1 \cup \mathrm{IKEPL}_2 \cup \cdots \cup \mathrm{IKEPL}_n$, so that the pinyin of each Internet keyword in $R$ contains at least one pinyin word of $W$.
17. The method of claim 16, wherein, after the attached Internet keywords are merged, the system further calculates the weight of each Internet keyword in $R$ according to specific rules, including a word-count weight (the number of words of $W$ contained in the Internet keyword) and a total-length weight (the total length of the words of $W$ contained in the Internet keyword); the result table $R$ is then sorted by Internet keyword weight so that the closest results are presented at the head of the table, and the number of results in $R$ is limited, thereby obtaining the final Internet keyword table $R$.
18. An Internet intelligent information processing system, comprising:
a device for inputting a query string of words;
a device for identifying whether the input is a URL address, an English word, native-language characters, or native-language phonetic notation;
a device for, when the input is a regular URL, querying the input at the corresponding server through the Internet and obtaining the query result directly therefrom;
a device for, when the input comprises the native-language phonetic notation, analyzing the input against at least one pinyin word list to find the corresponding Internet keyword and subsequently obtaining the corresponding query result; and
a device for, when the input comprises native-language characters, processing the input as natural-language input in a natural language table to obtain the desired Internet keyword, and obtaining the corresponding website URL query result.
19. The system of claim 18, further comprising a device for checking whether a Chinese pinyin word in the query input contains a common misspelling caused by a southern accent, and a device for automatically correcting the misspelled word, wherein, after the input pinyin is determined to be correct and any misspelled word has been corrected, the relevant URLs are retrieved by a database query device.
CNB018018467A 2000-06-28 2001-06-28 Method and system of intelligent information processing in network Expired - Lifetime CN100422987C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US21481200P 2000-06-28 2000-06-28
US60/214,812 2000-06-28

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CNA031537081A Division CN1496062A (en) 2000-06-28 2001-06-28 Intelligent information processing method in network and its system

Publications (2)

Publication Number Publication Date
CN1383517A true CN1383517A (en) 2002-12-04
CN100422987C CN100422987C (en) 2008-10-01

Family

ID=22800508

Family Applications (2)

Application Number Title Priority Date Filing Date
CNB018018467A Expired - Lifetime CN100422987C (en) 2000-06-28 2001-06-28 Method and system of intelligent information processing in network
CNA031537081A Pending CN1496062A (en) 2000-06-28 2001-06-28 Intelligent information processing method in network and its system

Family Applications After (1)

Application Number Title Priority Date Filing Date
CNA031537081A Pending CN1496062A (en) 2000-06-28 2001-06-28 Intelligent information processing method in network and its system

Country Status (3)

Country Link
JP (2) JP3871644B2 (en)
CN (2) CN100422987C (en)
AU (1) AU2001291598A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100399329C (en) * 2005-01-19 2008-07-02 结信网络技术服务(上海)有限公司 Intelligent movable guiding engine systems
US8271265B2 (en) 2006-08-25 2012-09-18 Nhn Corporation Method for searching for chinese character using tone mark and system for executing the method
CN101615180B (en) * 2008-06-27 2012-10-31 International Business Machines Corporation Method and device for identifying Pinyin
CN103853777A (en) * 2012-12-04 2014-06-11 Tencent Technology (Shenzhen) Co., Ltd. Method and device for accessing websites through keywords
WO2017124294A1 (en) * 2016-01-19 2017-07-27 Wang Xiaoguang Conference recording method and system for network video conference
CN109388537A (en) * 2018-08-31 2019-02-26 Alibaba Group Holding Ltd. Operation information tracking, device and computer readable storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8745051B2 (en) * 2008-07-03 2014-06-03 Google Inc. Resource locator suggestions from input character sequence
CN101661480B (en) * 2008-08-29 2012-08-08 International Business Machines Corporation Method and system for ensuring name of organization in different languages
CN113312926A (en) * 2021-06-07 2021-08-27 浙江贰贰网络有限公司 Domain name meaning translation method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6151624A (en) * 1998-02-03 2000-11-21 Realnames Corporation Navigating network resources based on metadata
WO1999040517A1 (en) * 1998-02-09 1999-08-12 Ibi Co., Ltd. Method for connection for computer network on internet by real name and computer network system thereof
CN1255797A (en) * 1999-04-05 2000-06-07 徐志男 Chinese-character translation system for internet address and e-mail address

Also Published As

Publication number Publication date
JP2006164292A (en) 2006-06-22
JP2004502231A (en) 2004-01-22
CN1496062A (en) 2004-05-12
CN100422987C (en) 2008-10-01
JP3871644B2 (en) 2007-01-24
AU2001291598A1 (en) 2002-01-08

Similar Documents

Publication Publication Date Title
US8412517B2 (en) Dictionary word and phrase determination
US8010344B2 (en) Dictionary word and phrase determination
JP4857075B2 (en) Method and computer program for efficiently retrieving dates in a collection of web documents
US10423649B2 (en) Natural question generation from query data using natural language processing system
JP5608766B2 (en) System and method for search using queries written in a different character set and / or language than the target page
JP3703080B2 (en) Method, system and medium for simplifying web content
US8745051B2 (en) Resource locator suggestions from input character sequence
US20020152258A1 (en) Method and system of intelligent information processing in a network
JPH11328076A (en) Method and system for accessing internet
CN1871607A (en) Identifying related names
JP2003529845A (en) Method and apparatus for providing multilingual translation over a network
WO2003017023A2 (en) System and method for extracting content for submission to a search engine
CN101079031A (en) Web page subject extraction system and method
CN1955952A (en) System and method for automatically extracting by-line information
WO2007143914A1 (en) Method, device and inputting system for creating word frequency database based on web information
US20030177115A1 (en) System and method for automatic preparation and searching of scanned documents
JP2006164292A (en) Method and system for processing intelligent information in network
EP1312039A2 (en) System and method for automatic preparation and searching of scanned documents
CN101071425A (en) Information fast searching device, client end, system and method
CN1808428A (en) Information searching criteria presentation and editing system and method
CN104778232A (en) Searching result optimizing method and device based on long query
US20180293508A1 (en) Training question dataset generation from query data
KR100519748B1 (en) Method and apparatus for internet navigation through continuous voice command
CN101083550A (en) System and method for realizing network real name
CN1916888A (en) Method and system of identifying language of double-byte character set character data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: 3721 NETWORKS SOFTWARE CO., LTD.

Free format text: FORMER OWNER: YINTEGUOFENG NETWORK SOFTWARE CO. LTD.

Effective date: 20050603

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20050603

Address after: Hong Kong, China

Applicant after: 3721 Network Software Co., Ltd.

Address before: Hong Kong, China

Applicant before: Inter China Network Software Co Ltd

ASS Succession or assignment of patent right

Owner name: YAHOO! CO., LTD.

Free format text: FORMER OWNER: 3721 NETWORKS SOFTWARE CO., LTD.

Effective date: 20060120

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20060120

Address after: California, USA

Applicant after: Yahoo Corp.

Address before: Hong Kong, China

Applicant before: 3721 Network Software Co., Ltd.

C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: FEIYANG MANAGEMENT CO., LTD.

Free format text: FORMER OWNER: YAHOO CORP.

Effective date: 20150331

TR01 Transfer of patent right

Effective date of registration: 20150331

Address after: Tortola, British Virgin Islands

Patentee after: Fly upward Management Co., Ltd

Address before: California, USA

Patentee before: Yahoo Corp.

CX01 Expiry of patent term

Granted publication date: 20081001