CN102314440A - Method for maintaining language model base by using network and system - Google Patents

Publication number
CN102314440A
Authority
CN
China
Prior art keywords
user
corpus
local
input
netspeak
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010102166319A
Other languages
Chinese (zh)
Other versions
CN102314440B (en
Inventor
周志华
蒋斌
弓辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201010216631.9A priority Critical patent/CN102314440B/en
Publication of CN102314440A publication Critical patent/CN102314440A/en
Application granted granted Critical
Publication of CN102314440B publication Critical patent/CN102314440B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention relates to an input method realized through network assistance and an input system. The language model base is maintained by using a network server and input is finished by calling the corresponding language model base by a user. The system comprises at least one piece of user equipment, the network server and a language model synchronizing device, wherein the user equipment is configured to be used by a user to enter characters and is configured to establish a local language model base according to the user input; the network server is configured to establish a network language model base on the basis of the received at least one local language model base and statistical analysis on other language materials; and the language model synchronizing device is configured to synchronize the local language model base and the network language model base. The invention also relates to the input method realized by using the system.

Description

Method and system for maintaining a language model library by using a network
Technical Field
The present invention relates to the field of text input, and in particular to a network-assisted input method and system that improve the maintenance of a language model library.
Background
Currently popular input methods, such as Sogou and the Google input method, have built-in statistical language model algorithms that analyze text input data to produce results matching the user's expectations. A statistical language model uses probabilistic and statistical methods to reveal the inherent statistical regularities of linguistic units; among such models, the N-gram model is simple, effective, and widely used. The N-gram model rests on the assumption that the occurrence of the N-th word depends only on the preceding N-1 words and is independent of all other words, so that the probability of a whole sentence is the product of the occurrence probabilities of its words. These probabilities can be obtained by directly counting, in a corpus, how often N words occur together. The bigram (N = 2) and trigram (N = 3) models are the most commonly used.
Specifically, the task of a language model in an input method is to answer the question: given the preceding (i-1) words of a text sequence, how likely is it that the i-th word is the word w? Suppose S denotes a particular word sequence of length k, S = w1, w2, ..., wk. The N-gram language model treats the word sequence S as a Markov process with the following probability:
$$p(S) = \prod_{i=1}^{k} P(w_i \mid w_{i-1}, w_{i-2}, w_{i-3}, \ldots, w_{i-n+1})$$
Here n denotes the order of the Markov process. For example, n = 2 gives the bigram language model mentioned above, which uses co-occurrence information about word pairs to estimate the relevant probability parameters.
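The count-based estimation described above can be illustrated with a minimal Python sketch for the bigram case (n = 2). It is not taken from the patent: the function names, the toy corpus format, and the sentence boundary markers `<s>` and `</s>` are assumptions made for illustration, and no smoothing is applied, so any unseen word pair yields a probability of zero.

```python
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their left contexts over tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for words in sentences:
        # Boundary markers are an assumption of this sketch, not of the patent.
        padded = ["<s>"] + list(words) + ["</s>"]
        contexts.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
    return contexts, bigrams

def sentence_probability(words, contexts, bigrams):
    """p(S) as the product of P(w_i | w_{i-1}), estimated from raw counts."""
    prob = 1.0
    padded = ["<s>"] + list(words) + ["</s>"]
    for prev, cur in zip(padded, padded[1:]):
        if contexts[prev] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        prob *= bigrams[(prev, cur)] / contexts[prev]
    return prob
```

A practical input method would add smoothing or back-off so that unseen word pairs do not zero out an entire sentence.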
Introducing a statistical language model into an input method effectively improves the accuracy of the candidates the input method selects, especially when the user enters long sentences. However, many problems in applying language models remain unsolved, for example how to update and optimize the language model library.
In practice, every existing input method has shortcomings in how it applies language models. For example, the personalized language model library provided by the Google input method emphasizes maintenance of the local language model library, and the network side assists only with synchronization and updating in the most basic sense: when the user logs in to the network server, the user's local language model library can be uploaded to the network for storage, and when the user reinstalls or upgrades the input method, the user's language model library kept on the network server can be synchronized with the one on the local user equipment. The Google input method does not optimize the user's language model library on the network side. It maintains the personalized language model library only according to the individual user's input habits and does not incorporate the input habits of other network users into it, even when those users resemble this user or when statistics on their input habits would help optimize this user's language model library. Limited by the processing power of the local device, a language model library like that of the Google input method inevitably suffers from limited capacity and untimely updates.
The Sogou input method provides a cloud input function that requires no local installation and claims to use the unlimited processing capacity of servers to provide input in the cloud. In essence, this amounts to maintaining the entire corpus on the network side and interacting directly with the client. Such cloud input is presumably based on a huge corpus that has been neither classified nor optimized. According to statistics, a corpus that collects all the words and habitual phrases appearing on the network may amount to at least tens of gigabytes, and synchronizing a corpus of that size to a local device is almost impossible, particularly for thin clients such as handheld devices. Even if, as with the Sogou input method, such a large corpus is stored only on the network side, searching it directly for matching entries is difficult or incurs delay. More importantly, such a very large corpus reflects the input habits of all users and is not associated with any individual user's language model, so it cannot offer the efficiency gained by calling a language model specific to a particular user.
To solve the above problems, a new text input method is needed.
Summary of the Invention
The object of the present invention is to provide a text input method and system that solve the above problems.
According to a first aspect of the invention, a network-assisted input system is provided, which uses a network server to maintain a language model library and completes input by calling the language model library corresponding to the user. The system comprises:
at least one user equipment, configured to be used by a user for text input and configured to build a local language model library from the user's input;
a network server, configured to build a network language model library based on at least one received local language model and on statistical analysis of other corpus material; and
a language model synchronization device, configured to synchronize the local language model library with the network language model library.
According to a second aspect of the invention, a network-assisted input method is provided, in which a language model library is maintained on the network side and input is completed by calling the language model library corresponding to the user. The method comprises the steps of:
building a local language model from the user's input;
uploading the local language model to a network server; and
building a network language model library based on at least one received local language model and on statistical analysis of other corpus material.
According to the solution of the present invention, a personalized language model adapted to the user can be built locally and uploaded to the network side, where a network language model library is formed on the basis of the local language model libraries through statistical analysis of the input of multiple users. Compared with prior-art applications of language models, the invention uses the server's processing power, which is almost unlimited relative to a local device, to perform statistical analysis of user input and thereby build a language model library that extends the local one. At the same time, for a particular user the extended network language model library retains that user's individual characteristics, so the user's training of the input method is shortened while the library still adapts well to that user, which brings an unexpected improvement in input speed. In addition, the invention lets the user synchronize the local and network language model libraries in both directions as needed. Whether the language model library is the one kept on the local device or the network language model library that the server selects or calls for the user, it matches the user's characteristics and thus satisfies the user's input habits as far as possible.
Brief Description of the Drawings
The above and other features, properties, and advantages of the present invention will become more apparent from the following description taken in conjunction with the accompanying drawings and embodiments, in which the same reference numerals denote the same features throughout:
Fig. 1 is a structural block diagram of an input system according to one embodiment of the invention;
Fig. 2 is a structural block diagram of an input system according to another embodiment of the invention;
Fig. 3 is a flowchart of an input method according to one embodiment of the invention;
Fig. 4 is a partial flowchart of an input method according to another embodiment of the invention;
Fig. 5 is a schematic diagram of the network language model library and the local language model library according to the invention;
Fig. 6 is a schematic diagram illustrating a suitable computing system environment in which the invention can be implemented.
Detailed Description of the Embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
A text input system according to one embodiment of the invention is described in detail below with reference to Fig. 1. The input system shown in Fig. 1 comprises a network server 10, user equipment 20, and a language model synchronization device 30. The network server 10 may be a plurality of network servers 10_1 ... 10_n distributed over the Internet. These network servers 10_1 ... 10_n work together to form a server cloud that serves a large number of users. The network server 10 may also be one or more servers located on an intranet.
To achieve the object of the invention, the network server 10 comprises a network corpus 101, which may comprise user network corpora 101a, one for each user, and a public network corpus 101b.
The user network corpus 101a is a backup, on the network server, of each registered user's local corpus. As shown in the figure, the user equipment 20 comprises a corpus synchronization device (not shown). After a registered user logs in to the network server 10, this device either uploads the local corpus 201 on the user equipment to the network server 10 or synchronizes the user network corpus 101a kept for this user on the network server 10 with the local corpus 201, according to the user's choice.
Like the local corpus 201, the user network corpus 101a stores a basic vocabulary set, a basic language model, and the vocabulary set generated while the user uses the input method. It can also store auxiliary information, for example the user's settings for the input method, including but not limited to fuzzy pinyin, traditional or simplified characters, double pinyin (shuangpin), full pinyin, and abbreviated pinyin; and the user's attribute information, including but not limited to occupation, hobbies, professional field, résumé, and age.
Because the user network corpus is stored on the network server 10, the user can type quickly on any terminal device that can connect to the network server 10, either by synchronizing the local corpus 201 after logging in or by using the user network corpus 101a online.
The public network corpus 101b is formed by statistical analysis of published documents and publications, the input of a large number of users, the search terms that a large number of users enter in web search engines, and the index keywords and/or advertising keyword information of a large number of web pages. It reflects the commonalities or hot topics of the user community.
Together, the user network corpus 101a and the public network corpus 101b make up the network corpus 101, which serves as the training corpus from which the network server 10 builds the network language model library. The network server 10 comprises a network language model building device 102, which obtains corpus material from the network corpus 101 and builds language models from it. Building a model includes preprocessing the corpus, analyzing the corpus material to extract language models, and optimizing the language model library. Building a statistical language model from a given corpus is well known to those skilled in the art and is not elaborated here. Depending on the richness of the corpus, the required model accuracy, the expected size of the model library, and so on, a specific form of language model, such as a bigram or trigram model, and a specific algorithm for building it, such as the Apriori algorithm, can be selected.
The network language model building device 102 places the generated language models into the network language model library 103. The user equipment 20 also synchronizes its local language model library 203 into the network language model library 103 through the language model synchronization device 30. The network language model library 103 thus contains both the language models that the building device 102 extracts from the network corpus 101 and the users' local language models. Compared with a user's local language model library 203, the network language model library is richer and merges the input habits of a large number of users, so a language model that matches the user's habits can be called directly without initial training by the user. People inevitably share certain connections and commonalities. By statistically analyzing the input of a large number of users, language models suited to many users can be built quickly, which is clearly more efficient than having the user equipment build a language model by analyzing only the user's own input history. Even when entering a keystroke sequence for the first time, the user can directly obtain the expected output without correcting, one by one, the interfering terms in a linguistic unit such as a long sentence, which is exactly what users want. For example, suppose the user has never before entered the keystroke sequence "zhuanlfshshxzdiershitiaodiyikuan". With earlier language model libraries, the user has to split the sentence manually into several words, or select the correct terms one by one, to obtain "Rule 20, paragraph 1 of the Implementing Regulations of the Patent Law"; otherwise the result may be a wrong string assembled from unintended homophones. However, if the network corpus contains relevant language models extracted from texts on substantive patent examination, or if the large population of network users includes patent professionals such as patent agents, patent examiners, and corporate intellectual property advisers, then as soon as one user has selected the same output, subsequent users are spared the tedious steps above.
Preferably, the network language model library 103 comprises a classification module 103a, which subdivides the network language model library 103 into a plurality of network language model sub-libraries (sub-library 1, sub-library 2, ..., sub-library n) according to user attributes. With this classification, a more precise language model library can be called for the user. This preferred scheme is advantageous when the user wishes to download a suitable network corpus to local user equipment, or when server processing power is limited. The invention can strike a good compromise between the storage space the language model library occupies and the ease of obtaining a rich set of language models: the user benefits from the language models extended by the network server while avoiding the confusion that unsuitable language models would cause in the user's own input.
For clarity, the figure shows the network language model building device 102 and the network language model library 103 as separate devices. In practice they are often integrated into the network corpus 101 as integral parts of the corpus and interact directly with the matching device and other components to complete the user's input.
The language model library synchronization device 30 is used not only to synchronize (upload) the user's local language model library to the network server side, but also to synchronize (download) the network language model sub-library selected for the user to the user equipment side.
The user equipment 20 comprises a local corpus 201, which stores a basic vocabulary set and the vocabulary set generated while the user uses the input method. The local corpus 201 can also store auxiliary information, for example the user's settings for the input method, including but not limited to fuzzy pinyin, traditional or simplified characters, double pinyin (shuangpin), full pinyin, and abbreviated pinyin; and the user's attribute information, including but not limited to occupation, hobbies, professional field, résumé, and age. This auxiliary information helps optimize the ranking of candidate entries. As described above, the user can synchronize the local corpus 201 with the network corpus 101 as needed, which takes advantage of rich network resources and server processing power and makes the local input method settings more flexible.
The user equipment 20 comprises a local language model building device 202 which, like the network language model building device 102, uses a known language model building algorithm to build local language models from the corpus material in the local corpus 201 and places them in the local language model library 203.
The local language model library 203 contains language models suited to the user, obtained by training on the text the user has entered. The input of different users differs considerably in terms of language model. For example, some users often enter formal official documents, which call for standard writing, precise wording, and few colloquial expressions; some often enter blog articles, which call for a fluent, natural style and are full of popular vocabulary and personal writing habits; and others mostly use the input method for online chat, where the content is relaxed and casual and the colloquial tendency is pronounced. These differences in writing style are all reflected in the language model. If the same user needs different language models on different occasions, for example entering documents at some times and chat content at others, the local language model library needs to be subdivided, which can be done with a classification module similar to that of the network language model library 103.
Likewise, for clarity, the figure shows the local language model building device 202 and the local language model library 203 as separate devices. In practice they are often integrated into the local corpus 201 as integral parts of the corpus and interact directly with the matching device and other components to complete the user's input.
In this embodiment, the user equipment 20 also comprises a local matching device 204, which cooperates with the input components used for entering text (a keyboard 205 and a display device 206) to perform local input. The local matching device 204 receives the keystroke sequence that the user enters on the keyboard 205, calls the local language model to query the local corpus, and then outputs candidate terms to the display device 206 for the user to choose from. In the cloud input embodiment described in detail later, the input components of the user equipment (the keyboard 205 and the display device 206) can also interact directly with a network matching device on the network side to provide a cloud input mode.
Fig. 2 shows a text input system according to another embodiment of the invention. The input system of Fig. 2 is similar to that of Fig. 1, except that in Fig. 2 the server can provide the input service to the user directly in the cloud. As shown in the figure, a network matching device 104 is added to the network server 10. After receiving the keystroke sequence that the user enters on the keyboard 205, it interacts with the network corpus 101 and the network language model library 103 (specifically, the corresponding network language model sub-library) to obtain one or more network candidate terms, and sends the candidate terms to the display device 206 on the user equipment 20 for the user to choose from.
For clarity, the user equipment 20 in Fig. 2 is shown without a local matching device. In practice, the user equipment 20 can still comprise a local matching device, together with an aggregation device (not shown) that combines the network candidate terms with the local candidate terms. Combining local input with server assistance in this way makes it possible both to use the huge network corpus and powerful server processing on the network side to find suitable entry options and to benefit from the fast response of the local device.
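The aggregation of network and local candidates could look roughly like the following Python sketch. The patent does not say how the two lists are combined, so the function name, the `(text, score)` pair format, and the assumption that local and network scores are directly comparable are all hypothetical choices made for illustration.

```python
def merge_candidates(local, network, limit=5):
    """Merge two candidate lists of (text, score) pairs into one ranked list.

    Assumption of this sketch: scores from both sources share one scale.
    A candidate found in both lists keeps its higher score; ties keep the
    order in which candidates were first seen (local before network).
    """
    best = {}
    for text, score in list(local) + list(network):
        if text not in best or score > best[text]:
            best[text] = score
    ranked = sorted(best.items(), key=lambda item: -item[1])
    return [text for text, _ in ranked[:limit]]
```

Because local candidates are available immediately, a client could display them first and re-rank once the network list arrives; that refinement is left out of the sketch.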
Fig. 3 is a flowchart of a method for maintaining a language model library on a network server according to one embodiment of the invention. As shown in the figure, in step S301 the user equipment builds a local language model from the local corpus. The corpus material in the local corpus can be the basic vocabulary set and the vocabulary set generated while the user uses the input method. The local language model can be a bigram or trigram language model. Algorithms for building language models are well known to those skilled in the art and are not described again here.
In step S302, the user equipment uploads the local language model to the network server, for example through the language model library synchronization device of Fig. 1 and Fig. 2. When the user has logged in to the network server through the user equipment and a language model library synchronization instruction is received from the user equipment, the user's local language model library is synchronized to the network language model library.
In step S303, the network server receives the local language model from the user equipment and merges this local language model library into the network language model library. The network language model library can be regarded as an organic collection of the local language model libraries of many users: building on the individual local libraries and adding language models newly extracted from the network corpus, it forms a larger language model library.
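Steps S302 and S303 can be sketched as follows, under the assumption (not stated in the patent) that a language model library is represented as a table of n-gram counts. The class and method names are hypothetical. Each user's upload is stored separately so that a repeated synchronization replaces that user's earlier contribution instead of counting it twice.

```python
from collections import Counter

class NetworkModelLibrary:
    """Hypothetical network-side store that merges per-user n-gram counts."""

    def __init__(self, corpus_counts=None):
        # Counts extracted from the network corpus itself.
        self.corpus_counts = Counter(corpus_counts or {})
        # User id -> counts most recently uploaded by that user.
        self.user_counts = {}

    def upload(self, user_id, local_counts):
        """Store, or replace, one user's uploaded local model (step S302)."""
        self.user_counts[user_id] = Counter(local_counts)

    def merged_counts(self):
        """Combine corpus counts with every user's counts (step S303)."""
        total = Counter(self.corpus_counts)
        for counts in self.user_counts.values():
            total.update(counts)
        return total
```

Storing one table per user also leaves room for the per-user sub-libraries mentioned elsewhere in the description.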
In step S304, the network language model library is subdivided into a plurality of network language model sub-libraries according to user attributes. User attributes can include the user's type of work, hobbies, age group, geographic region, and so on. Because of these attributes, users show different input styles and habits. For example, to express the same idea of a traffic jam, an older user may enter "the roads were really jammed this morning", while a younger user may enter "super jammed this morning". As another example, a football fan often uses expressions such as "what was the score between the Netherlands and Slovakia" or "what time does tonight's Cup match start", whereas a user with no interest in the sport is unlikely to use such sentences. Forming finer-grained network language model sub-libraries according to users' attributes therefore combines the timeliness and richness of the network language model library with the accuracy and personalization of the local language model library, giving the user the best choices during input.
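One possible way to subdivide by user attribute is sketched below in Python. The patent leaves the classification method open, so this is only an illustration: the attribute keys, the n-gram count representation, and both function names are assumptions. Each upload is added to one sub-library per attribute value of the uploading user.

```python
from collections import Counter, defaultdict

def build_sub_libraries(uploads):
    """Group uploaded n-gram counts by each attribute value of the uploader.

    uploads: iterable of (attributes, counts) pairs, where attributes is a
    dict such as {"age_group": "under_25", "hobby": "football"}.
    """
    sub_libraries = defaultdict(Counter)
    for attributes, counts in uploads:
        for key, value in attributes.items():
            sub_libraries[(key, value)].update(counts)
    return sub_libraries

def select_sub_libraries(sub_libraries, attributes):
    """Combine the sub-libraries that match one user's attributes."""
    combined = Counter()
    for key, value in attributes.items():
        combined.update(sub_libraries.get((key, value), Counter()))
    return combined
```

In this simple form, an upload from a user who shares two attributes with the requesting user is counted twice in the combined result; a real system would need to weight or deduplicate such overlaps.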
In step S305, the corresponding network language model sub-library is called for the user. When the user wishes to synchronize language models from the network language model library to the local language model library, or wishes the server to provide the cloud input service directly, a suitable network language model sub-library can be called for the user. For example, according to the user attributes (Li Wei, male, 23 years old, football fan, ...), the network language model sub-library selected for the user is sent to the language model library synchronization device, which synchronizes it to the local language model library. Alternatively, the network matching device calls the selected sub-library, queries the network corpus for the corresponding network candidate results, and sends them to the user's display device for selection.
Fig. 4 shows a flowchart of a method according to the invention in which the network server provides cloud input on the basis of the network language model library it maintains. In step S401, the keystroke input sequence that the user enters on the keyboard of the user equipment is received over the network. The keystroke sequence can be the abbreviated pinyin or full pinyin of one or more phrases or even a whole sentence. For example, to enter "I like using Baidu's search engine", the user can enter the abbreviated pinyin formed from the initial of each character, "wxhybdssyq"; the full pinyin of each character, "woxihuanyongbaidusousuoyinqing"; or a mixture of abbreviated and full pinyin, "woxhuanybaidssyinq". In general, full pinyin gives more accurate candidates and reduces page turning, but requires more keystrokes. Abbreviated pinyin alone produces more candidates that share the same code, so paging through them takes longer and is inefficient. Mixing full and abbreviated pinyin is therefore usually more effective.
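To make the three input styles concrete, the following Python sketch checks whether a keystroke sequence can spell a given pinyin syllable sequence when each syllable is typed either in full or in abbreviated form. It is an illustration only: the function names are hypothetical, and the abbreviation rule (the two-letter initials zh, ch, and sh, otherwise the first letter) is an assumption inferred from the examples in this description.

```python
def initial_of(syllable):
    """Abbreviated form of a syllable: zh/ch/sh, otherwise the first letter."""
    if syllable[:2] in ("zh", "ch", "sh"):
        return syllable[:2]
    return syllable[:1]

def matches(keys, syllables):
    """True if keys spells the syllables, each in full or abbreviated form."""
    if not syllables:
        return keys == ""
    head, rest = syllables[0], syllables[1:]
    # Try the full spelling first, then the abbreviation, with backtracking.
    for form in (head, initial_of(head)):
        if keys.startswith(form) and matches(keys[len(form):], rest):
            return True
    return False
```

A matching device would run the lookup in the other direction, from keystrokes to candidate entries through an index, but the acceptance rule it has to honor is the same.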
Then, in step S402, after the user's keystroke input sequence is obtained, the network language model sub-library corresponding to the user is called, and the input sequence is matched against the network corpus of the network server. Because the network corpus is very large and the matching is based on a language model that fits the user's input habits, the matching result can be more accurate. For example, the entry "I like using Baidu's search engine" may be returned directly, so the user does not need to page through and pick phrase by phrase. Even if the network response lags slightly, input speed can still increase considerably.
For instance, suppose the keystroke sequence the user enters is the abbreviated pinyin "wxhtzhmsh". In the network corpus, the terms for "wxh" include "I like", "no signal", "infinitely good", "Wang Xiaohe", ...; the terms for "tzh" include "feature", "notice", "comrade", "children's clothing", "barreled", ...; and the terms for "msh" include "description", "mode", "did not say", "flash sale", "gourmet food", .... Depending on the language model, these terms can form combinations such as "I like the food in the picture", "no-signal adjustment mode", and "Wang Xiaohe children's clothing flash sale". If the user attributes show that this user works as a professional Taobao shop seller, then after the corresponding language model is called, the most likely output is found to be the third option. For a user without this attribute, such a combination of terms is unlikely to be obtained.
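The effect of calling different language models on the same keystroke sequence can be sketched in Python as follows. The candidate terms are English glosses taken from the example above, while the pair scores, the default score for unseen pairs, and the function name are hypothetical values chosen only for illustration.

```python
from itertools import product

def best_combination(candidates_per_segment, pair_scores, unseen=0.01):
    """Return the term sequence with the highest product of pair scores.

    pair_scores maps (previous term, current term) to a hypothetical
    bigram-style score; pairs missing from the model get the small
    default `unseen` so they are penalized but not ruled out.
    """
    def score(sequence):
        total = 1.0
        for prev, cur in zip(sequence, sequence[1:]):
            total *= pair_scores.get((prev, cur), unseen)
        return total

    # Exhaustive search is fine for a few short candidate lists.
    return max(product(*candidates_per_segment), key=score)

# Candidate terms for the segments "wxh", "tzh", and "msh".
CANDIDATES = [
    ["I like", "no signal", "Wang Xiaohe"],
    ["notice", "children's clothing"],
    ["mode", "flash sale", "gourmet food"],
]

# Hypothetical sub-library for a Taobao shop seller.
SELLER_MODEL = {
    ("Wang Xiaohe", "children's clothing"): 0.6,
    ("children's clothing", "flash sale"): 0.7,
}
```

Calling `best_combination(CANDIDATES, SELLER_MODEL)` selects the children's clothing flash sale combination, whereas a model without those pairs would not. For long inputs, a dynamic programming search such as the Viterbi algorithm would replace the exhaustive enumeration.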
In step S403, the network entry options obtained can be fed back to the user equipment for the user to choose from. The network matching device can determine the priority of the entry options according to whether an entry was selected before, when it was last selected, how many times it was selected, the input preferences preset by the user, and/or how many times the entry has been searched for on the network. Although the term "entry option" is used here, it should be understood to refer to any linguistic unit the user wishes to obtain, even a paragraph of text. As input methods become more and more intelligent, it is quite possible for the user to enter a long keystroke sequence and directly obtain a candidate linguistic unit that is longer and more complete than a word, a phrase, or even a sentence.
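The priority factors listed above can be combined in many ways, and the patent does not fix one. The Python sketch below assumes a simple lexicographic ordering, which is purely an illustrative choice; the `Entry` fields and the order in which the factors are compared are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    text: str
    selected_before: bool = False
    last_selected: float = 0.0      # time of the latest selection, 0 if never
    times_selected: int = 0
    matches_preference: bool = False
    search_count: int = 0           # times the entry was searched on the network

def rank_entries(entries):
    """Sort entries from highest to lowest priority.

    Assumed order of comparison: prior selection, user preference,
    selection count, recency, then network search count.
    """
    return sorted(
        entries,
        key=lambda e: (
            e.selected_before,
            e.matches_preference,
            e.times_selected,
            e.last_selected,
            e.search_count,
        ),
        reverse=True,
    )
```

A weighted sum of the same factors would be an equally plausible design; the lexicographic form is used here only because it needs no tuning constants.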
Fig. 5 is a schematic diagram of the network language model base and the local language model base according to the present invention. As shown in the figure, the network language model base 103 on the network server 10 comprises several sub-bases: a civil servant sub-base, a Taobao sub-base, an under-25 sub-base, a football sub-base, and so on. The local language model base 203 on the user equipment 20 comprises two sub-bases: a work sub-base and a chat sub-base. The user of this user equipment 20 is a 23-year-old young man who works in a government agency and whose hobby is football. When he uses the input method, the language model synchronization device 30 can synchronize his local language model base to the network language model base 103, and the classification module (not shown) in the network language model base 103 merges these language models into the corresponding network language model sub-bases according to the user's attributes. Of course, the network language model base may also maintain a separate sub-base for each user, such as sub-bases A, B, C, D and E. When the user changes his user equipment or upgrades his input method, the network language model sub-bases that suit him can be synchronized to the local device through the language model synchronization device 30. As shown by the dashed lines in the figure, the language models in the civil servant sub-base are synchronized to this user's work sub-base, so that when entering work-related text he readily obtains output that matches the style of formal official documents and is linguistically standard. In addition, the language models in the under-25 sub-base and the football sub-base are synchronized to this user's chat sub-base, so that when typing in informal interactive scenarios such as online chat he readily obtains relatively casual results such as new words, buzzwords and catchphrases from the Internet. The classification of sub-bases in Fig. 5 is only an example; in practice there may also be classifications based on other user attributes, as well as other forms of mapping to the local language models.
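The Fig. 5 example can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation described by the patent: a language model is reduced to a phrase-to-count dictionary, the attribute rules and the sub-base names are taken from the example above, and the names `UserProfile`, `network_sub_bases_for`, `LOCAL_TARGET` and `sync_to_local`, as well as the "keep the larger count" merge rule, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumption: a "language model" is simplified to a phrase -> count mapping.
Model = Dict[str, int]


@dataclass
class UserProfile:
    age: int
    occupation: str
    hobbies: Tuple[str, ...] = ()


def network_sub_bases_for(user: UserProfile) -> List[str]:
    """Pick the network sub-bases that fit a user's attributes (Fig. 5 example rules)."""
    names = []
    if user.occupation == "civil servant":
        names.append("civil_servant")
    if user.age < 25:
        names.append("under_25")
    if "football" in user.hobbies:
        names.append("football")
    return names


# Hypothetical mapping that mirrors the dashed lines in Fig. 5:
# network sub-base -> local sub-base.
LOCAL_TARGET = {"civil_servant": "work", "under_25": "chat", "football": "chat"}


def sync_to_local(user: UserProfile,
                  network_base: Dict[str, Model],
                  local_base: Dict[str, Model]) -> Dict[str, Model]:
    """Copy the suitable network sub-bases into the mapped local sub-bases."""
    for name in network_sub_bases_for(user):
        target = local_base.setdefault(LOCAL_TARGET[name], {})
        for phrase, count in network_base.get(name, {}).items():
            # Assumed merge rule: keep the larger of the two counts.
            target[phrase] = max(target.get(phrase, 0), count)
    return local_base


network = {
    "civil_servant": {"hereby notify": 12},
    "under_25": {"lol": 30},
    "football": {"offside": 7},
    "taobao": {"free shipping": 50},
}
local = {"work": {}, "chat": {"lol": 2}}
user = UserProfile(age=23, occupation="civil servant", hobbies=("football",))
sync_to_local(user, network, local)
```

In this sketch the Taobao sub-base is never copied, because none of the example user's attributes selects it; a different attribute set or mapping table would give a different result, as the text above allows.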
Fig. 6 illustrates an example of a suitable computing system environment 600 in which the present invention can be implemented. The computing system environment 600 is only one example of a suitable computing environment and is not intended to limit the scope of use or functionality of the present invention. Nor should the computing environment 600 be interpreted as having any dependency or requirement relating to any one or combination of the components illustrated in the exemplary operating environment 600.
Those skilled in the art will appreciate that a computer or other client or server device can be deployed as part of a computer network or in a distributed computing environment. In this regard, the present invention pertains to any computer system having any number of memory or storage units, and to any number of applications and processes occurring across any number of storage units or volumes, which may be used in connection with the present invention. The present invention can be applied to environments in which server computers and client computers are deployed in a network environment or a distributed computing environment. The present invention can also be applied to standalone computing devices that have programming language functionality and interpretation and execution capabilities for generating, receiving and transmitting information in connection with remote or local services.
The present invention is operational with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments and/or configurations that may be suitable for use with the present invention include, but are not limited to: personal computers, server computers, handheld or portable devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems, and the like.
The present invention can be described in the general context of computer-executable instructions, such as program modules, executed by a computer. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The present invention can also be practiced in distributed computing environments, where tasks are performed by remote processing devices linked through a communications network or other data transmission medium. In a distributed computing environment, program modules and other data may be located in both local and remote storage media, including memory storage devices. Distributed computing facilitates the sharing of computer resources and services through direct exchange between computing devices and systems. These resources and services include the exchange of information, cache storage, and disk storage for files. Distributed computing takes advantage of network connectivity, allowing client computers to leverage their collective power to benefit the entire enterprise. In this regard, a variety of devices may have applications, objects or resources that can make use of the techniques of the present invention.
With reference to Fig. 6, an exemplary system for implementing the present invention includes a general-purpose computing device in the form of a computer 610. Components of the computer 610 may include, but are not limited to, a processing unit 620, a system memory 630, and a system bus 621 that couples various system components, including the system memory, to the processing unit 620. The system bus 621 may be any of several types of bus structure, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus (also known as the Mezzanine bus).
The computer 610 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 610, and include both volatile and non-volatile media and both removable and non-removable media. By way of example and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by the computer 610. Communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example and not limitation, communication media include wired media such as a wired network or a direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
The system memory 630 includes computer storage media in the form of volatile and/or non-volatile memory, such as read-only memory (ROM) 631 and random access memory (RAM) 632. A basic input/output system 633 (BIOS), containing the basic routines that help transfer information between elements within the computer 610, for example during start-up, is typically stored in ROM 631. RAM 632 typically contains data and/or program modules that are immediately accessible to and/or currently being operated on by the processing unit 620. By way of example and not limitation, Fig. 6 illustrates an operating system 634, application programs 635, other program modules 636 and program data 637.
The computer 610 may also include other removable/non-removable, volatile/non-volatile computer storage media. By way of example only, Fig. 6 illustrates a hard disk drive 641 that reads from or writes to non-removable, non-volatile magnetic media, a magnetic disk drive 651 that reads from or writes to a removable, non-volatile magnetic disk 652, and an optical disc drive 655 that reads from or writes to a removable, non-volatile optical disc 656, such as a CD-ROM or other optical media. Other removable/non-removable, volatile/non-volatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile discs, digital video tape, solid-state RAM, solid-state ROM and the like. The hard disk drive 641 is typically connected to the system bus 621 through a non-removable memory interface such as interface 640, and the magnetic disk drive 651 and the optical disc drive 655 are typically connected to the system bus 621 through a removable memory interface such as interface 650.
The drives and their associated computer storage media discussed above and illustrated in Fig. 6 provide storage of computer-readable instructions, data structures, program modules and other data for the computer 610. In Fig. 6, for example, the hard disk drive 641 is illustrated as storing an operating system 644, application programs 645, other program modules 646 and program data 647. Note that these components can either be the same as or different from the operating system 634, application programs 635, other program modules 636 and program data 637. The operating system 644, application programs 645, other program modules 646 and program data 647 are given different numbers here to illustrate that, at a minimum, they are different copies. A user can enter commands and information into the computer 610 through input devices such as a keyboard 662 and a pointing device 661, commonly a mouse, trackball or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner and the like. These and other input devices are often connected to the processing unit 620 through a user input interface 660 that is coupled to the system bus 621, but they may also be connected by other interface and bus structures, such as a parallel port, game port or universal serial bus (USB). A monitor 691 or other type of display device is also connected to the system bus 621 through an interface such as a video interface 690. In addition to the monitor 691, the computer may also include other peripheral devices, such as loudspeakers 697 and a printer 696, which can be connected through an output peripheral interface 690.
The computer 610 can operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 680. The remote computer 680 can be a personal computer, a server, a router, a network PC, a peer device or another common network node, and typically includes many or all of the elements described above in relation to the computer 610, although only a memory storage device 681 is illustrated in Fig. 6. The logical connections depicted in Fig. 6 include a local area network (LAN) 671 and a wide area network (WAN) 673, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, corporate intranets and the Internet.
When used in a LAN networking environment, the computer 610 is connected to the LAN 671 through a network interface or adapter 670. When used in a WAN networking environment, the computer 610 typically includes a modem 672 or other means for establishing communications over the WAN 673, such as the Internet. The modem 672, which can be internal or external, can be connected to the system bus 621 through the user input interface 660 or another suitable mechanism. In a networked environment, program modules described in relation to the computer 610, or portions thereof, can be stored in a remote memory storage device. By way of example and not limitation, Fig. 6 illustrates remote application programs 685 as residing on the memory device 681. It will be appreciated that the network connections shown are exemplary, and that other means of establishing a communications link between the computers can be used.
Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the specific embodiments described above, and that those skilled in the art can make various variations or modifications within the scope of the appended claims.

Claims (16)

1. An input system implemented with network assistance, which uses a network server to maintain a language model base and completes input by calling the corresponding language model base for the user, said system comprising:
At least one user equipment, configured to be used by a user to enter text and configured to establish a local language model base according to the user's input;
A network server, configured to establish a network language model base based on at least one received local language model and on statistical analysis of other corpora; and
A language model synchronization device, configured to synchronize the local language model base and the network language model base.
2. The system according to claim 1, wherein said network server further comprises a network corpus, said network corpus comprising the corpus material used to establish the network language model base.
3. The system according to claim 2, wherein said network corpus comprises a user network corpus and a public network corpus, wherein the user network corpus can be synchronized with the user's local corpus.
4. The system according to claim 1, wherein said network server further comprises a network language model establishing device for establishing network language models according to the corpus material in the network corpus.
5. The system according to claim 1, wherein said network language model base comprises a classification module for classifying the network language model base into one or more network language model sub-bases according to user attributes.
6. The system according to claim 5, wherein said user attributes are selected from the group comprising: the user's nature of work, hobbies, age group, and area of residence.
7. The system according to claim 1, wherein said network server further comprises a network matching device that interacts with the network corpus and the network language model base and is used to call the network language model sub-base of the corresponding type for the user and query the network corpus, so as to provide a cloud input service.
8. The system according to claim 1, wherein said user equipment further comprises a local corpus and a local language model establishing device, wherein said local language model establishing device establishes the local language model base according to the corpus material in said local corpus.
9. An input method implemented with network assistance, wherein a language model base is maintained on the network side and input is completed by calling the corresponding language model base for the user, said method comprising the steps of:
Establishing a local language model according to the user's input;
Uploading said local language model to a network server;
Establishing a network language model base based on at least one received local language model and on statistical analysis of other corpora.
10. The method according to claim 9, further comprising obtaining corpus material from a network corpus and establishing the network language model base by performing statistical analysis on the corpus material.
11. The method according to claim 10, further comprising synchronizing the user's local corpus with the network corpus.
12. The method according to claim 9, wherein establishing the network language model base comprises preprocessing the corpus material, extracting language models, and optimizing the resulting language model base.
13. The method according to claim 9, further comprising classifying the network language model base into one or more network language model sub-bases according to user attributes.
14. The method according to claim 9, wherein said user attributes are selected from the group comprising: the user's nature of work, hobbies, age group, area of residence, etc.
15. The method according to claim 9, further comprising calling the network language model base of the corresponding type for the user to query the network corpus, so as to provide a cloud input service.
16. The method according to claim 13, further comprising synchronizing the network language model sub-base suited to the user to the user equipment that the user is using.
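The method steps recited in claims 9 to 12 can be pictured with a minimal Python sketch. It is an informal illustration under stated assumptions and does not define or limit the claims: a language model is reduced to word-bigram counts, preprocessing is lower-casing plus whitespace splitting (real Chinese input would need a segmenter), "optimizing" is a simple count threshold, and the names `build_model`, `NetworkServer`, `receive_upload`, `add_corpus_statistics` and `optimize` are hypothetical.

```python
from collections import Counter
from typing import Iterable


def build_model(sentences: Iterable[str], n: int = 2) -> Counter:
    """Preprocess text and extract n-gram counts as a toy language model."""
    model = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumed preprocessing step
        for i in range(len(tokens) - n + 1):
            model[tuple(tokens[i:i + n])] += 1
    return model


class NetworkServer:
    """Toy network side: merges uploaded local models with other corpus statistics."""

    def __init__(self) -> None:
        self.network_model = Counter()

    def receive_upload(self, local_model: Counter) -> None:
        # Merge a received local language model by adding its counts.
        self.network_model.update(local_model)

    def add_corpus_statistics(self, corpus_sentences: Iterable[str]) -> None:
        # Statistical analysis of other corpus material, here plain n-gram counting.
        self.network_model.update(build_model(corpus_sentences))

    def optimize(self, min_count: int = 2) -> None:
        # Assumed optimization: drop n-grams seen fewer than min_count times.
        self.network_model = Counter(
            {gram: c for gram, c in self.network_model.items() if c >= min_count}
        )


# Establish a local model from user input, upload it, then build the network model.
local_model = build_model(["see you tomorrow", "see you soon"])
server = NetworkServer()
server.receive_upload(local_model)
server.add_corpus_statistics(["See you later"])
server.optimize(min_count=2)
```

Counting and thresholding stand in for whatever model extraction and optimization an actual system would use; the sketch only shows the order of the steps and where each one runs.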
CN201010216631.9A 2010-06-30 2010-06-30 Method and system for maintaining a language model base by using a network Active CN102314440B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010216631.9A CN102314440B (en) 2010-06-30 2010-06-30 Method and system for maintaining a language model base by using a network

Publications (2)

Publication Number Publication Date
CN102314440A true CN102314440A (en) 2012-01-11
CN102314440B CN102314440B (en) 2016-06-08

Family

ID=45427619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010216631.9A Active CN102314440B (en) 2010-06-30 2010-06-30 Method and system for maintaining a language model base by using a network

Country Status (1)

Country Link
CN (1) CN102314440B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103870000A (en) * 2012-12-11 2014-06-18 百度国际科技(深圳)有限公司 Method and device for sorting candidate items generated by input method
CN103869999A (en) * 2012-12-11 2014-06-18 百度国际科技(深圳)有限公司 Method and device for sorting candidate items generated by input method
CN104020950A (en) * 2013-03-01 2014-09-03 腾讯科技(深圳)有限公司 Input method based on touch screen and input device with touch screen
WO2019150222A1 (en) * 2018-02-01 2019-08-08 International Business Machines Corporation Dynamically constructing and configuring a conversational agent learning model
CN110456921A (en) * 2019-08-01 2019-11-15 吉旗(成都)科技有限公司 Predict the method and device of user's keyboard operation behavior
CN110853642A (en) * 2019-11-14 2020-02-28 广东美的制冷设备有限公司 Voice control method and device, household appliance and storage medium
CN111611769A (en) * 2019-02-25 2020-09-01 北京嘀嘀无限科技发展有限公司 Text conversion method and device for multiple language models
CN112840627A (en) * 2018-12-19 2021-05-25 深圳市欢太科技有限公司 Information processing method and related device
CN114141236A (en) * 2021-10-28 2022-03-04 北京百度网讯科技有限公司 Language model updating method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101373468A (en) * 2007-08-20 2009-02-25 北京搜狗科技发展有限公司 Method for loading word stock, method for inputting character and input method system
CN101398834A (en) * 2007-09-29 2009-04-01 北京搜狗科技发展有限公司 Processing method and device for input information and input method system

Also Published As

Publication number Publication date
CN102314440B (en) 2016-06-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant