CN101119334A - Method, system and equipment for obtaining neology - Google Patents


Info

Publication number
CN101119334A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2007101221872A
Other languages
Chinese (zh)
Inventor
李伟杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CNA2007101221872A priority Critical patent/CN101119334A/en
Publication of CN101119334A publication Critical patent/CN101119334A/en
Pending legal-status Critical Current

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The present invention discloses a method for obtaining neologisms. The method comprises: first, obtaining candidate character strings from chat data; second, screening the candidate character strings according to preset rules and taking the words that pass the screening as neologisms. The present invention also discloses a system for obtaining neologisms, as well as an instant messaging client and a server that implement neologism acquisition. Using instant messaging as the word source, the invention can obtain neologisms simply, effectively, promptly and automatically, and the neologisms obtained have a broader scope of application.

Description

A method, system and equipment for obtaining neologisms
Technical field
The present invention relates to information extraction technology, and in particular to a method and system for obtaining neologisms by means of an instant messaging (IM) system, and to an instant messaging client and a server that implement neologism acquisition.
Background art
With the rapid development and spread of informatization, electronics and networking, people encounter or obtain huge amounts of information every day from various communication networks and the Internet. As information spreads widely and the range of what people talk about keeps expanding, new vocabulary emerges constantly and comes into wide use. Because vocabulary is the basis of communication, it is important to keep extending and updating dictionaries. At present, neologism acquisition is mostly applied in fields such as input methods and web search. An input method needs to keep updating its own database so that it can offer users more vocabulary and more convenient input; a search engine needs to update and expand its search keywords at any time to improve search speed.
Specifically, regarding input methods: the Chinese character input methods in common use today fall into two classes, keyboard input and non-keyboard input. Keyboard input means entering Chinese characters with the 26 English letters on the keyboard according to some encoding rule, for example pinyin input, radical-based input, Wubi (five-stroke) input and so on. Non-keyboard input means entering Chinese characters in other ways, such as handwriting input, voice input, optical character recognition (OCR) input and so on. Both classes, however, have problems of varying degree when it comes to obtaining neologisms.
Keyboard input methods obtain neologisms from features such as how often and how many times the user enters something. The usual approach is to collect the input first and store it, then screen and count the stored information according to preset rules. Neologisms can be obtained accurately this way, but they come from a single user and are stored only on the terminal that user is currently using, so they cannot serve more users: even when many neologisms are in common use by everyone, each user has to acquire them separately through frequent input. Moreover, for the same user, because the neologisms are stored only on the current terminal, they must be acquired again after switching terminals. For example, a user who has accumulated many commonly used neologisms on an office terminal has to acquire them all over again on a personal terminal.
Non-keyboard input methods complete the input of Chinese characters by having a device or software recognize handwriting, sound or optical characters, and a high recognition accuracy cannot be guaranteed. Neologisms obtained at that level of recognition are therefore likely to be wrong or unwanted vocabulary, which makes it hard to truly achieve the purpose of obtaining neologisms.
Regarding web search: neologisms are obtained by collecting the keywords entered by all network users, gathering and storing them on a network server, and then counting and extracting from them. However, because neologisms keep appearing and are scattered across different corpora, they are hard to identify and update promptly and effectively. The prior art also usually relies on manual effort to collect, sort and identify neologisms before adding them to an existing dictionary, which costs time and money and is very inefficient.
As can be seen, the prior art has relatively few ways of obtaining neologisms: it mainly collects and counts user input and search keywords and derives neologisms from them. Other information sources have not yet been used as sources for obtaining neologisms.
Summary of the invention
In view of this, the main purpose of the present invention is to provide a method for obtaining neologisms that uses instant messaging as the word source, obtains neologisms simply, effectively, automatically and in real time, and gives the neologisms obtained a wider scope of application.
Another purpose of the present invention is to provide a system and equipment for obtaining neologisms that support automatic, instant-messaging-based neologism acquisition and are simple, convenient, flexible and effective to implement.
To achieve the above purposes, the technical solution of the present invention is implemented as follows:
The present invention proposes a method for obtaining neologisms, comprising:
A. obtaining candidate character strings from chat data;
B. screening the candidate character strings according to preset rules, and taking the words that pass the screening as neologisms.
In step A, the obtaining is: obtaining chat data input at the local end; or obtaining chat data received from a peer, where there may be one or more peers.
When the chat data is chat data input at the local end, step A is specifically:
the IM client software receives the data that the current user enters through an input method and, when displaying the entered data in the instant messaging interface as a chat record, takes the current input as a candidate character string.
Alternatively, when the chat data is chat data from a peer, step A is specifically:
the IM client software receives the data sent by the peer and, when displaying the received data in the instant messaging interface as a chat record, takes the received data as a candidate character string.
Between step A and step B, the method further comprises: dividing the candidate character string into one or more words; step B then screens the resulting words according to the preset rules.
After the screening in step B, the method further comprises: counting how many times each screened word occurs at a designated location and judging whether that count reaches a set threshold; if it does, the word is taken as a neologism; otherwise it is not. The designated location is Internet data, or chat records from the local end or a peer.
In the above solution, the method further comprises: merging the obtained neologisms into the databases of various input methods.
In the above solution, steps A and B are performed by the IM client, and the method further comprises: the IM client sends the obtained neologisms to the peer user through the instant messaging system.
In the above solution, steps A and B are performed by the IM client, and the method further comprises: the IM client uploads the obtained neologisms to a backend server, and the backend server either actively pushes them to every registered IM client or delivers them to the IM clients that request them.
In the above solution, steps A and B are performed by the backend server, and the method further comprises: the backend server either actively pushes the obtained neologisms to every registered IM client or delivers them to the IM clients that request them.
The present invention also proposes an IM client that implements neologism acquisition, comprising a text input unit, an instant message transceiver unit and a chat record display unit; the key point is that the IM client further comprises a terminal-side screening unit.
The text input unit is used to receive and display the information entered by the local user and to send the received information to the instant message transceiver unit, the chat record display unit and the terminal-side screening unit.
The instant message transceiver unit is used to send the local user's input received from the text input unit to another IM client, and to send the information received from another IM client to the chat record display unit and the terminal-side screening unit.
The terminal-side screening unit receives the chat data input at the local end from the text input unit and the chat data sent by the peer from the instant message transceiver unit, and screens the candidate character strings corresponding to the chat data according to preset rules to obtain neologisms.
The IM client further comprises a word segmentation unit, used to divide the candidate character strings corresponding to the obtained chat data into one or more words and to pass the divided words to the terminal-side screening unit for screening.
The IM client further comprises a statistics unit, used to receive the words screened by the terminal-side screening unit, compare them with data obtained from a designated location, count whether the number of occurrences of each received word at the designated location reaches a set threshold, and take the words that reach the threshold as neologisms.
The instant message transceiver unit is further used to send the neologisms obtained at the local end to the peer, or to receive neologisms sent by the peer.
The IM client further comprises a server interaction unit, used to upload the neologisms obtained at the local end to the backend server, to receive neologisms broadcast by the backend server, or to request and download neologisms from the backend server.
The present invention also proposes a server that implements neologism acquisition, comprising a chat data transceiver unit; the server further comprises a server-side screening unit.
The chat data transceiver unit receives the chat records sent by each IM client and passes all received chat data to the server-side screening unit.
The server-side screening unit screens the candidate character strings according to preset rules to obtain neologisms.
The server further comprises a word segmentation unit, used to divide the candidate character strings corresponding to the obtained chat data into one or more words and to pass the divided words to the server-side screening unit for screening.
In the above solution, the server further comprises a statistics unit, used to receive the words screened by the server-side screening unit, compare them with data obtained from a designated location, count whether the number of occurrences of each received word at the designated location reaches a set threshold, and take the words that reach the threshold as neologisms.
The chat data transceiver unit is further used to push the obtained neologisms directly to every IM client, or to deliver them to the IM clients that request them.
The present invention further proposes a system for obtaining neologisms, comprising at least one IM client and a backend server; the system further comprises a screening unit, used to screen candidate character strings according to preset rules to obtain neologisms. The screening unit is located in the IM client, or in the backend server, or in both the IM client and the backend server.
The method, system and equipment for obtaining neologisms provided by the present invention use chat data as the word source: candidate character strings are obtained from chat data and then screened and counted according to preset rules to obtain neologisms. The present invention has the following advantages and characteristics:
1) Because the present invention extracts neologisms from chat data, and the chat data comes from local input or from the input of one or more peers, the word sources for neologism acquisition are expanded and the neologisms obtained better match the needs of the majority.
2) The process of obtaining neologisms can screen directly according to preset rules, or can additionally compare and count on top of the screening so as to obtain neologisms more accurately.
3) The preset rules of the present invention can be configured as various screening rules according to user needs, which gives higher accuracy and efficiency as well as a more flexible implementation.
4) Neologism acquisition can be completed in the IM client or in the backend server. Two-layer screening can also be used, with neologism acquisition performed in both the IM client and the backend server, so the implementation is more flexible and varied and the neologisms obtained have a wider scope of application.
5) When the IM client and/or the backend server perform neologism acquisition, the neologisms obtained by one entity can also be shared with more users by uploading, downloading, sending chat data and similar means. This makes the neologisms more widely used and keeps them updated synchronously over a larger range, which in turn saves time and cost and improves efficiency.
Description of drawings
Fig. 1 is a schematic flowchart of the method of the present invention;
Fig. 2 is a schematic structural diagram of the IM client of the present invention;
Fig. 3 is a schematic structural diagram of the server that implements neologism acquisition in the present invention.
Detailed description of the embodiments
With the rapid development of IM, more and more people communicate by instant messaging, and the content exchanged through IM grows ever broader. On this basis, the core idea of the present invention is: use the chat data in chat records as the word source, obtain candidate character strings from the chat data, screen and count the obtained character strings according to preset rules, and take the character strings that meet the requirements as neologisms.
In the present invention, the neologism acquisition function can be embedded directly in the instant messaging software of the IM client.
As shown in Fig. 1, the method of the present invention for obtaining neologisms comprises the following steps:
Step 100: obtain candidate character strings from chat data.
Obtaining candidate character strings from chat data differs from how an ordinary input method obtains character strings: acquisition from chat data happens in real time and does not require storing the data first. More importantly, not only the chat data input at the local end but also the chat data received from peers can be obtained, and there may be multiple peers.
For chat data input at the local end, the specific practice is: the user enters Chinese characters into the IM client through an input method; the IM client software receives the data currently entered and, when displaying the entered data in the instant messaging interface as a chat record, takes the current input as a candidate character string.
For chat data from a peer, the specific processing is: the current user's IM client software receives the data sent by the peer and, when displaying the received data in the instant messaging interface as a chat record, takes the received data as a candidate character string.
In this step, the candidate character string may further be divided into one or more words, each of which is then screened. Alternatively, no division is performed here and word segmentation is left to the subsequent preset rules, which remove the parts that do not meet basic word-formation requirements so that what is obtained are words in normal use. If division is performed, it may distinguish words according to preset separators. For example, if the preset rule treats the comma as one of the separators, the candidate character string "autumn, beautiful season" is divided into "autumn" and "beautiful season". Segmentation may also follow everyday usage; for example, the candidate character string "my idol" can be divided into "I" and "idol".
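The separator-based division described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name and the separator set are assumptions, since the text names only the comma as an example separator.

```python
import re

# Assumed separator set; the text only names the comma as one example.
SEPARATORS = ",，;；"

def split_on_separators(candidate, separators=SEPARATORS):
    """Split a candidate string on any preset separator and drop empty pieces."""
    pattern = "[" + re.escape(separators) + "]"
    pieces = re.split(pattern, candidate)
    return [piece.strip() for piece in pieces if piece.strip()]

print(split_on_separators("autumn, beautiful season"))
```

Keeping the separators in a single configurable string mirrors the idea that the separator rule is preset and can be changed without touching the splitting logic.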
The division can be implemented with existing word segmentation algorithms, for example segmentation based on string matching, segmentation based on understanding, or segmentation based on statistics. Segmentation based on string matching in turn includes forward (left-to-right) maximum matching, reverse (right-to-left) maximum matching, minimum segmentation and so on; any one of these or any combination of them can be used.
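As an illustration of one of the string-matching approaches named above, here is a minimal sketch of forward maximum matching. The function name, the `max_len` window and the tiny dictionary are assumptions made for the example; the Chinese string is an assumed back-translation of the "my idol" example, not a quote from the original filing.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary word, falling back to a single character when nothing matches."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words

# Hypothetical dictionary containing only the word for "idol".
print(forward_max_match("我的偶像", {"偶像"}))
```

Reverse maximum matching follows the same idea but scans from the right end of the string; the two can disagree on ambiguous input, which is one reason the text allows combining methods.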
Step 110: screen the candidate character strings obtained in step 100 according to preset rules, and take the words that pass the screening as neologisms.
The screening rules are set in advance, and various screening rules can be set as needed. Multiple screening rules can be used individually or in combination.
There are many possible screening rules. For example: compare the words obtained in step 100 with the words in an existing dictionary, and treat a word as a neologism if the existing dictionary does not contain it. Suppose the words obtained are "rapid development", "prancing" and "leap", and the existing dictionary already contains "rapid development" and "prancing"; then only "leap" is considered a neologism.
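The dictionary-comparison rule can be sketched in a few lines. The function name is an assumption, and the existing dictionary is modelled as a plain Python set purely for illustration.

```python
def filter_unknown(words, existing_dictionary):
    """Keep only the words that the existing dictionary does not contain."""
    return [word for word in words if word not in existing_dictionary]

existing = {"rapid development", "prancing"}
print(filter_unknown(["rapid development", "prancing", "leap"], existing))
```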
As another example: count how many times each word occurs in the chat data and set a threshold. A word whose count is below the threshold is not considered a neologism; a word whose count is greater than or equal to the threshold is. Suppose the threshold is 500, and "rapid development" occurs 358 times, "prancing" occurs 558 times and "leap" occurs 20 times; then "rapid development" and "leap" are deleted. In this case, multiple chat records can be counted continuously, including both those sent and those received by the local end.
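A minimal sketch of the frequency-threshold rule, assuming each chat record has already been divided into a list of words. The function names are hypothetical; the counts in the usage lines are the ones given in the example above.

```python
from collections import Counter

def count_words(chat_records):
    """Accumulate word counts over many chat records (each a list of words)."""
    counts = Counter()
    for record in chat_records:
        counts.update(record)
    return counts

def keep_frequent(counts, threshold):
    """Keep the words whose count is greater than or equal to the threshold."""
    return {word for word, n in counts.items() if n >= threshold}

counts = Counter({"rapid development": 358, "prancing": 558, "leap": 20})
print(keep_frequent(counts, 500))
```

Because `count_words` only adds to a running `Counter`, sent and received records can be fed in continuously as they arrive.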
As a further example: set a length threshold and combine it with comparison against an existing dictionary. Specifically, first select the words whose length falls within a certain range, then compare the selected words with the words in the existing dictionary and take those that do not appear there as neologisms. Suppose the length range is set to 2 to 8 and the words received are "Hong Kong", "tenth anniversary ceremony of Hong Kong's return" and "Hong Kong centre". The length rule then eliminates "tenth anniversary ceremony of Hong Kong's return" and keeps "Hong Kong" and "Hong Kong centre".
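The combined length-and-dictionary rule can be sketched as below. Length is measured in characters. The three Chinese strings are assumed back-translations chosen to have 2, 9 and 4 characters so that they behave like the example above; they and the contents of the existing dictionary are illustrative assumptions, not quotes from the original filing.

```python
def filter_by_length_then_dictionary(words, existing_dictionary,
                                     min_len=2, max_len=8):
    """Keep words whose length is within [min_len, max_len], then drop
    those already present in the existing dictionary."""
    in_range = [word for word in words if min_len <= len(word) <= max_len]
    return [word for word in in_range if word not in existing_dictionary]

words = ["香港", "香港回归十周年庆典", "香港中心"]
# Assumption: the existing dictionary already contains the 2-character word.
print(filter_by_length_then_dictionary(words, {"香港"}))
```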
If the candidate character string was divided into one or more words in step 100, this step screens the words obtained in step 100 according to the preset rules.
In practice, selecting neologisms using only steps 100 and 110 may not fully reflect the notion of a neologism, because a neologism should be a word that is used or appears with relatively high frequency and is absent from existing dictionaries. Therefore, to ensure that the neologisms obtained actually occur often in common contexts, a step 120 can be added after step 110, with step 110 modified to step 110a:
Step 110a: screen the candidate character strings or words obtained in step 100 according to preset rules to obtain the words that meet the requirements.
In this case, many more screening rules can be added. For example: remove words that do not conform to the preset word-formation rules. Suppose the current word-formation rule is to delete character strings that begin with "what is". After the two character strings "what is a patent" and "what" are obtained, "what is a patent" begins with "what is" and therefore does not meet the word-formation requirement, so it is deleted and only "what" is retained. As another example: remove characters that do not meet the preset encoding requirement. Suppose the current encoding requirement is to select only characters that fall within the Chinese character encoding; when "what? patent" is obtained, the "?" is a non-Chinese character and should be deleted.
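The two rules in the preceding paragraph can be sketched as follows. The function names and the prefix list are assumptions. "Chinese character encoding" is approximated here by the basic CJK Unified Ideographs range U+4E00 to U+9FFF, which is an assumption of this sketch; the original may well have meant a national encoding such as GB2312. The Chinese test string is an assumed back-translation of "what? patent".

```python
# Hypothetical word-formation rule: reject strings starting with these prefixes.
BANNED_PREFIXES = ("what is",)

def passes_word_formation(candidate, banned_prefixes=BANNED_PREFIXES):
    """Return True if the candidate does not start with a banned prefix."""
    return not candidate.startswith(banned_prefixes)

def strip_non_chinese(candidate):
    """Drop every character outside the basic CJK Unified Ideographs range."""
    return "".join(ch for ch in candidate if "\u4e00" <= ch <= "\u9fff")

candidates = ["what is a patent", "what"]
print([c for c in candidates if passes_word_formation(c)])
print(strip_non_chinese("什么?专利"))
```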
Step 120: count how many times each screened word occurs at a designated location and judge whether that count reaches a set threshold; if it does, take the word as a neologism; otherwise, do not.
The designated location can be Internet data or chat records, where chat records include both those input at the local end and those sent by peers. Internet data can be acquired in different ways, for example: any Internet web page fetched with crawler technology, the data on the page being Internet data; data the client itself holds, including electronic articles and categorized articles such as news or science and technology; or data provided by other clients.
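Step 120 can be sketched as a simple occurrence count over whatever texts the designated location supplies. The function name and the in-memory list of texts are assumptions; how the texts are gathered (crawler, local articles, chat records) is outside this sketch, and plain substring counting is used in place of any real matching strategy.

```python
def confirm_neologisms(screened_words, corpus_texts, threshold):
    """Keep the screened words whose total number of occurrences across the
    corpus texts reaches the threshold."""
    confirmed = []
    for word in screened_words:
        occurrences = sum(text.count(word) for text in corpus_texts)
        if occurrences >= threshold:
            confirmed.append(word)
    return confirmed

corpus = ["leap leap", "a leap forward", "nothing here"]
print(confirm_neologisms(["leap", "prance"], corpus, threshold=3))
```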
The neologisms obtained with the above method can further be made available to more people in different ways, updating dictionaries with relatively wide coverage. One implementation is to merge the obtained neologisms into the databases of various input methods, so that people get more neologisms after updating their input method database, which makes input more convenient. Another implementation is for the local user, while chatting, to send the neologisms produced locally to the peer user through the instant messaging system; after receiving them, the peer user can update its own dictionary with the neologisms, and there can be several peer users. Yet another implementation is for the local user to upload the neologisms obtained locally to a backend server, which either actively pushes them to every registered IM client or delivers them to the IM clients that request them. Here, the neologisms may also take the form of a new dictionary, and the backend server can be an instant messaging server, a dictionary server, or another server with the above processing capability.
Usually, steps 100 and 110, or steps 100 to 120, can be completed in the local IM client. Because chat records can also be sent to a backend server for storage, steps 100 and 110, or steps 100 to 120, can also be completed in the backend server.
To make the neologisms obtained more general and better suited to the needs of more users, the present invention can further adopt two-layer screening, that is, screening in both the local IM client and the backend server. The specific practice is:
First, the local IM client screens according to steps 100 and 110, or steps 100 to 120. The result is not used directly as neologisms but is sent to the backend server as candidate words. The backend server collects the candidate words sent by the IM clients and then screens and counts them according to its own preset screening rules, which amounts to performing step 110 again, or steps 110 and 120 again, on the backend server side. The screening rules set in the IM client and in the backend server can be the same or different, and the occurrence thresholds can likewise be the same or different. The words obtained after two-layer screening are taken as neologisms; they can be actively pushed to every registered IM client, or delivered to the IM clients that request them.
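The two-layer screening described above can be sketched as two small functions. Everything specific here is an assumption made for illustration: the client rule is a dictionary comparison, and the server rule counts how many clients proposed each candidate (`min_clients`) before comparing against its own dictionary. The text only requires that each side applies its own preset rules and thresholds.

```python
from collections import Counter

def client_screen(words, client_dictionary):
    """First layer: a client keeps words absent from its local dictionary."""
    return [word for word in words if word not in client_dictionary]

def server_screen(candidates_per_client, server_dictionary, min_clients):
    """Second layer: count how many clients proposed each word, then filter."""
    votes = Counter()
    for candidates in candidates_per_client:
        votes.update(set(candidates))  # one vote per client per word
    return sorted(
        word for word, n in votes.items()
        if n >= min_clients and word not in server_dictionary
    )

uploads = [
    client_screen(["leap", "prancing"], {"prancing"}),
    client_screen(["leap", "idol"], {"idol"}),
    client_screen(["autumn"], set()),
]
print(server_screen(uploads, {"autumn"}, min_clients=2))
```

Counting proposals per client rather than raw occurrences is one way to favour words that many users share, which matches the stated aim of making the result more general.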
When neologism acquisition is completed in the IM client, the present invention proposes, in order to implement the method, an IM client that implements neologism acquisition. As shown in Fig. 2, it comprises the existing text input unit 21, instant message transceiver unit 22 and chat record display unit 23; the key point is that it also comprises a terminal-side screening unit 24. The text input unit 21 is used to receive and display the information entered by the local user and to send the received information to the instant message transceiver unit 22, the chat record display unit 23 and the terminal-side screening unit 24. The instant message transceiver unit 22 is connected to the chat record display unit 23 and to the instant message transceiver unit of another IM client (not shown); it is used to send the local user's input received from the text input unit 21 to the other IM client, and to send the information received from the other IM client to the chat record display unit 23 and the terminal-side screening unit 24. The chat record display unit 23 is used to display the information that the local user entered and sent to the other IM client, as well as the information received from the other IM client. The terminal-side screening unit 24 receives the chat data input at the local end from the text input unit 21 and the chat data sent by the peer from the instant message transceiver unit 22, and screens the candidate character strings corresponding to the chat data according to preset rules to obtain neologisms. The preset rules of the terminal-side screening unit 24 are set in advance and stored in it.
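The data flow between units 21 to 24 can be sketched with two small classes. This is only an illustration of the routing described above: the class and method names are assumptions, the network side of transceiver unit 22 is replaced by an in-memory `outbox` list, and the preset rule is passed in as an arbitrary predicate.

```python
class TerminalScreeningUnit:
    """Stands in for unit 24: applies a preset rule to each candidate string."""

    def __init__(self, rule):
        self.rule = rule
        self.neologisms = []

    def receive(self, candidate):
        if self.rule(candidate):
            self.neologisms.append(candidate)


class IMClient:
    """Routes local input and peer messages to display and screening."""

    def __init__(self, rule):
        self.display = []   # stands in for chat record display unit 23
        self.outbox = []    # stands in for the sending side of unit 22
        self.screening = TerminalScreeningUnit(rule)

    def on_text_input(self, text):
        """Text input unit 21: forward local input to units 22, 23 and 24."""
        self.outbox.append(text)
        self.display.append(("local", text))
        self.screening.receive(text)

    def on_message_received(self, text):
        """Transceiver unit 22: forward peer data to units 23 and 24."""
        self.display.append(("peer", text))
        self.screening.receive(text)


client = IMClient(rule=lambda s: len(s) <= 8)  # hypothetical length rule
client.on_text_input("leap")
client.on_message_received("a very long sentence")
print(client.screening.neologisms)
```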
The IM client that implements neologism acquisition can further comprise a word segmentation unit, located before the terminal-side screening unit 24, which is used to divide the candidate character strings corresponding to the obtained chat data into one or more words and to pass the divided words to the terminal-side screening unit 24 for screening.
The IM client that implements neologism acquisition can further comprise a statistics unit, used to receive the words screened by the terminal-side screening unit 24, compare them with data obtained from a designated location, count whether the number of occurrences of each received word at the designated location reaches a set threshold, and take the words that reach the threshold as neologisms. The designated location can be Internet data, or chat records from the local end or a peer.
The instant message transceiver unit 22 can further be used to send the neologisms obtained at the local end to the peer, or to receive neologisms sent by the peer.
The IM client that implements neologism acquisition can further comprise a server interaction unit, used to upload the neologisms obtained at the local end to the backend server, to receive neologisms broadcast by the backend server, or to request and download neologisms from the backend server.
Because the method of the present invention can also be implemented on the server side, when neologism acquisition is completed in the backend server, the present invention proposes a server that implements neologism acquisition. As shown in Fig. 3, it comprises the existing chat data transceiver unit 31; the key point is that it also comprises a server-side screening unit 32. The chat data transceiver unit 31 receives the chat records sent by each IM client and passes all received chat data to the server-side screening unit 32. The server-side screening unit 32 screens the candidate character strings according to preset rules to obtain neologisms. The preset rules of the server-side screening unit 32 are set in advance and stored in it.
The server that implements neologism acquisition can further comprise a word segmentation unit, located before the server-side screening unit 32, which is used to divide the candidate character strings corresponding to all received chat data into one or more words and to pass the divided words to the server-side screening unit 32 for screening.
The server that implements neologism acquisition can further comprise a statistics unit, used to receive the words screened by the server-side screening unit 32, compare them with data obtained from a designated location, count whether the number of occurrences of each received word at the designated location reaches a set threshold, and take the words that reach the threshold as neologisms. The designated location can be Internet data, or chat records from the local end or a peer.
The chat data transceiver unit 31 is further used to push the obtained neologisms directly to every IM client, or to deliver them to the IM clients that request them.
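The server-side arrangement can be sketched in the same spirit. The class and method names are assumptions, the preset rule is an arbitrary predicate, and delivery to a client is modelled as a plain function call that returns the words the client does not yet know; no real network protocol is implied.

```python
class NeologismServer:
    """Combines the roles of transceiver unit 31 and screening unit 32."""

    def __init__(self, rule):
        self.rule = rule
        self.neologisms = set()

    def receive_chat_records(self, records):
        """Unit 31: accept chat data from clients and hand it to screening."""
        for candidate in records:
            self._screen(candidate)

    def _screen(self, candidate):
        """Unit 32: keep the candidates that satisfy the preset rule."""
        if self.rule(candidate):
            self.neologisms.add(candidate)

    def words_for_client(self, known_words=()):
        """Unit 31: return the neologisms a requesting client lacks."""
        return sorted(self.neologisms - set(known_words))


server = NeologismServer(rule=lambda s: 2 <= len(s) <= 8)  # hypothetical rule
server.receive_chat_records(["leap", "x", "prancing"])
print(server.words_for_client(known_words=["leap"]))
```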
Among the present invention, the server that described realization neologisms obtain can be an instant communication server, also can be the dictionary server.Chat data Transmit-Receive Unit 31, server side screening unit 32, participle unit and statistic unit can all be arranged in instant communication server or dictionary server, in the time of also can having instant communication server and dictionary server at the same time, chat data Transmit-Receive Unit 31 is arranged in instant communication server, and remaining element is arranged in the dictionary server.
The present invention also proposes a system for obtaining new words, comprising at least one IM client and a background server; crucially, the system further comprises a screening unit, which screens candidate character strings according to a preset rule to obtain new words. The screening unit may be located in the IM client, in the background server, or in both. When the screening unit is located in the IM client, the system is equivalent to at least one IM client as shown in Figure 2 together with an existing background server. When the screening unit is located in the background server, the system is equivalent to the background server shown in Figure 3 together with at least one existing IM client. When the screening unit is located in both, the system is equivalent to at least one IM client as shown in Figure 2 together with the background server shown in Figure 3; in this case the IM client and the background server perform two-layer screening.
The instant message transceiver unit of the IM client and the chat data transceiver unit of the background server transmit between them either the candidate words that have been screened out or the new words.
In the first and third cases, the IM clients may all be IM clients capable of obtaining new words, or may include both IM clients capable of obtaining new words and existing IM clients; in the second case, the IM clients are all existing IM clients.
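The three placements of the screening unit can be sketched as a small pipeline. The Placement enumeration, the run_pipeline function and the two sample rules are assumptions for illustration; in the third case the client-side screening and the server-side screening are applied in sequence, which corresponds to the two-layer screening described above.

```python
from enum import Enum
from typing import Callable, Iterable, List

Rule = Callable[[str], bool]


class Placement(Enum):
    CLIENT = "client"  # first case: screening only in the IM client
    SERVER = "server"  # second case: screening only in the background server
    BOTH = "both"      # third case: two-layer screening


def run_pipeline(candidates: Iterable[str],
                 placement: Placement,
                 client_rule: Rule,
                 server_rule: Rule) -> List[str]:
    words = list(candidates)
    if placement in (Placement.CLIENT, Placement.BOTH):
        # First layer: the IM client screens before anything is uploaded.
        words = [w for w in words if client_rule(w)]
    if placement in (Placement.SERVER, Placement.BOTH):
        # Second layer: the background server screens what it receives.
        words = [w for w in words if server_rule(w)]
    return words


def client_rule(word: str) -> bool:
    return len(word) >= 2        # hypothetical client-side rule


def server_rule(word: str) -> bool:
    return word not in {"hello"}  # hypothetical server-side lexicon check


candidates = ["a", "hello", "lolz"]
client_only = run_pipeline(candidates, Placement.CLIENT, client_rule, server_rule)
server_only = run_pipeline(candidates, Placement.SERVER, client_rule, server_rule)
two_layer = run_pipeline(candidates, Placement.BOTH, client_rule, server_rule)
```

Only the two-layer configuration removes both the too-short candidate and the word already present in the assumed server lexicon.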
Likewise, the system may further comprise a word segmentation unit, which divides the candidate character strings corresponding to all obtained chat data into one or more words and then passes the resulting words to the screening unit for screening.
The system may further comprise a statistics unit, which receives the words screened by the screening unit, compares them with data obtained from a designated location, determines whether the number of occurrences of each received word at the designated location reaches a set threshold, and takes the words that reach the threshold as new words. The designated location may be Internet data, or chat records from the local end or the peer end.
The above is merely a preferred embodiment of the present invention and is not intended to limit the scope of protection of the present invention.

Claims (27)

1. A method for obtaining new words, characterized in that the method comprises:
A. obtaining a candidate character string from chat data;
B. screening the candidate character string according to a preset rule, and taking the words that pass the screening as new words.
2. The method according to claim 1, characterized in that the obtaining is: obtaining chat data input at the local end; or obtaining chat data received from a peer end, wherein there are one or more peer ends.
3. The method according to claim 1, characterized in that the chat data is chat data input at the local end, and step A is specifically:
the IM client software receives the data that the current user enters through an input method and, when displaying the input data in the instant messaging interface as a chat record, takes the current input as the candidate character string;
or, the chat data is chat data from a peer end, and step A is specifically:
the IM client software receives the data sent by the peer end and, when displaying the received data in the instant messaging interface as a chat record, takes the received data as the candidate character string.
4. The method according to claim 1, characterized in that the method further comprises, between step A and step B: dividing the candidate character string into one or more words;
step B then screens the resulting words according to the preset rule.
5. The method according to any one of claims 1 to 4, characterized in that the method further comprises: incorporating the obtained new words into the lexicons of various input methods.
6. The method according to any one of claims 1 to 4, characterized in that steps A and B are performed by the IM client, and the method further comprises: the IM client sends the obtained new words to the peer user through the instant messaging system.
7. The method according to any one of claims 1 to 4, characterized in that steps A and B are performed by the IM client, and the method further comprises: the IM client uploads the obtained new words to a background server, and the background server actively delivers them to each registered IM client, or delivers them to the IM clients that need them in response to IM client requests.
8. The method according to any one of claims 1 to 4, characterized in that steps A and B are performed by a background server, and the method further comprises: the background server actively delivers the obtained new words to each registered IM client, or delivers them to the IM clients that need them in response to IM client requests.
9. The method according to any one of claims 1 to 4, characterized in that the method further comprises, after the screening in step B: counting the number of times each screened word appears at a designated location and determining whether that number reaches a set threshold; if it does, taking the word as a new word; otherwise, not taking it as a new word.
10. The method according to claim 9, characterized in that the designated location is Internet data, or chat records from the local end or the peer end.
11. The method according to claim 9, characterized in that the method further comprises: incorporating the obtained new words into the lexicons of various input methods.
12. The method according to claim 9, characterized in that steps A and B are performed by the IM client, and the method further comprises: the IM client sends the obtained new words to the peer user through the instant messaging system.
13. The method according to claim 9, characterized in that steps A and B are performed by the IM client, and the method further comprises: the IM client uploads the obtained new words to a background server, and the background server actively delivers them to each registered IM client, or delivers them to the IM clients that need them in response to IM client requests.
14. The method according to claim 9, characterized in that steps A and B are performed by a background server, and the method further comprises: the background server actively delivers the obtained new words to each registered IM client, or delivers them to the IM clients that need them in response to IM client requests.
15. An IM client for obtaining new words, comprising a text input unit, an instant message transceiver unit and a chat record display unit, characterized in that the IM client further comprises a terminal-side screening unit;
the text input unit is configured to receive and display information input by the local user, and to send the received information to the instant message transceiver unit, the chat record display unit and the terminal-side screening unit;
the instant message transceiver unit is configured to send the local user's input, received from the text input unit, to another IM client, and to send information received from another IM client to the chat record display unit and the terminal-side screening unit;
the terminal-side screening unit receives the chat data input at the local end from the text input unit and the chat data sent by the peer end from the instant message transceiver unit, and screens the candidate character strings corresponding to the chat data according to a preset rule to obtain new words.
16. The IM client according to claim 15, characterized in that the IM client further comprises a word segmentation unit, which divides the candidate character strings corresponding to the obtained chat data into one or more words and then passes the resulting words to the terminal-side screening unit for screening.
17. The IM client according to claim 15 or 16, characterized in that the IM client further comprises a statistics unit, which receives the words screened by the terminal-side screening unit, compares them with data obtained from a designated location, determines whether the number of occurrences of each received word at the designated location reaches a set threshold, and takes the words that reach the threshold as new words.
18. The IM client according to claim 17, characterized in that the instant message transceiver unit is further configured to send new words obtained at the local end to the peer end, or to receive new words sent by the peer end.
19. The IM client according to claim 17, characterized in that the IM client further comprises a server interaction unit, configured to upload new words obtained at the local end to a background server, or to receive new words broadcast by the background server, or to request and download new words from the background server.
20. A server for obtaining new words, comprising a chat data transceiver unit, characterized in that the server further comprises a server-side screening unit;
the chat data transceiver unit receives the chat records sent by each IM client and delivers all received chat data to the server-side screening unit;
the server-side screening unit screens the candidate character strings according to a preset rule to obtain new words.
21. The server according to claim 20, characterized in that the server further comprises a word segmentation unit, which divides the candidate character strings corresponding to the obtained chat data into one or more words and then passes the resulting words to the server-side screening unit for screening.
22. The server according to claim 20 or 21, characterized in that the server further comprises a statistics unit, which receives the words screened by the server-side screening unit, compares them with data obtained from a designated location, determines whether the number of occurrences of each received word at the designated location reaches a set threshold, and takes the words that reach the threshold as new words.
23. The server according to claim 22, characterized in that the chat data transceiver unit is further configured to deliver the obtained new words directly to each IM client, or to deliver them to the IM clients that need them in response to IM client requests.
24. A system for obtaining new words, comprising at least one IM client and a background server, characterized in that the system further comprises a screening unit, which screens candidate character strings according to a preset rule to obtain new words.
25. The system according to claim 24, characterized in that the screening unit is located in the IM client, or in the background server, or in both the IM client and the background server.
26. The system according to claim 24 or 25, characterized in that the system further comprises a word segmentation unit, which divides the candidate character strings corresponding to all obtained chat data into one or more words and then passes the resulting words to the screening unit for screening.
27. The system according to claim 26, characterized in that the system further comprises a statistics unit, which receives the words screened by the screening unit, compares them with data obtained from a designated location, determines whether the number of occurrences of each received word at the designated location reaches a set threshold, and takes the words that reach the threshold as new words.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2007101221872A CN101119334A (en) 2007-09-21 2007-09-21 Method, system and equipment for obtaining neology


Publications (1)

Publication Number Publication Date
CN101119334A true CN101119334A (en) 2008-02-06

Family

ID=39055273

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2007101221872A Pending CN101119334A (en) 2007-09-21 2007-09-21 Method, system and equipment for obtaining neology

Country Status (1)

Country Link
CN (1) CN101119334A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20080206