CN1983252A - Extraction for instant message subject content - Google Patents
Extraction for instant message subject content
- Publication number
- CN1983252A
- Authority
- CN
- China
- Prior art keywords
- subject content
- key word
- instant message
- extracting method
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
A method for extracting the subject content of an instant message comprises extracting keywords from the instant message according to a preset extraction condition, and then selecting the subject content from the extracted keywords according to a policy.
Description
Technical field
The present invention relates to the field of computer and communication technology, and in particular to a method for extracting the subject content of instant messages.
Background
With the development of the Internet and the spread of networks, instant messaging (IM, Instant Messenger) software has gradually become an indispensable means of online communication in people's daily work, study, and life. Through IM software, people can communicate by text, voice, and video in real time. However, for users chatting by text, current IM software offers neither intelligent extraction of the subject content of the current instant messages nor any use of the extracted subject information that would give users a better and more convenient experience.
In current text chat with IM software, there is no application that intelligently extracts the subject content of the chat. If, during a chat, a user wants to search for information on a topic of interest in the chat content, the user must manually select and copy the topic information from the chat and separately launch a search engine (such as Google or Baidu) to search on it. If the user wants to share a search result with the chat partner, the information must again be copied manually. This approach is highly inconvenient for the user.
Furthermore, existing techniques for intelligently extracting subject content from text perform poorly in both real-time performance and relevance. Real-time performance refers to whether the captured subject content is the topic of the current chat; relevance refers to the accuracy of the captured subject content.
Summary of the invention
The invention provides a method that solves the prior-art problem that the subject content of instant messages cannot be extracted intelligently.
The invention further solves the prior-art problem that the subject content of instant messages can only be extracted and used manually.
The invention provides the following technical solution:
A subject content extraction method applied to instant messaging comprises the following steps:
A. extracting keywords from instant messages according to a predetermined extraction condition;
B. determining the subject content from the extracted keywords according to a policy.
Wherein:
The predetermined extraction condition is: all text contained in the instant messages is matched against a theme dictionary, and the word with the highest frequency of occurrence among the matches is taken as the keyword.
The instant messages are the instant messages present in a sliding window.
The sliding window has two parameters, window size and sliding speed, and captures instant messages in real time according to these two parameters.
The theme dictionary is extensible: one or more dictionaries can be freely added or deleted, and one or more words can be added to or deleted from a dictionary.
Alternatively, the predetermined extraction condition is: word-frequency statistics are computed over all text of the instant messages, and the word with the highest frequency of occurrence is taken as the keyword.
The policy is: compute the degree of correlation of each extracted keyword, compare it with a threshold, and, when it is greater than the threshold, take the corresponding keyword as the subject content.
The keywords are held in a first-in-first-out keyword queue of preset length.
The degree of correlation is the ratio of the number of times the keyword occurs in the keyword queue to the length of the keyword queue.
After the subject content is determined, a search engine is started, a search is performed on the subject content, and the search results are displayed.
Displaying the search results means showing some or all of the search results in the instant message window of both parties, of multiple parties, or of any one party.
The beneficial effects of the invention are as follows:
When a user communicates through IM software, the subject content of the current instant messages can be extracted intelligently and promptly, and the extracted subject content can be put to use, giving IM users a better and more convenient experience.
Description of drawings
Fig. 1 is a schematic flowchart of the invention.
Embodiment
The invention intelligently extracts subject content from the real-time instant messages in IM, thereby obtaining the subject content of the user's current chat. In capturing the subject content, the invention uses a sliding window technique, a keyword queue, and a degree-of-correlation parameter, thereby achieving both real-time performance and relevance in the extracted subject content.
The invention designs a sliding window used to capture keywords; the keywords serve as the basis for finally obtaining the subject content. Statistics and analysis are performed only on the instant messages currently within the sliding window. The sliding window has a certain size and slides at a certain speed; each time it slides, the instant messages in the window are counted and analyzed once. The smallest unit of statistical analysis is one instant message.
The sliding window has two parameters, window size (WindowSize) and sliding speed (SlideVelocity), both measured in number of instant messages. For example, WindowSize=2 and SlideVelocity=2 means that the sliding window holds two instant messages for processing and that, when two new instant messages appear, the window slides down by two messages and the 2 instant messages now in the window are counted and analyzed. In general, the sliding speed equals the window size: if the sliding speed is less than the window size, some messages in the window are counted repeatedly; conversely, if the sliding speed is greater than the window size, some instant messages are skipped and never counted.
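The interplay of the two parameters can be illustrated with a minimal Python sketch. The function name `slide_windows` and the list-based representation of messages are assumptions made for illustration; the patent does not prescribe an implementation.

```python
from typing import Iterator, List


def slide_windows(messages: List[str], window_size: int = 2,
                  slide_velocity: int = 2) -> Iterator[List[str]]:
    """Yield the messages inside the window after each slide.

    Both parameters are measured in number of instant messages,
    matching WindowSize and SlideVelocity in the description.
    """
    if window_size < 1 or slide_velocity < 1:
        raise ValueError("window_size and slide_velocity must be >= 1")
    end = window_size  # index just past the last message in the window
    while end <= len(messages):
        yield messages[end - window_size:end]
        end += slide_velocity


msgs = ["m1", "m2", "m3", "m4", "m5"]
print(list(slide_windows(msgs, 2, 2)))  # each message handled at most once
print(list(slide_windows(msgs, 2, 1)))  # overlapping windows: repeats
print(list(slide_windows(msgs, 2, 3)))  # gaps between windows: m3 is skipped
```

With equal values the windows tile the message stream without overlap; a smaller sliding speed makes consecutive windows share messages, and a larger one leaves gaps, which is why the description recommends keeping the two equal.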
IM users can set these two variables according to their own needs; their initial values are WindowSize=2 and SlideVelocity=2.
Each time the sliding window slides, keywords are extracted from the instant messages newly obtained in the window.
A theme dictionary is set up according to people's text-chat habits and commonly used words; it comprises dictionaries for common themes, such as a weather dictionary, a sight spot dictionary, a sports dictionary, and a military dictionary. Each theme dictionary collects the common words of its category. For example, the weather dictionary contains: weather, temperature, cold, sweltering heat, and so on; the sight spot dictionary contains: Jiuzhaigou, Zhangjiajie, Gulang Island, Mount Huang, the West Lake, and so on. The theme dictionary is extensible: one or more dictionaries can be freely added or deleted, and one or more words can easily be added to or deleted from one or more theme dictionaries.
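One possible representation of an extensible theme dictionary is a mapping from theme name to a set of words. The sketch below is an assumption for illustration only: the variable `theme_dictionaries` and the helper names are hypothetical, and the sample words are taken from the examples in the description.

```python
from typing import Dict, Iterable, Set

# Theme name -> words of that theme (sample entries from the description).
theme_dictionaries: Dict[str, Set[str]] = {
    "weather": {"weather", "temperature", "cold", "sweltering heat"},
    "sight spot": {"Jiuzhaigou", "Zhangjiajie", "Gulang Island",
                   "Mount Huang", "West Lake"},
}


def add_words(dicts: Dict[str, Set[str]], theme: str,
              words: Iterable[str]) -> None:
    """Add words to a theme, creating the theme dictionary if needed."""
    dicts.setdefault(theme, set()).update(words)


def remove_words(dicts: Dict[str, Set[str]], theme: str,
                 words: Iterable[str]) -> None:
    """Delete words from a theme; unknown themes or words are ignored."""
    dicts.get(theme, set()).difference_update(words)


def remove_dictionary(dicts: Dict[str, Set[str]], theme: str) -> None:
    """Delete a whole theme dictionary if it exists."""
    dicts.pop(theme, None)


add_words(theme_dictionaries, "sports", ["football", "basketball"])
remove_words(theme_dictionaries, "weather", ["cold"])
print(sorted(theme_dictionaries))
```

Sets keep additions and deletions of single words cheap and make duplicate entries impossible, which suits a dictionary that users edit freely.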
Once the theme dictionary exists, statistics can be gathered and keywords extracted. When the sliding window slides according to the preset parameters, all text of the instant messages in the window is matched against each theme dictionary. During matching, each matched word and its frequency of occurrence are recorded, and the keyword for the window is obtained from the matching results. If there are multiple matches, the word with the highest frequency of occurrence is selected as the final keyword.
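The matching step might look like the following Python sketch. Several details are assumptions not stated in the description: matching is done by case-insensitive substring counting, a word listed in several dictionaries is counted once, and ties are broken alphabetically. The function name `match_keyword` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


def match_keyword(window_messages: List[str],
                  theme_dictionaries: Dict[str, Set[str]]) -> Optional[str]:
    """Return the most frequent dictionary word in the window, or None."""
    text = " ".join(window_messages).lower()
    counts: Counter = Counter()
    for words in theme_dictionaries.values():
        for word in words:
            # Substring counting is a simplification: it would also
            # count "cold" inside "scold".
            n = text.count(word.lower())
            if n:
                counts[word] = n
    if not counts:
        return None  # nothing matched; the caller must fall back
    # Highest frequency first; alphabetical order breaks ties (assumption).
    return min(counts, key=lambda w: (-counts[w], w))


dicts = {"weather": {"weather", "cold"}, "sight spot": {"West Lake"}}
window = ["The weather is cold today",
          "Cold weather, but the weather will improve"]
print(match_keyword(window, dicts))
```

In this window "weather" occurs three times and "cold" twice, so "weather" becomes the keyword for the slide.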
However, this matching may fail to match any keyword.
Therefore, if no keyword is matched after theme dictionary matching is complete, a word-frequency statistics operation is performed. That is, when no match for the instant messages in the sliding window is found in the theme dictionary, the user's chat content lies outside the scope of the theme dictionary. In that case, word statistics are computed over all text of the instant messages in the window, and the word with the highest frequency of occurrence is recorded and taken as the extracted keyword.
Thus, after theme dictionary matching and word-frequency statistics, a keyword is certain to be extracted for the current sliding window.
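The fallback can be sketched in a few lines of Python. The description does not say how text is split into words, so the regular-expression tokenizer here is an assumption that only suits space-delimited languages (Chinese text would need a word segmenter), and no stop-word filtering is applied. The name `most_frequent_word` is hypothetical.

```python
import re
from collections import Counter
from typing import List, Optional


def most_frequent_word(window_messages: List[str]) -> Optional[str]:
    """Return the most frequent word in the window, or None if empty."""
    # Assumed tokenizer: lowercase runs of word characters.
    words = re.findall(r"\w+", " ".join(window_messages).lower())
    if not words:
        return None
    counts = Counter(words)
    # Highest frequency first; alphabetical order breaks ties (assumption).
    return min(counts, key=lambda w: (-counts[w], w))


print(most_frequent_word(["pizza tonight?", "yes pizza", "pizza place nearby"]))
```

A function like this would be called only when dictionary matching returns no keyword, so that every slide with non-empty text yields exactly one keyword.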
Through the above processing, each slide of the window produces one keyword. After the window has slid several times, there are multiple keywords, and one of them is then selected as the subject content according to a certain policy.
First, a keyword queue is designed to hold the keyword produced by each slide of the window. The queue follows the first-in-first-out (FIFO) rule and has a limited length. Setting the queue length controls how long a keyword remains valid: a keyword extracted from a chat long ago should not become the subject content of the current chat. With the length limit and the FIFO rule, recently enqueued keywords push the earliest enqueued keywords out of the queue. Thus the subject content derived from the keyword queue always reflects the most recent chat topic. The initial queue length is 5, meaning the queue keeps the 5 most recent keywords.
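A bounded FIFO queue of this kind maps directly onto `collections.deque` with a `maxlen` argument in Python. The sketch below is illustrative only; the sample keywords are made up.

```python
from collections import deque

# Queue of the 5 most recent keywords; appending to a full deque
# silently discards the oldest entry (first in, first out).
keyword_queue = deque(maxlen=5)

for keyword in ["weather", "West Lake", "West Lake",
                "football", "West Lake", "West Lake"]:
    keyword_queue.append(keyword)

# "weather" was enqueued first and has been pushed out by the sixth keyword.
print(list(keyword_queue))
```

Using a fixed-capacity deque means no explicit eviction code is needed: the age limit on keywords falls out of the container's behavior.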
Second, a degree of correlation is introduced to characterize how often the same topic occurs; when that frequency reaches or exceeds the degree-of-correlation threshold, the keyword is taken to be the subject content of the current chat. The degree of correlation of a keyword equals the number of times the keyword occurs in the keyword queue divided by the length of the queue. For example, if the keyword "West Lake" occurs 4 times in the keyword queue and the queue length is 5, the degree of correlation of "West Lake" is 80%; if the preset threshold is 60%, "West Lake" is selected as the subject content.
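The worked example can be reproduced with a short Python sketch. One assumption is made explicit here: "length of the queue" is read as the configured capacity (5), not the number of keywords currently stored, which matters while the queue is still filling. The function name `degrees_of_correlation` is hypothetical.

```python
from collections import Counter, deque
from typing import Deque, Dict


def degrees_of_correlation(queue: Deque[str]) -> Dict[str, float]:
    """Map each keyword to (occurrences in queue) / (queue capacity)."""
    if queue.maxlen is None:
        raise ValueError("the keyword queue must have a fixed length")
    return {kw: n / queue.maxlen for kw, n in Counter(queue).items()}


queue = deque(["West Lake", "West Lake", "football",
               "West Lake", "West Lake"], maxlen=5)
print(degrees_of_correlation(queue))
```

Here "West Lake" scores 4/5 = 0.8, above the 0.6 threshold of the example, while "football" scores 1/5 = 0.2.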
Once the keyword queue length and the degree-of-correlation threshold are set, the subject content can be obtained. Each time the sliding window slides, the degree of correlation is computed for every keyword in the keyword queue, and the keyword with the highest degree of correlation is compared with the preset threshold. If it exceeds the threshold, that keyword is the final topic keyword; otherwise, the current chat has no extractable topic, and processing waits for the next slide of the window.
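The selection policy can be sketched as follows. The comparison is strict (greater than the threshold), following the wording of this paragraph and of claim 7; a variant that also accepts equality would change `>` to `>=`. The function name `select_subject` and its default values (queue length 5, threshold 0.6, taken from the examples) are assumptions for illustration.

```python
from collections import Counter
from typing import List, Optional


def select_subject(queue: List[str], queue_length: int = 5,
                   threshold: float = 0.6) -> Optional[str]:
    """Return the keyword with the highest degree of correlation if it
    exceeds the threshold, otherwise None (no extractable topic yet)."""
    if not queue:
        return None
    keyword, count = Counter(queue).most_common(1)[0]
    return keyword if count / queue_length > threshold else None


print(select_subject(["West Lake", "West Lake", "football",
                      "West Lake", "West Lake"]))
print(select_subject(["weather", "football", "West Lake",
                      "weather", "football"]))
```

The first queue yields "West Lake" (0.8 > 0.6); in the second no keyword rises above 0.4, so the result is `None` and the caller waits for the next slide.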
Each time the window slides, the subject content of the user's chat is obtained in real time. Once subject content is obtained, a previously specified search engine, such as Google or Baidu, is started to search on it, and some or all of the result links (the number can be preset, for example to 3) are displayed in the chat window of the IM software. They may be displayed in the chat windows of both parties, of multiple parties, or of any one party. IM users thus need no manual operation to query the current chat topic quickly and conveniently, and the query results are shared automatically.
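How the search is issued and how results reach the chat windows is left open by the description. The Python sketch below is therefore only a rough illustration under stated assumptions: it builds an ordinary web search URL with the standard library, treats each chat window as a plain list of lines, and takes the result links as given input rather than fetching them. The names `build_search_url` and `share_results` are hypothetical.

```python
from typing import List
from urllib.parse import urlencode

# Public search page URLs and their query parameter names.
SEARCH_ENGINES = {
    "google": ("https://www.google.com/search", "q"),
    "baidu": ("https://www.baidu.com/s", "wd"),
}


def build_search_url(subject: str, engine: str = "google") -> str:
    """Build a search URL for the subject on the chosen engine."""
    base, param = SEARCH_ENGINES[engine]
    return f"{base}?{urlencode({param: subject})}"


def share_results(links: List[str], windows: List[List[str]],
                  max_links: int = 3) -> List[str]:
    """Append at most max_links result links to every chat window.

    Each window is modelled as a list of displayed lines (assumption).
    """
    shown = links[:max_links]
    for window in windows:
        window.extend(shown)
    return shown


print(build_search_url("West Lake"))
```

Passing one, two, or many window lists to `share_results` corresponds to displaying the links to any one party, both parties, or multiple parties.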
The complete scheme of the invention is further described below with reference to the accompanying drawing. Referring to Fig. 1, the method provided by the invention is:
Step 100: match all text contained in the instant messages in the sliding window against the theme dictionary;
Step 200: if any word is matched, push the most frequent matched word into the keyword queue as the keyword and go directly to step 400; if no word is matched, go to step 300;
Step 300: compute frequency-of-occurrence statistics over all text contained in the instant messages, and push the most frequent word into the keyword queue;
Step 400: compute the degree of correlation of every keyword in the keyword queue, take the maximum, and compare it with the degree-of-correlation threshold; if it exceeds the threshold, take the corresponding keyword as the subject content;
Step 500: using the selected subject content, start the preset search engine, and automatically share and display the search results in the chat window of both parties, of multiple parties, or of any one party.
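Steps 100 to 400 can be combined into one small Python sketch. It is an illustration under several assumptions, not the patented implementation: the sliding speed is fixed equal to the window size, matching uses lowercase substring counting, the fallback tokenizer is a simple regular expression, ties are broken alphabetically, and the class name `SubjectExtractor` is hypothetical. Step 500 (search and display) is left to the caller.

```python
import re
from collections import Counter, deque
from typing import Dict, List, Optional, Set


class SubjectExtractor:
    def __init__(self, theme_dictionaries: Dict[str, Set[str]],
                 window_size: int = 2, queue_length: int = 5,
                 threshold: float = 0.6) -> None:
        self.words = {w.lower() for ws in theme_dictionaries.values()
                      for w in ws}
        self.window_size = window_size
        self.queue_length = queue_length
        self.threshold = threshold
        self.pending: List[str] = []  # messages waiting for the next slide
        self.queue: deque = deque(maxlen=queue_length)  # FIFO keyword queue

    def add_message(self, message: str) -> Optional[str]:
        """Feed one instant message; return the subject content or None."""
        self.pending.append(message)
        if len(self.pending) < self.window_size:
            return None  # the window has not slid yet
        text = " ".join(self.pending).lower()
        self.pending.clear()  # sliding speed == window size (assumption)

        # Steps 100/200: match against the theme dictionary.
        counts = Counter({w: text.count(w) for w in self.words if w in text})
        # Step 300: fall back to plain word-frequency statistics.
        if not counts:
            counts = Counter(re.findall(r"\w+", text))
        if not counts:
            return None  # the window contained no words at all
        keyword = min(counts, key=lambda w: (-counts[w], w))
        self.queue.append(keyword)

        # Step 400: the top keyword must exceed the threshold.
        top, n = Counter(self.queue).most_common(1)[0]
        return top if n / self.queue_length > self.threshold else None


extractor = SubjectExtractor({"sight spot": {"West Lake"}})
results = [extractor.add_message("I love West Lake") for _ in range(8)]
print(results)
```

With the default settings the keyword must occupy at least 4 of the 5 queue slots, so in this run the subject content is reported only after the fourth slide, that is, on the eighth message.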
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is also intended to cover them.
Claims (11)
1. A subject content extraction method for instant messages, characterized by comprising the following steps:
A. extracting keywords from instant messages according to a predetermined extraction condition;
B. determining the subject content from the extracted keywords according to a policy.
2. The subject content extraction method of claim 1, characterized in that the predetermined extraction condition is: all text contained in the instant messages is matched against a theme dictionary, and the word with the highest frequency of occurrence among the matches is taken as the keyword.
3. The subject content extraction method of claim 2, characterized in that the instant messages are the instant messages present in a sliding window.
4. The subject content extraction method of claim 3, characterized in that the sliding window has two parameters, window size and sliding speed, and captures instant messages in real time according to these two parameters.
5. The subject content extraction method of claim 2, characterized in that the theme dictionary is extensible: one or more dictionaries can be freely added or deleted, and one or more words can be added to or deleted from a dictionary.
6. The subject content extraction method of claim 1, characterized in that the predetermined extraction condition is: word-frequency statistics are computed over all text of the instant messages, and the word with the highest frequency of occurrence is taken as the keyword.
7. The subject content extraction method of claim 1, characterized in that the policy is: compute the degree of correlation of each extracted keyword, compare it with a threshold, and, when it is greater than the threshold, take the corresponding keyword as the subject content.
8. The subject content extraction method of claim 7, characterized in that the keywords are held in a first-in-first-out keyword queue of preset length.
9. The subject content extraction method of claim 7 or 8, characterized in that the degree of correlation is the ratio of the number of times the keyword occurs in the keyword queue to the length of the keyword queue.
10. The subject content extraction method of claim 1, characterized in that after the subject content is determined, a search engine is started, a search is performed on the subject content, and the search results are displayed.
11. The subject content extraction method of claim 10, characterized in that displaying the search results means showing some or all of the search results in the instant message window of both parties, of multiple parties, or of any one party.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2005101344549A CN100410943C (en) | 2005-12-15 | 2005-12-15 | Extraction for instant message subject content |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2005101344549A CN100410943C (en) | 2005-12-15 | 2005-12-15 | Extraction for instant message subject content |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1983252A true CN1983252A (en) | 2007-06-20 |
CN100410943C CN100410943C (en) | 2008-08-13 |
Family
ID=38165792
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2005101344549A Active CN100410943C (en) | 2005-12-15 | 2005-12-15 | Extraction for instant message subject content |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100410943C (en) |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
SG99879A1 (en) * | 2001-01-18 | 2003-11-27 | Buzzcity Pte Ltd | Wireless messaging system and method |
CN1251108C (en) * | 2001-03-26 | 2006-04-12 | 腾讯科技(深圳)有限公司 | Instant messaging system and method |
US7640341B2 (en) * | 2003-06-19 | 2009-12-29 | Microsoft Corporation | Instant messaging for multi-user computers |
US20050044143A1 (en) * | 2003-08-19 | 2005-02-24 | Logitech Europe S.A. | Instant messenger presence and identity management |
EP1515521A3 (en) * | 2003-09-12 | 2005-07-06 | Axel Druschel | A method and an apparatus for automatic conversion of text-based messages to a query of internet-based applications |
KR20050096422A (en) * | 2004-03-30 | 2005-10-06 | (주)퓨쳐인포넷 | System and method for interlocking instant messaging services and web services |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005107578A1 (en) | 2004-05-06 | 2005-11-17 | Koninklijke Philips Electronics N. V. | Protection mechanism for spectroscopic analysis of biological tissue |
WO2009021429A1 (en) * | 2007-08-13 | 2009-02-19 | Tencent Technology (Shenzhen) Company Limited | Method and device for dealing with the instant messaging information |
US8204946B2 (en) | 2007-08-13 | 2012-06-19 | Tencent Technology (Shenzhen) Company Ltd. | Method and apparatus for processing instant messaging information |
CN102246497A (en) * | 2008-12-03 | 2011-11-16 | 索尼爱立信移动通讯有限公司 | Controlling sound characteristics of alert tunes that signal receipt of messages responsive to content of the messages |
US8718610B2 (en) | 2008-12-03 | 2014-05-06 | Sony Corporation | Controlling sound characteristics of alert tunes that signal receipt of messages responsive to content of the messages |
CN102246497B (en) * | 2008-12-03 | 2014-09-17 | 索尼爱立信移动通讯有限公司 | Controlling sound characteristics of alert tunes that signal receipt of messages responsive to content of the messages |
CN102520853A (en) * | 2011-11-29 | 2012-06-27 | 上海量明科技发展有限公司 | Method, terminal and system for triggering instant messaging interaction interface |
CN103150318A (en) * | 2011-11-29 | 2013-06-12 | 国际商业机器公司 | Automatically recommending asynchronous discussion forum posts during a real-time collaboration |
US9294420B2 (en) | 2011-11-29 | 2016-03-22 | International Business Machines Corporation | Augmenting a real-time collaboration with ranked electronic bulletin board posts |
CN102594734A (en) * | 2012-03-24 | 2012-07-18 | 上海量明科技发展有限公司 | Method, terminal and system for starting instant communication interactive interface |
CN103577587A (en) * | 2013-11-08 | 2014-02-12 | 南京绿色科技研究院有限公司 | News theme classification method |
CN105721668A (en) * | 2014-12-22 | 2016-06-29 | Lg电子株式会社 | Mobile terminal and method of controlling therefor |
CN104683218A (en) * | 2015-02-11 | 2015-06-03 | 广州酷狗计算机科技有限公司 | Service warning method and device |
CN104683218B (en) * | 2015-02-11 | 2018-10-23 | 广州酷狗计算机科技有限公司 | Service the method and device of early warning |
CN106294725A (en) * | 2016-08-08 | 2017-01-04 | 浪潮集团有限公司 | A kind of key message extracting method based on mobile terminal instant messaging application |
CN106294725B (en) * | 2016-08-08 | 2019-02-15 | 浪潮集团有限公司 | A kind of key message extracting method based on mobile terminal instant messaging application |
CN107770037A (en) * | 2016-08-18 | 2018-03-06 | 丁雷 | By group chat Content Transformation into the method for seminar, server, terminal and system |
CN107453978A (en) * | 2017-07-06 | 2017-12-08 | 深圳Tcl新技术有限公司 | Data statistical approach, mobile terminal, server and storage medium based on group |
CN111352685A (en) * | 2020-02-28 | 2020-06-30 | 北京百度网讯科技有限公司 | Input method keyboard display method, device, equipment and storage medium |
CN111352685B (en) * | 2020-02-28 | 2024-04-09 | 北京百度网讯科技有限公司 | Display method, device, equipment and storage medium of input method keyboard |
CN111859119A (en) * | 2020-06-30 | 2020-10-30 | 维沃移动通信有限公司 | Information processing method and device |
Also Published As
Publication number | Publication date |
---|---|
CN100410943C (en) | 2008-08-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |