CN102129440A - Method and system for directional push of information - Google Patents


Info

Publication number
CN102129440A
Authority
CN
China
Prior art keywords
information
word
input
directed
word frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010100428181A
Other languages
Chinese (zh)
Inventor
万春晓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Beijing Co Ltd
Original Assignee
Tencent Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Beijing Co Ltd
Priority to CN2010100428181A
Publication of CN102129440A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for directional push of information, comprising the following steps: acquiring input information of a user, the input information comprising words entered through an input method; counting word frequencies from the words and sorting the words by word frequency; searching a directional information list, in descending order of word frequency, for directional information that matches the words entered by the user; and returning the matched directional information to the user. The invention further relates to a system and a client for directional push of information. In the method, system and client provided by the invention, the words entered by the user are acquired through the input method and are not limited to a chat dialog box, which improves the comprehensiveness of user input feature extraction and, consequently, the accuracy of the pushed information. Checking whether the entered words are nouns improves the accuracy of user input feature extraction. The word frequency data is counted and sorted with a max-heap, so the time complexity is low and sorting is efficient. Only the 50 to 100 words with the highest word frequency are matched, which improves the accuracy of the pushed information.

Description

Method and system for directional push of information
[Technical Field]
The present invention relates to the field of network information processing, and in particular to a method and system for directional push of information.
[Background]
Directional push of information means sending information to a user according to the content the user needs and is interested in, including general knowledge, news, weather forecasts, advertisements and the like.
In traditional Internet directional push technology, methods based on user input features mainly take the information a user types into a chat dialog box, extract the keywords the user is interested in, and analyze the user's features to obtain the user's points of interest, and then push information directionally.
Traditional Internet directional push technology has two shortcomings: (1) The user input features are incomplete. Given the richness and diversity of software, extracting features from the chat dialog box alone covers too narrow a scope and cannot fully mine the user's feature information. (2) The extraction of user input features is inaccurate. Techniques for accurately extracting keywords from complex sentences are deficient, so the user features extracted from sentences have low accuracy.
These two shortcomings ultimately make the directional information pushed to the user inaccurate; that is, the pushed information is not what the user is interested in or needs.
[Summary of the Invention]
To solve the problem that traditional directional push methods based on user input features push inaccurate directional information, it is necessary to provide an accurate method for directional push of information.
A method for directional push of information comprises the following steps: acquiring input information of a user, the input information comprising words entered through an input method; counting word frequencies from the words and sorting the words by word frequency; searching for directional information that matches the sorted words; and outputting the matched directional information to the user.
Preferably, the step of counting word frequencies is: determining whether a word entered by the user through the input method is a noun, and if so, counting word frequencies according to the nouns entered by the user.
Preferably, a max-heap is used to count the frequencies of the words entered by the user and to sort them; each node of the max-heap records a word entered by the user and its word frequency.
Preferably, the step of searching for directional information searches only the 50 to 100 words with the highest word frequency.
Preferably, the input information of the user further comprises a user identification number, and each user identification number corresponds to a unique max-heap used to record the words entered by that user and their word frequencies.
It is also necessary to provide an accurate system for directional push of information.
A system for directional push of information comprises an input acquisition module, a word frequency counting and sorting module, a directional information matching module and a directional information output module. The input acquisition module acquires the input information of a user, the input information comprising words entered through an input method. The word frequency counting and sorting module counts the frequencies of the entered words and sorts the words by word frequency. The directional information matching module searches for directional information that matches the sorted words. The directional information output module receives the directional information found by the directional information matching module and outputs it to the user.
Preferably, the word frequency counting and sorting module uses a max-heap to count the frequencies of the words entered by the user and to sort them; each node of the max-heap records a word entered by the user and its word frequency.
Preferably, the system further comprises a part-of-speech determination module, which determines whether a word acquired by the input acquisition module is a noun; if so, the word is passed to the word frequency counting and sorting module, its frequency is counted with the max-heap, and the words are then sorted by word frequency.
Preferably, the input information acquired by the input acquisition module further comprises a user identification number, each user identification number corresponds to a unique max-heap used to record the words entered by that user and their word frequencies, and the system further comprises a word frequency heap storage module for storing the max-heaps.
Preferably, the directional information matching module searches only the 50 to 100 words with the highest word frequency.
The above method and system acquire the words entered by the user through the input method and are not limited to the chat dialog box, which improves the comprehensiveness of user input feature extraction; the accuracy of the pushed information is thereby improved, and the pushed information is closer to what the user needs and is interested in.
Determining whether an entered word is a noun improves the accuracy of user input feature extraction.
Counting and sorting the word frequency data with a max-heap has a time complexity of O(n log n), which is low, so sorting is efficient.
Matching only the 50 to 100 words with the highest word frequency captures the user's points of interest more accurately and improves the accuracy of the pushed information.
[Description of Drawings]
Fig. 1 is a flowchart of the method for directional push of information in one embodiment;
Fig. 2 is a flowchart of the server side of the method for directional push of information in one embodiment;
Fig. 3 is a flowchart of the client side of the method for directional push of information in one embodiment;
Fig. 4 is a schematic diagram of the system for directional push of information in one embodiment;
Fig. 5 is a structural diagram of the client in one embodiment;
Fig. 6 is a structural diagram of the server in one embodiment.
[Embodiments]
Fig. 1 is a flowchart of the method for directional push of information in one embodiment. The method establishes a directional information list in advance and further comprises the following steps:
S110: acquire the input information of the user. The input information comprises a user identification number and the words entered through an input method.
S120: count the frequencies of the entered words and sort the words by word frequency. In a preferred embodiment, the user information of an instant messaging tool can be called, and a dedicated set of word frequency data is established for each user identification number of the instant messaging tool. The word frequency data can be counted with arrays, linked lists and the like; in a preferred embodiment, a max-heap is used for counting and sorting. A max-heap is a complete binary tree in which every non-leaf node is greater than or equal to its child nodes, so the root node is the largest. Each user identification number corresponds to a unique max-heap (the word frequency heap) used to record the words entered by that user and their frequencies. Each node of the word frequency heap records one word entered by the user and its frequency, and the node with the highest frequency is at the top of the heap.
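The counting and sorting in S120 can be sketched as follows. This is a minimal illustration and not the patent's implementation: the function names and the sample words are hypothetical, and Python's `heapq` (a min-heap) stands in for the max-heap by storing negated counts.

```python
import heapq
from collections import Counter

def build_word_frequency_heap(words):
    """Count each word and arrange (negated count, word) pairs as a heap."""
    counts = Counter(words)
    # heapq is a min-heap, so counts are negated to keep the most frequent word on top.
    heap = [(-count, word) for word, count in counts.items()]
    heapq.heapify(heap)
    return heap

def words_by_descending_frequency(heap):
    """Pop nodes from the top of a copy of the heap, highest frequency first."""
    working = list(heap)  # a copy of a valid heap list is itself a valid heap
    ordered = []
    while working:
        neg_count, word = heapq.heappop(working)
        ordered.append((word, -neg_count))
    return ordered

# Hypothetical words collected from the input method for one user.
typed_words = ["camera", "lens", "camera", "tripod", "lens", "camera"]
heap = build_word_frequency_heap(typed_words)
ranking = words_by_descending_frequency(heap)
```

Building the heap takes linear time and each pop takes logarithmic time, so reading out all n words costs O(n log n), in line with the complexity stated in the description.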
S130: search, in descending order of word frequency, for the directional information that matches the words entered by the user. Nodes are extracted recursively from the top of the max-heap, and the directional information list is searched, in word frequency order, for the directional information that matches each word. To push directional information more accurately, in a preferred embodiment only the 50 to 100 words with the highest frequency are searched. Each word can match multiple pieces of directional information, and each piece of directional information can match multiple words.
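The lookup in S130 can be sketched as below. The sketch assumes the directional information list is held as a mapping from word to matching items; that layout, the function name `match_directional_info`, and the sample entries are assumptions made for illustration only.

```python
def match_directional_info(ranked_words, info_list, top_n=100):
    """Return directional information for the top_n words, in ranking order, without duplicates."""
    matched = []
    seen = set()
    # ranked_words is already sorted by descending frequency, so slicing keeps the top words.
    for word, _count in ranked_words[:top_n]:
        for info in info_list.get(word, []):
            if info not in seen:  # one item may match several words; report it once
                seen.add(info)
                matched.append(info)
    return matched

# Hypothetical directional information list: word -> matching items.
info_list = {
    "camera": ["Camera sale", "Photography news"],
    "lens": ["Photography news", "Lens review"],
    "tripod": ["Tripod offer"],
}
ranked_words = [("camera", 3), ("lens", 2), ("tripod", 1)]
top_two = match_directional_info(ranked_words, info_list, top_n=2)
```

With `top_n=2` the low-frequency word "tripod" is never looked up, which mirrors the idea of restricting matching to the highest-frequency words.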
S140: return the directional information to the user.
In another preferred embodiment, the above method can be divided into a client workflow and a server workflow. Fig. 2 is a flowchart of the client side of the method for directional push of information in another embodiment. Fig. 3 is a flowchart of the server side of the method for directional push of information in another embodiment. The client flow comprises the following steps:
S210: acquire the input information of the user. The user identification number and the words entered through the input method are acquired; the word frequency heap dedicated to this user identification number is then retrieved (a new word frequency heap is created if the user is logging in for the first time), and the user identification number and the dedicated word frequency heap data are sent to the server. Note that the complete word frequency heap data is not sent. In a preferred embodiment, the word frequency data of the 100 words with the highest frequency is sent; in other embodiments, the word frequency data of the 50 highest-frequency words, or of another number of words, may be sent.
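A minimal sketch of the upload in S210, assuming the client serializes the data as JSON. The patent does not specify a wire format, so the field names `user_id`, `words`, `word` and `count`, as well as the function name, are hypothetical.

```python
import heapq
import json

def build_upload_payload(user_id, word_counts, limit=100):
    """Serialize only the `limit` most frequent words together with the user ID."""
    # nlargest avoids sorting the whole table when only the top entries are sent.
    top = heapq.nlargest(limit, word_counts.items(), key=lambda item: item[1])
    return json.dumps(
        {"user_id": user_id, "words": [{"word": w, "count": c} for w, c in top]},
        ensure_ascii=False,  # keep non-ASCII words readable in the payload
    )

# Hypothetical per-user word frequencies held on the client.
payload = build_upload_payload("u1", {"camera": 3, "lens": 2, "tripod": 1}, limit=2)
```

Sending a bounded number of entries keeps the message size independent of how many distinct words the user has typed.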
S220: determine whether the word is a noun. The interface and dictionary of the input method are called to determine whether the entered word is a noun; if so, proceed to the next step; otherwise, return to S210.
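The noun check in S220 can be sketched with a plain dictionary lookup. The real check calls the input method's interface and dictionary, which are not described in detail; the `PART_OF_SPEECH` table below is a hypothetical stand-in for that dictionary.

```python
# Hypothetical stand-in for the input method's dictionary: word -> part of speech.
PART_OF_SPEECH = {
    "camera": "noun",
    "buy": "verb",
    "lens": "noun",
    "quickly": "adverb",
}

def is_noun(word, dictionary=PART_OF_SPEECH):
    """Return True only when the dictionary marks the word as a noun."""
    return dictionary.get(word) == "noun"

def nouns_only(words, dictionary=PART_OF_SPEECH):
    """Keep the nouns; other words (and unknown words) are dropped before counting."""
    return [word for word in words if is_noun(word, dictionary)]

filtered = nouns_only(["buy", "camera", "quickly", "lens", "unknownword"])
```

Words missing from the dictionary are treated as non-nouns in this sketch; a different policy for unknown words would be an equally valid assumption.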
S230: adjust the word frequency heap. The word frequency heap dedicated to the current user identification number is adjusted: if the acquired noun is not in the heap, a new node storing the word is added and its frequency is set to 1; if the word is already in the heap, its frequency is incremented by 1. The word frequency heap is then re-sorted. In a preferred embodiment, if the frequency of the word ranks within the top 100, heap sort is performed on the top 100 nodes; otherwise no re-sorting is performed, to improve efficiency. If multiple user identification numbers are currently logged in, multiple word frequency heaps are adjusted at the same time.
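The insert-or-increment adjustment in S230 can be illustrated with an array-based max-heap that tracks each word's position. Note that this sketch restores heap order with a standard sift-up after every update, which is a substitute for the top-100 heap sort described above rather than a reproduction of it; the class and method names are hypothetical.

```python
class WordFrequencyHeap:
    """Array-based max-heap of [count, word] nodes with a word -> index map."""

    def __init__(self):
        self.nodes = []      # heap array: nodes[0] holds the highest count
        self.position = {}   # word -> current index in self.nodes

    def _swap(self, i, j):
        self.nodes[i], self.nodes[j] = self.nodes[j], self.nodes[i]
        self.position[self.nodes[i][1]] = i
        self.position[self.nodes[j][1]] = j

    def _sift_up(self, i):
        # Move the node up while its count exceeds its parent's count.
        while i > 0:
            parent = (i - 1) // 2
            if self.nodes[i][0] <= self.nodes[parent][0]:
                break
            self._swap(i, parent)
            i = parent

    def add(self, word):
        """Add a new node with frequency 1, or add 1 to an existing node."""
        if word in self.position:
            i = self.position[word]
            self.nodes[i][0] += 1
        else:
            i = len(self.nodes)
            self.nodes.append([1, word])
            self.position[word] = i
        # Counts only ever grow, so sifting up is enough to restore heap order.
        self._sift_up(i)

    def top(self):
        return tuple(self.nodes[0]) if self.nodes else None

word_heap = WordFrequencyHeap()
for noun in ["lens", "camera", "camera", "tripod", "camera", "lens"]:
    word_heap.add(noun)
```

Each update costs O(log n) in the worst case, so frequent typing does not require re-sorting the whole structure.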
In the same embodiment, the server flow of the method for directional push of information comprises the following steps:
S310: receive the word frequency data from the client.
S320: search, in descending order of word frequency, for the directional information that matches the words entered by the user. The server holds the directional information list, which contains the pieces of directional information and the words that match each of them; a piece of directional information can match multiple words, and a word can match multiple pieces of directional information.
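The many-to-many relation between directional information and words in S320 can be sketched as an inverted index. The storage layout is not specified in the text, so the `(information, words)` pair format, the function name and the sample entries are assumptions.

```python
from collections import defaultdict

def index_by_word(directional_entries):
    """Turn (information, matching words) pairs into a word -> information list index."""
    index = defaultdict(list)
    for info, words in directional_entries:
        for word in words:
            index[word].append(info)
    return dict(index)

# Hypothetical directional information list kept on the server.
entries = [
    ("Camera sale", ["camera", "lens"]),
    ("Travel guide", ["camera", "tripod"]),
]
word_index = index_by_word(entries)
```

Indexing by word lets the server answer each of the received words with a single dictionary lookup instead of scanning every entry.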
S330: aggregate the results into a directional information array and send it to the client for output. All directional information matching the words in the word frequency data is aggregated into one directional information array and sent to the client for output, for the user to view.
Fig. 4 is a schematic diagram of the system for directional push of information in one embodiment. The system comprises an input acquisition module 402, a word frequency counting and sorting module 404, a directional information matching module 412 and a directional information output module 406.
The input acquisition module 402 acquires the input information of the user, comprising the user identification number and the words entered through the input method.
The word frequency counting and sorting module 404 is connected to the input acquisition module 402; it counts the frequencies of the entered words and sorts the words by word frequency. In a preferred embodiment, a max-heap is used for counting and sorting. Each node of the max-heap records one word entered by the user and its frequency, and the node with the highest frequency is at the top of the heap.
The directional information matching module 412 receives the word frequency heap (the max-heap) generated by the word frequency counting and sorting module 404 and searches, in descending order of word frequency, for the directional information that matches the words entered by the user. Specifically, nodes are extracted recursively from the top of the max-heap, and the directional information list is searched, in word frequency order, for the directional information that matches each word. To push directional information more accurately, in a preferred embodiment only the 50 to 100 words with the highest frequency are searched. Each word can match multiple pieces of directional information, and each piece of directional information can match multiple words.
The directional information output module 406 receives the directional information that the directional information matching module 412 found to match the words entered by the user, and returns this directional information to the user.
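The way the four modules of Fig. 4 hand data to one another can be sketched as a chain of functions. This is only an illustration of the data flow: the function names are hypothetical, and `Counter.most_common` replaces the max-heap for brevity.

```python
from collections import Counter

def collect_input(user_id, typed_words):
    """Input acquisition: bundle the user ID with the words from the input method."""
    return {"user_id": user_id, "words": list(typed_words)}

def count_and_sort(user_input, top_n=100):
    """Word frequency counting and sorting: most frequent words first."""
    return Counter(user_input["words"]).most_common(top_n)

def match(ranked_words, info_list):
    """Directional information matching: look up each ranked word in the list."""
    matched = []
    for word, _count in ranked_words:
        for info in info_list.get(word, []):
            if info not in matched:
                matched.append(info)
    return matched

def output(user_id, matched):
    """Directional information output: the result handed back to the user."""
    return {"user_id": user_id, "directional_info": matched}

def push(user_id, typed_words, info_list):
    user_input = collect_input(user_id, typed_words)
    ranked = count_and_sort(user_input)
    return output(user_id, match(ranked, info_list))

result = push(
    "u1",
    ["camera", "lens", "camera"],
    {"camera": ["Camera sale"], "lens": ["Lens review"]},
)
```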
In another preferred embodiment, the above system for directional push of information can be divided into a client 40 and a server 41. Fig. 5 is a schematic diagram of the client 40 in one embodiment, and Fig. 6 is a schematic diagram of the server 41 in one embodiment. The client 40 comprises the input acquisition module 402, a part-of-speech determination module 403, the word frequency counting and sorting module 404, a word frequency heap storage module 405, a first communication module 407 and the directional information output module 406; the server 41 comprises a second communication module 411, the directional information matching module 412 and a directional information list storage module 413.
The input acquisition module 402 acquires the words entered through the input method and also acquires the user identification number. It is connected to the word frequency counting and sorting module 404, and a dedicated word frequency heap is established in the word frequency heap storage module 405 for each identification number; each user identification number corresponds to a unique max-heap (the word frequency heap) used to record the words entered by that user and their frequencies. When different user identification numbers log in to the system, only the word frequency heap dedicated to each is adjusted.
The part-of-speech determination module 403 is connected to the input acquisition module 402 and determines whether an acquired word is a noun. If so, the word is passed to the word frequency counting and sorting module 404; if not, the word is not passed on.
The word frequency counting and sorting module 404 receives the words passed on by the part-of-speech determination module 403 and adjusts the word frequency heap dedicated to the current user identification number in the word frequency heap storage module 405: if the word is not in the heap, a new node storing the word is added and its frequency is set to 1; if the word is already in the heap, its frequency is incremented by 1. The word frequency heap is then re-sorted. In a preferred embodiment, if the frequency of the word ranks within the top 100, heap sort is performed on the top 100 nodes; otherwise no re-sorting is performed, which reduces the system's resource consumption and improves efficiency.
The word frequency heap storage module 405 stores the word frequency heap information for the word frequency counting and sorting module 404 to call and adjust. The word frequency heaps dedicated to different user identification numbers are stored in different storage areas.
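Keeping one word frequency structure per user identification number can be sketched as a small store keyed by user ID. For brevity a `Counter` stands in for each user's heap here; the class name and its methods are hypothetical.

```python
from collections import Counter

class WordFrequencyStore:
    """Keeps a separate word frequency table for every user ID."""

    def __init__(self):
        self._by_user = {}

    def table_for(self, user_id):
        # A new, empty table is created the first time a user ID is seen.
        if user_id not in self._by_user:
            self._by_user[user_id] = Counter()
        return self._by_user[user_id]

    def record(self, user_id, noun):
        """Add 1 to the word's frequency in this user's table only."""
        self.table_for(user_id)[noun] += 1

store = WordFrequencyStore()
store.record("u1", "camera")
store.record("u1", "camera")
store.record("u2", "lens")
```

Because each table is reached only through its user ID, updates for one logged-in user cannot affect another user's frequencies.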
The first communication module 407 sends the word frequency heap data dedicated to the current user identification number in the word frequency heap storage module 405 to the second communication module 411 in the server 41. Note that the complete word frequency heap data is not sent. In a preferred embodiment, the data of the top 100 words of the word frequency heap is sent; in other embodiments, the data of the top 50 words, or of another number of words, may be sent. The module also receives the directional information array sent by the second communication module 411 and passes it to the directional information output module 406.
The directional information output module 406 receives the directional information array and outputs it to the user.
The second communication module 411 receives the word frequency heap data sent by the first communication module 407 and passes it to the directional information matching module 412 for matching. It also sends the directional information array produced by the directional information matching module 412 to the first communication module 407.
The directional information matching module 412 receives the word frequency heap data sent by the second communication module 411, traverses each node of the word frequency heap, and searches the directional information list, in word frequency order, for the directional information that matches the words entered by the user; it then aggregates the matched directional information into one directional information array and sends it back to the second communication module 411.
The directional information list storage module 413 stores the directional information list for the directional information matching module 412 to search and call. The directional information list contains the pieces of directional information and the words that match each of them; a piece of directional information can match multiple words, and a word can match multiple pieces of directional information.
The above method and system acquire the words entered by the user through the input method and are not limited to the chat dialog box, which improves the comprehensiveness of user input feature extraction; the accuracy of the pushed information is thereby improved, and the pushed information is closer to what the user needs and is interested in. Calling the interface and dictionary of the input method to determine whether an entered word is a noun improves the accuracy of user input feature extraction. Counting and sorting the word frequency data with a max-heap has a time complexity of O(n log n), which is low, so sorting is efficient. Matching only the words with higher word frequency captures the user's points of interest more accurately and improves the accuracy of the pushed information.
The embodiments above express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the claims. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the inventive concept, all of which fall within the scope of protection of the present invention. The scope of protection of this patent shall therefore be determined by the appended claims.

Claims (10)

1. A method for directional push of information, comprising the following steps:
acquiring input information of a user, the input information comprising words entered through an input method;
counting word frequencies from the words and sorting the words by word frequency;
searching for directional information that matches the sorted words;
outputting the matched directional information to the user.
2. The method for directional push of information according to claim 1, characterized in that the step of counting word frequencies is: determining whether a word entered by the user through the input method is a noun, and if so, counting word frequencies according to the nouns entered by the user.
3. The method for directional push of information according to claim 1 or 2, characterized in that a max-heap is used to count the frequencies of the words entered by the user and to sort them, each node of the max-heap recording a word entered by the user and its word frequency.
4. The method for directional push of information according to claim 1 or 2, characterized in that the step of searching for directional information searches only the 50 to 100 words with the highest word frequency.
5. The method for directional push of information according to claim 1, characterized in that the input information of the user further comprises a user identification number, and each user identification number corresponds to a unique max-heap used to record the words entered by that user and their word frequencies.
6. A system for directional push of information, characterized by comprising an input acquisition module, a word frequency counting and sorting module, a directional information matching module and a directional information output module;
the input acquisition module acquires the input information of a user, the input information comprising words entered through an input method;
the word frequency counting and sorting module counts the frequencies of the entered words and sorts the words by word frequency;
the directional information matching module searches for directional information that matches the sorted words;
the directional information output module receives the directional information found by the directional information matching module and outputs it to the user.
7. The system for directional push of information according to claim 6, characterized in that the word frequency counting and sorting module uses a max-heap to count the frequencies of the words entered by the user and to sort them, each node of the max-heap recording a word entered by the user and its word frequency.
8. The system for directional push of information according to claim 7, characterized in that the system further comprises a part-of-speech determination module, which determines whether a word acquired by the input acquisition module is a noun; if so, the word is passed to the word frequency counting and sorting module, its frequency is counted with the max-heap, and the words are then sorted by word frequency.
9. The system for directional push of information according to claim 8, characterized in that the input information acquired by the input acquisition module further comprises a user identification number, each user identification number corresponds to a unique max-heap used to record the words entered by that user and their word frequencies, and the system further comprises a word frequency heap storage module for storing the max-heaps.
10. The system for directional push of information according to claim 6 or 7, characterized in that the directional information matching module searches only the 50 to 100 words with the highest word frequency.
CN2010100428181A 2010-01-13 2010-01-13 Method and system for directional push of information Pending CN102129440A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010100428181A CN102129440A (en) 2010-01-13 2010-01-13 Method and system for directional push of information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010100428181A CN102129440A (en) 2010-01-13 2010-01-13 Method and system for directional push of information

Publications (1)

Publication Number Publication Date
CN102129440A true CN102129440A (en) 2011-07-20

Family

ID=44267526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010100428181A Pending CN102129440A (en) 2010-01-13 2010-01-13 Method and system for directional push of information

Country Status (1)

Country Link
CN (1) CN102129440A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20110720