CN107943895A - Information-pushing method and device - Google Patents

Publication number
CN107943895A
Authority
CN
China
Prior art keywords
user
result
keyword
algorithm
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711140407.4A
Other languages
Chinese (zh)
Inventor
孙健
康建峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201711140407.4A
Publication of CN107943895A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

Embodiments of the present application disclose an information-pushing method and apparatus. One embodiment of the method includes: obtaining search data of multiple users; extracting the keywords in each user's search data and generating a keyword feature vector for each user; inputting the matrix formed by the users' keyword feature vectors into a topic model algorithm to obtain a topic classification result for each user; and clustering the users' topic classification results to obtain cluster results, determining a target topic according to the distribution of the probabilities of the topics in a cluster result, and pushing information associated with the target topic to the users to whom the topic classification results in that cluster result belong. The method provided by this embodiment can improve the accuracy of information pushing.

Description

Information-pushing method and device
Technical field
The present application relates to the field of computer technology, specifically to the field of Internet technology, and more particularly to an information-pushing method and apparatus.
Background
With the spread of the Internet, more and more users use it to search for and browse information.
A user can usually find and browse relevant information by entering a web address or by using a search engine. Such users may be of many types, for example computer technology professionals, chemical industry professionals, entertainment users, and so on.
Summary of the invention
Embodiments of the present application propose an information-pushing method and apparatus.
In a first aspect, this application provides a kind of information-pushing method, this method includes:Obtain the search number of multiple users According to search data include:The title for the corresponding search result of search type that search type input by user, user click on;Carry respectively Take out the keyword in the search data of each user, and generate respectively the corresponding keyword feature of each user to Amount, wherein, the word frequency of each component and a keyword in keyword feature vector corresponds;By each user couple The Input matrix for the keyword feature vector composition answered obtains the corresponding subject classification knot of each user to topic model algorithm Fruit, wherein, the corresponding subject classification result of a user includes:User belongs to the general of each theme in multiple preset themes Rate;The corresponding subject classification result of multiple users is clustered, obtains cluster result, and according to each in cluster result The distribution of the corresponding probability of a theme, determines target topic, the master associated information of target topic being pushed in cluster result Inscribe the user belonging to classification results.
In a second aspect, the present application provides an information-pushing apparatus, including: an acquiring unit configured to obtain search data of multiple users, where the search data includes the search queries entered by a user and the titles of the search results, corresponding to those queries, that the user clicked; an extraction unit configured to extract the keywords in each user's search data and generate a keyword feature vector for each user, where each component of the keyword feature vector corresponds one-to-one to the word frequency of one keyword; an input unit configured to input the matrix formed by the users' keyword feature vectors into a topic model algorithm to obtain a topic classification result for each user, where a user's topic classification result includes the probability that the user belongs to each of multiple preset topics; and a determination unit configured to cluster the users' topic classification results to obtain cluster results, determine a target topic according to the distribution of the probabilities of the topics in a cluster result, and push information associated with the target topic to the users to whom the topic classification results in that cluster result belong.
In a third aspect, the present application provides a server, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the information-pushing method of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the information-pushing method of the first aspect.
The information-pushing method and apparatus provided by the present application obtain search data of multiple users; extract the keywords in each user's search data and generate a keyword feature vector for each user; input the matrix formed by these vectors into a topic model algorithm to obtain a topic classification result for each user; and finally cluster the users' topic classification results, determine a target topic according to the distribution of the probabilities of the topics in a cluster result, and push information associated with the target topic to the users to whom the topic classification results in that cluster result belong. By extracting keywords from users' search data and applying dimensionality reduction and cluster analysis to the resulting keyword feature vectors, the method obtains a target topic for each user and pushes information to the user accordingly, which can improve the accuracy of information pushing.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a diagram of an exemplary system architecture to which the present application can be applied;
Fig. 2 is a flow chart of one embodiment of the information-pushing method according to the present application;
Fig. 3 is a schematic diagram of an application scenario of the information-pushing method according to the present application;
Fig. 4 is a structural diagram of one embodiment of the information-pushing apparatus according to the present application;
Fig. 5 is a structural diagram of a computer system suitable for implementing the server of the embodiments of the present application.
Detailed description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the information-pushing method or information-pushing apparatus of the present application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a server 104, and a network 105. The network 105 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 104. The network 105 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 104 over the network 105 in order to receive or send messages and the like. Various client applications, such as web browser applications and search applications, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 104 may be a server that provides various services, for example a back-end web server that supports the web pages displayed on the terminal devices 101, 102, 103. The back-end web server may analyze and otherwise process received data such as access requests, and feed the processing result (for example web page data) back to the terminal devices 101, 102, 103.
It should be noted that the information-pushing method provided by the embodiments of the present application is generally performed by the server 104; correspondingly, the information-pushing apparatus is generally located in the server 104.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as the implementation requires.
With continued reference to Fig. 2, a flow 200 of one embodiment of the information-pushing method according to the present application is shown. The information-pushing method includes the following steps:
Step 201: obtain search data of multiple users.
In general, the electronic device on which the information-pushing method runs (for example the server 104 shown in Fig. 1) may receive, over the network, the search queries that users enter through terminal devices (for example the terminal devices 101, 102, 103 shown in Fig. 1). Here, a search query may include, but is not limited to, words, characters, and numbers. The electronic device may show a user multiple search results according to the search query the user entered. The user can open the web page corresponding to a search result by clicking on the title of that search result, and then browse the page content.
In this embodiment, the electronic device (for example the server 104 shown in Fig. 1) may obtain search data of multiple users. Each user's search data may include the search queries entered by that user and the titles of the search results, corresponding to those queries, that the user clicked. Preferably, the electronic device obtains the search data of the multiple users within a predetermined time period (for example 10 days).
In some application scenarios, the electronic device may obtain the search data of multiple users of a specified professional website.
Step 202: extract the keywords in each user's search data, and generate a keyword feature vector for each user.
In this embodiment, after obtaining the search data of multiple users in step 201, the electronic device (for example the server 104 shown in Fig. 1) may analyze each user's search data in order to extract the keywords it contains.
Specifically, the electronic device may perform word segmentation on each user's search data obtained for the predetermined time period. Common word segmentation methods (for example mechanical segmentation, understanding-based segmentation, or statistics-based segmentation) may be used for this purpose.
After a user's search data has been segmented into words, the electronic device may select the keywords of the search data according to the part of speech of each word; for example, the nouns may be selected as the keywords in that user's search data.
In this way, the electronic device can extract the keywords in the search data of each of the multiple users.
After extracting the keywords in each user's search data, the electronic device may generate a keyword feature vector for each user. A keyword feature vector consists of the word frequencies (Term Frequency, TF) of multiple keywords.
Each component of the keyword feature vector may correspond to the word frequency of one keyword. Word frequency here means the frequency with which a given keyword occurs in one user's search data.
In some optional implementations of this embodiment, the word frequency of each of the user's keywords may be calculated with the term frequency-inverse document frequency algorithm.
In some application scenarios, after extracting the keywords in each user's search data, the electronic device may collect the keywords that occur in the search data of all the users and generate a feature dictionary from them. A keyword that occurs in the search data of several users is counted only once. The collected keywords are arranged in the feature dictionary in a fixed order, so that every keyword of every user corresponds to a unique index in the feature dictionary.
For the multiple keywords in one user's search data, the electronic device may generate that user's keyword feature vector, in which each component corresponds one-to-one to the word frequency of one keyword.
In some application scenarios, suppose the feature dictionary contains N keywords (N is a positive integer); an N-dimensional keyword feature vector can then be generated for each user. The index (column number) of each component of a user's N-dimensional keyword feature vector corresponds one-to-one to the index of a keyword in the feature dictionary. When the keywords extracted from a user's search data do not include the keyword at some index of the feature dictionary, the component of that user's keyword feature vector at that index is zero.
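The construction of the feature dictionary and the N-dimensional keyword feature vectors can be sketched as follows. This is only an illustration: the function names, the alphabetical ordering of the dictionary, and the use of relative frequency (count divided by the user's total keyword count) as the word frequency are assumptions of the example, not details given in the application.

```python
from collections import Counter

def build_feature_dictionary(user_keywords):
    """Map every distinct keyword to a unique index (sorted for a fixed order)."""
    vocabulary = sorted({kw for kws in user_keywords.values() for kw in kws})
    return {kw: idx for idx, kw in enumerate(vocabulary)}

def keyword_feature_vector(keywords, feature_dict):
    """Return an N-dimensional vector of relative term frequencies."""
    counts = Counter(keywords)
    total = len(keywords)
    vector = [0.0] * len(feature_dict)  # keywords the user never used stay 0
    for kw, count in counts.items():
        vector[feature_dict[kw]] = count / total
    return vector

# Hypothetical keywords already extracted from two users' search data.
user_keywords = {
    "user_a": ["python", "thread", "python", "lock"],
    "user_b": ["polymer", "catalyst", "polymer"],
}
feature_dict = build_feature_dictionary(user_keywords)
# One row per user, one column per dictionary keyword (an L x N matrix).
matrix = [keyword_feature_vector(kws, feature_dict) for kws in user_keywords.values()]
```

Here `matrix` has one row per user and N = 5 columns, and it is the kind of L × N input that step 203 expects.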
Step 203: input the matrix formed by the users' keyword feature vectors into a topic model algorithm to obtain a topic classification result for each user.
In this embodiment, the topic model algorithm is a matrix decomposition algorithm that can decompose one high-order matrix into two lower-order matrices.
For example, the number of topics may be preset manually to M (M is a positive integer and M < N). The topic model algorithm can then decompose the matrix formed by the users' N-dimensional keyword feature vectors. Suppose the number of users is L (L is a positive integer); the topic model algorithm then decomposes the L × N matrix into an L × M matrix and an M × N matrix. In addition, an identifier may be assigned to each topic in advance. A topic identifier may be, for example, the name of a professional subfield.
The electronic device inputs the matrix formed by the users' keyword feature vectors into the topic model algorithm. When the number of users is L and the keyword feature vectors are N-dimensional, this matrix is an L × N matrix. After passing through the topic model algorithm, the L × N matrix is decomposed into an L × M matrix and an M × N matrix, where M may be the number of preset topics.
In this embodiment, the L × M matrix may be selected for subsequent processing. The L × M matrix can be regarded as consisting of L M-dimensional vectors, each of which corresponds to one user. The M-dimensional vector corresponding to a user can be regarded as that user's topic classification result. In this way, inputting the matrix formed by the users' keyword feature vectors into the topic model algorithm yields a topic classification result for each user. A user's topic classification result may include the probability distribution of that user over the multiple preset topics.
Each user's N-dimensional keyword feature vector is thus converted into an M-dimensional vector by the topic model algorithm, which reduces the dimensionality of the keyword feature vector.
The topic model algorithm may be the Latent Dirichlet Allocation (LDA) algorithm or the Non-negative Matrix Factorization (NMF) algorithm. LDA and NMF are well-known techniques that are widely studied and applied, and are not described further here.
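As a rough illustration of the decomposition described above, the following sketch factorizes a small L × N matrix with NMF using multiplicative updates in plain Python. The function names, the iteration count, the random initialization, and the row normalization that lets each row of the L × M factor be read as topic probabilities are assumptions made for this example; the application does not specify them.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor the L x N matrix v into w (L x M) and h (M x N), both non-negative."""
    rng = random.Random(seed)
    n_users, n_keywords = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(n_topics)] for _ in range(n_users)]
    h = [[rng.random() + eps for _ in range(n_keywords)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # Multiplicative update: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        numer, denom = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * numer[i][j] / (denom[i][j] + eps)
              for j in range(n_keywords)] for i in range(n_topics)]
        # Multiplicative update: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        numer, denom = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * numer[i][j] / (denom[i][j] + eps)
              for j in range(n_topics)] for i in range(n_users)]
    return w, h

def to_topic_probabilities(w):
    """Normalize each user's row so that it sums to 1 (assumed post-processing)."""
    result = []
    for row in w:
        total = sum(row)
        result.append([x / total if total > 0 else 1.0 / len(row) for x in row])
    return result

# Hypothetical 4-user x 4-keyword matrix with two obvious keyword groups.
v = [[1.0, 1.0, 0.0, 0.0],
     [2.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 3.0, 3.0]]
w, h = nmf(v, n_topics=2)
topic_probabilities = to_topic_probabilities(w)  # L rows of M values each
```

In practice a library implementation of NMF or LDA would normally be used instead; the hand-written loop is only meant to make the L × N to L × M and M × N shapes concrete.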
Step 204: cluster the users' topic classification results to obtain cluster results, determine a target topic according to the distribution of the probabilities of the topics in a cluster result, and push information associated with the target topic to the users to whom the topic classification results in that cluster result belong.
In this embodiment, the electronic device may cluster the users' topic classification results by calculating the distance (for example the Euclidean distance) between the multidimensional feature vectors that correspond to the topic classification results of any two users. Specifically, the topic classification results of users whose pairwise distance is less than a predetermined distance threshold may be grouped into one class. In this way, clustering the users' topic classification results yields multiple cluster results.
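One way to read the distance-threshold grouping above is sketched below. It assumes that two users end up in the same cluster whenever they are connected by a chain of pairwise Euclidean distances below the threshold (single-linkage grouping via union-find); the application does not say how non-transitive cases are resolved, so this choice and the function names are assumptions.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def threshold_clusters(vectors, threshold):
    """Group vector indices that are linked by distances below the threshold."""
    parent = list(range(len(vectors)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if euclidean(vectors[i], vectors[j]) < threshold:
                parent[find(i)] = find(j)  # merge the two groups

    clusters = {}
    for i in range(len(vectors)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Hypothetical topic classification results for four users over M = 2 topics.
topic_results = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
clusters = threshold_clusters(topic_results, threshold=0.3)
```

With these example values the first two users form one cluster and the last two form another.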
In some optional implementations of this embodiment, clustering the users' topic classification results may include clustering them with a preset clustering algorithm.
Further, the preset clustering algorithm may be one of the following: the K-means clustering algorithm, the Gaussian Mixture Model (GMM) algorithm, the Dirichlet Process Mixture Model (DPMM) algorithm, or the Agglomerative Hierarchical Clustering (AHC) algorithm. It should be understood that the preset clustering algorithm may also be another clustering algorithm; these are not described further here.
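Of the algorithms listed, K-means is the simplest to show in a few lines. The sketch below is a minimal plain-Python version; initializing the centroids with the first k vectors, the iteration cap, and the function name are assumptions of the example rather than anything the application prescribes.

```python
def kmeans(vectors, k, n_iter=100):
    """Cluster vectors into k groups; returns (labels, centroids)."""
    # Assumed initialization: use the first k vectors as starting centroids.
    centroids = [list(v) for v in vectors[:k]]
    labels = [None] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical topic classification results for four users over M = 2 topics.
topic_results = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centroids = kmeans(topic_results, k=2)
```

Unlike the distance-threshold grouping, K-means needs the number of clusters k to be chosen in advance.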
For a given cluster result, the electronic device may determine the target topic of that cluster result according to the distribution of the probabilities of the topics in it. For example, a preset topic whose probability distribution curve is relatively smooth may be chosen as the target topic of the cluster result. The electronic device may take the target topic of a cluster result as the target topic of every user in that cluster result. Target topics here include, but are not limited to, Internet development, front-end development, back-end development, distributed algorithms, artificial intelligence, leisure and entertainment, and so on.
After the target topic of each user has been obtained, the electronic device may push, to each user (that is, to the terminal device the user uses), information associated with that user's target topic, thereby improving the accuracy of information pushing.
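The assignment of a target topic to every user in a cluster can be sketched as follows. The selection rule used here, taking the topic with the highest mean probability across the cluster's members, is an assumption chosen for simplicity and is not the criterion stated in the text; the function name and topic names are likewise hypothetical.

```python
def target_topic_per_user(clusters, topic_probabilities, topic_names):
    """Give every user in a cluster the cluster's highest-mean-probability topic."""
    assignment = {}
    for members in clusters:
        rows = [topic_probabilities[u] for u in members]
        # Mean probability of each topic over the cluster's members.
        mean = [sum(col) / len(rows) for col in zip(*rows)]
        best = max(range(len(mean)), key=mean.__getitem__)
        for u in members:
            assignment[u] = topic_names[best]
    return assignment

# Hypothetical inputs: two clusters of user indices and M = 2 preset topics.
clusters = [[0, 1], [2, 3]]
topic_probabilities = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
topic_names = ["front-end development", "artificial intelligence"]
targets = target_topic_per_user(clusters, topic_probabilities, topic_names)
```

The resulting mapping from user to target topic is what a push service would consult when selecting content for each user's terminal device.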
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the information-pushing method of this embodiment. In the application scenario 300 of Fig. 3, the electronic device 301 obtains the search data 302 of multiple users. The electronic device 301 extracts the keywords in each user's search data and generates a keyword feature vector 303 for each user. It then inputs the matrix formed by the users' keyword feature vectors into the topic model algorithm to obtain a topic classification result 304 for each user. Next, it clusters the users' topic classification results and determines a target topic 305 according to the probability distribution of the topics in a cluster result. The electronic device pushes, to the terminal device 307 used by a user, information 306 associated with that user's target topic, thereby achieving targeted information pushing and improving its accuracy.
The information-pushing method provided by this embodiment first obtains search data of multiple users; then extracts the keywords in each user's search data and generates a keyword feature vector for each user; then inputs the matrix formed by these vectors into a topic model algorithm to obtain a topic classification result for each user; and finally clusters the users' topic classification results, determines a target topic according to the distribution of the probabilities of the topics in a cluster result, and pushes information associated with the target topic to the users to whom the topic classification results in that cluster result belong. By extracting keywords from users' search data and applying dimensionality reduction and cluster analysis to the resulting keyword feature vectors, the method obtains a target topic for each user and pushes information to the user accordingly, which can improve the accuracy of information pushing.
In some optional implementations of this embodiment, extracting the keywords in each user's search data in step 202 may include: for each user, extracting multiple keywords, based on a predetermined dictionary, from the search queries entered by the user and the titles of the search results, corresponding to those queries, that the user clicked, where the predetermined dictionary includes technical terms corresponding to multiple topics. A topic here may be, for example, a professional subfield. Extracting keywords with a predetermined dictionary prevents the technical terms a user enters from being split into finer-grained words, which benefits the subsequent topic model analysis and cluster analysis.
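A dictionary-based extraction of this kind can be sketched with forward maximum matching, a common mechanical segmentation technique: at each position the longest dictionary term is taken, so a technical term is not broken into its shorter parts. The function name, the sample dictionary, and the sample text are assumptions for illustration, and the application does not state which matching strategy it uses.

```python
def extract_keywords(text, term_dictionary, max_term_len=None):
    """Forward maximum matching: at each position take the longest dictionary term."""
    if max_term_len is None:
        max_term_len = max((len(t) for t in term_dictionary), default=0)
    keywords = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shorter ones.
        for length in range(min(max_term_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in term_dictionary:
                keywords.append(candidate)
                i += length
                break
        else:
            i += 1  # no dictionary term starts here; skip one character
    return keywords

# Hypothetical predetermined dictionary of technical terms:
# "机器学习" (machine learning), "机器" (machine), "学习" (learning),
# "分布式算法" (distributed algorithm).
term_dictionary = {"机器学习", "机器", "学习", "分布式算法"}
# Hypothetical search query: "machine learning and distributed algorithms".
keywords = extract_keywords("机器学习与分布式算法", term_dictionary)
```

Because the longest match wins, the query yields the full term "机器学习" rather than the two shorter words "机器" and "学习".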
In some optional implementations of this embodiment, generating the keyword feature vector for each user in step 202 may include: for each user, calculating the word frequency of each of the user's keywords with the term frequency-inverse document frequency (TF-IDF) algorithm. This can speed up the word frequency calculation.
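A minimal TF-IDF computation is sketched below. It assumes that each user's search data is treated as one document, that term frequency is the keyword's share of the user's keywords, and that inverse document frequency is log(L / df) without smoothing; these conventions and the function name are assumptions, since the application does not give the exact formula.

```python
import math
from collections import Counter

def tf_idf(user_keywords):
    """Return {user: {keyword: weight}}, treating each user as one document."""
    n_users = len(user_keywords)
    # Document frequency: in how many users' search data each keyword occurs.
    document_frequency = Counter()
    for keywords in user_keywords.values():
        document_frequency.update(set(keywords))
    weights = {}
    for user, keywords in user_keywords.items():
        counts = Counter(keywords)
        total = len(keywords)
        weights[user] = {
            kw: (count / total) * math.log(n_users / document_frequency[kw])
            for kw, count in counts.items()
        }
    return weights

# Hypothetical keywords extracted from two users' search data.
weights = tf_idf({
    "user_a": ["python", "python", "lock"],
    "user_b": ["python", "polymer"],
})
```

With this weighting, a keyword that every user searches for ("python" in the example) receives weight zero, while keywords specific to one user are emphasized.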
With further reference to Fig. 4, as an implementation of the method shown in the figures above, the present application provides one embodiment of an information-pushing apparatus. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus can be applied in various electronic devices.
As shown in Fig. 4, the information-pushing apparatus 400 of this embodiment includes an acquiring unit 401, an extraction unit 402, an input unit 403, and a determination unit 404. The acquiring unit 401 is configured to obtain search data of multiple users, where the search data includes the search queries entered by a user and the titles of the search results, corresponding to those queries, that the user clicked. The extraction unit 402 is configured to extract the keywords in each user's search data and generate a keyword feature vector for each user, where each component of the keyword feature vector corresponds one-to-one to the word frequency of one keyword. The input unit 403 is configured to input the matrix formed by the users' keyword feature vectors into a topic model algorithm to obtain a topic classification result for each user, where a user's topic classification result includes the probability that the user belongs to each of multiple preset topics. The determination unit 404 is configured to cluster the users' topic classification results to obtain cluster results, determine a target topic according to the distribution of the probabilities of the topics in a cluster result, and push information associated with the target topic to the users to whom the topic classification results in that cluster result belong.
In this embodiment, for the specific processing of the acquiring unit 401, the extraction unit 402, the input unit 403, and the determination unit 404 of the information-pushing apparatus 400, and for the technical effects they produce, refer to the descriptions of step 201, step 202, step 203, and step 204 in the embodiment corresponding to Fig. 2; they are not repeated here.
In some optional implementations of this embodiment, the extraction unit 402 is further configured to: for each user, extract multiple keywords, based on a predetermined dictionary, from the search queries entered by the user and the web page text the user browsed on the predetermined website, where the predetermined dictionary includes technical terms corresponding to multiple topics.
In some optional implementations of this embodiment, the extraction unit 402 is further configured to: for each user, calculate the word frequency of each of the user's keywords with the term frequency-inverse document frequency algorithm.
In some optional implementations of this embodiment, the determination unit 404 is further configured to cluster the users' topic classification results with a preset clustering algorithm.
In some optional implementations of this embodiment, the preset clustering algorithm may be one of the following: the K-means clustering algorithm, the Gaussian mixture model algorithm, the Dirichlet process mixture model algorithm, or the agglomerative hierarchical clustering algorithm.
Referring to Fig. 5, a structural diagram of a computer system 500 suitable for implementing the server of the embodiments of the present application is shown.
As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. The RAM 503 also stores the various programs and data required for the operation of the system 500. The CPU 501, the ROM 502, and the RAM 503 are connected to one another by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a liquid crystal display (LCD), a loudspeaker, and the like; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.
In particular, according to an embodiment of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the above-described functions defined in the method of the present application are performed. It should be noted that the computer-readable medium of the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that computer-readable medium can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to: wireless, electric wire, optical cable, RF, and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and any combination of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, this may be described as: a processor including an acquiring unit, an extraction unit, an input unit, and a determination unit. The names of these units do not, under certain circumstances, limit the units themselves; for example, the acquiring unit may also be described as "a unit for obtaining the search data of a plurality of users".
In another aspect, the present application further provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: obtain search data of a plurality of users, the search data including a search query entered by a user and the title of a search result, corresponding to the search query, that the user clicked; extract the keywords in the search data of each user respectively, and generate a keyword feature vector corresponding to each user respectively, wherein each component of the keyword feature vector corresponds one-to-one to the term frequency of a keyword; input a matrix composed of the keyword feature vectors corresponding to the users into a topic model algorithm to obtain a topic classification result corresponding to each user, wherein the topic classification result corresponding to a user includes the probability that the user belongs to each of a plurality of preset topics; cluster the topic classification results corresponding to the plurality of users to obtain a cluster result, determine a target topic according to the distribution of the probabilities corresponding to each topic in the cluster result, and push information associated with the target topic to the users to whom the topic classification results in the cluster result belong.
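The final step, determining a target topic for each cluster and selecting the users to push to, might be sketched as follows. The text only says the target topic is determined "according to the distribution" of the topic probabilities in the cluster result; taking the topic with the highest mean probability within the cluster is an assumption made for this sketch, and the names `target_topics`, `topic_probs`, and `topic_names` are hypothetical.

```python
def target_topics(topic_probs, labels, topic_names):
    """Map each cluster label to (target topic, users to push to).

    topic_probs: {user: [probability per preset topic]} from the topic model.
    labels: cluster label per user, in the iteration order of topic_probs.
    """
    clusters = {}
    for (user, probs), label in zip(topic_probs.items(), labels):
        clusters.setdefault(label, []).append((user, probs))
    plan = {}
    for label, members in clusters.items():
        # Mean probability of each topic over the users in this cluster.
        means = [sum(col) / len(members) for col in zip(*(p for _, p in members))]
        # Assumption: the target topic is the one with the highest mean.
        best = max(range(len(means)), key=means.__getitem__)
        plan[label] = (topic_names[best], [user for user, _ in members])
    return plan
```

For example, with two preset topics named "finance" and "travel", a cluster whose users mostly have high "finance" probability would be assigned "finance" as its target topic, and information associated with that topic would be pushed to every user in the cluster.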
The above description is only of the preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the particular combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (12)

1. An information pushing method, comprising:
obtaining search data of a plurality of users, the search data including: a search query entered by a user, and the title of a search result, corresponding to the search query, that the user clicked;
extracting the keywords in the search data of each user respectively, and generating a keyword feature vector corresponding to each user respectively, wherein each component of the keyword feature vector corresponds one-to-one to the term frequency of a keyword;
inputting a matrix composed of the keyword feature vectors corresponding to the users into a topic model algorithm to obtain a topic classification result corresponding to each user, wherein the topic classification result corresponding to a user includes: the probability that the user belongs to each of a plurality of preset topics;
clustering the topic classification results corresponding to the plurality of users to obtain a cluster result, determining a target topic according to the distribution of the probabilities corresponding to each topic in the cluster result, and pushing information associated with the target topic to the users to whom the topic classification results in the cluster result belong.
2. The method according to claim 1, wherein extracting the keywords in the search data of each user respectively comprises:
for each user, extracting a plurality of keywords, based on a predetermined dictionary, from the search query entered by the user and from the title of the search result, corresponding to the search query, that the user clicked, wherein the predetermined dictionary includes technical terms corresponding to a plurality of topics.
3. The method according to claim 1, wherein generating a keyword feature vector corresponding to each user respectively comprises:
for each user, calculating the term frequency of each of the plurality of keywords corresponding to the user based on the term frequency-inverse document frequency algorithm.
4. The method according to claim 1, wherein clustering the topic classification results corresponding to the plurality of users comprises:
clustering the topic classification results corresponding to the plurality of users using a preset clustering algorithm.
5. The method according to claim 4, wherein the preset clustering algorithm is one of the following:
a K-means clustering algorithm, a Gaussian mixture algorithm, a Dirichlet process mixture algorithm, a synthesis clustering algorithm.
6. An information pushing device, comprising:
an acquiring unit, configured to obtain search data of a plurality of users, the search data including: a search query entered by a user, and the title of a search result, corresponding to the search query, that the user clicked;
an extraction unit, configured to extract the keywords in the search data of each user respectively, and to generate a keyword feature vector corresponding to each user respectively, wherein each component of the keyword feature vector corresponds one-to-one to the term frequency of a keyword;
an input unit, configured to input a matrix composed of the keyword feature vectors corresponding to the users into a topic model algorithm to obtain a topic classification result corresponding to each user, wherein the topic classification result corresponding to a user includes: the probability that the user belongs to each of a plurality of preset topics;
a determination unit, configured to cluster the topic classification results corresponding to the plurality of users to obtain a cluster result, to determine a target topic according to the distribution of the probabilities corresponding to each topic in the cluster result, and to push information corresponding to the target topic to the users to whom the topic classification results in the cluster result belong.
7. The device according to claim 6, wherein the extraction unit is further configured to:
for each user, extract a plurality of keywords, based on a predetermined dictionary, from the search query entered by the user and from the text of the web pages browsed by the user on the predetermined website, wherein the predetermined dictionary includes technical terms corresponding to a plurality of topics.
8. The device according to claim 6, wherein the extraction unit is further configured to:
for each user, calculate the term frequency of each of the plurality of keywords corresponding to the user based on the term frequency-inverse document frequency algorithm.
9. The device according to claim 6, wherein the determination unit is further configured to:
cluster the output results corresponding to the plurality of users using a preset clustering algorithm.
10. The device according to claim 9, wherein the preset clustering algorithm is one of the following:
a K-means clustering algorithm, a Gaussian mixture algorithm, a Dirichlet process mixture algorithm, a synthesis clustering algorithm.
11. A server, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-5.
12. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-5.
CN201711140407.4A 2017-11-16 2017-11-16 Information-pushing method and device Pending CN107943895A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711140407.4A CN107943895A (en) 2017-11-16 2017-11-16 Information-pushing method and device

Publications (1)

Publication Number Publication Date
CN107943895A true CN107943895A (en) 2018-04-20

Family

ID=61932719

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711140407.4A Pending CN107943895A (en) 2017-11-16 2017-11-16 Information-pushing method and device

Country Status (1)

Country Link
CN (1) CN107943895A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109218771A (en) * 2018-10-29 2019-01-15 百度在线网络技术(北京)有限公司 A kind of recommended method of video program, device, electronic equipment and storage medium
CN110413875A (en) * 2019-06-26 2019-11-05 腾讯科技(深圳)有限公司 A kind of method and relevant apparatus of text information push
CN110766488A (en) * 2018-07-25 2020-02-07 北京京东尚科信息技术有限公司 Method and device for automatically determining theme scene
CN111552851A (en) * 2020-04-24 2020-08-18 浙江每日互动网络科技股份有限公司 Type determination method and device, equipment and computer readable storage medium
CN111782801A (en) * 2019-05-17 2020-10-16 北京京东尚科信息技术有限公司 Method and device for grouping keywords
CN112559853A (en) * 2019-09-26 2021-03-26 北京沃东天骏信息技术有限公司 User label generation method and device
CN113656584A (en) * 2021-08-18 2021-11-16 维沃移动通信有限公司 User classification method and device, electronic equipment and storage medium
CN115526173A (en) * 2022-10-12 2022-12-27 湖北大学 Feature word extraction method and system based on computer information technology

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102332006A (en) * 2011-08-03 2012-01-25 百度在线网络技术(北京)有限公司 Information push control method and device
CN102521248A (en) * 2011-11-14 2012-06-27 北京亿赞普网络技术有限公司 Network user classification method and device
CN103970863A (en) * 2014-05-08 2014-08-06 清华大学 Method and system for excavating interest of microblog users based on LDA theme model
CN104268290A (en) * 2014-10-22 2015-01-07 武汉科技大学 Recommendation method based on user cluster
US9129227B1 (en) * 2012-12-31 2015-09-08 Google Inc. Methods, systems, and media for recommending content items based on topics
CN105787770A (en) * 2016-04-27 2016-07-20 上海遥薇(集团)有限公司 Non-negative matrix factorization (NMF) algorithm-based big data commodity and service recommending method and system

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110766488A (en) * 2018-07-25 2020-02-07 北京京东尚科信息技术有限公司 Method and device for automatically determining theme scene
CN109218771A (en) * 2018-10-29 2019-01-15 百度在线网络技术(北京)有限公司 A kind of recommended method of video program, device, electronic equipment and storage medium
CN111782801A (en) * 2019-05-17 2020-10-16 北京京东尚科信息技术有限公司 Method and device for grouping keywords
CN111782801B (en) * 2019-05-17 2024-02-06 北京京东尚科信息技术有限公司 Method and device for grouping keywords
CN110413875A (en) * 2019-06-26 2019-11-05 腾讯科技(深圳)有限公司 A kind of method and relevant apparatus of text information push
CN112559853A (en) * 2019-09-26 2021-03-26 北京沃东天骏信息技术有限公司 User label generation method and device
CN112559853B (en) * 2019-09-26 2024-01-12 北京沃东天骏信息技术有限公司 User tag generation method and device
CN111552851A (en) * 2020-04-24 2020-08-18 浙江每日互动网络科技股份有限公司 Type determination method and device, equipment and computer readable storage medium
CN113656584A (en) * 2021-08-18 2021-11-16 维沃移动通信有限公司 User classification method and device, electronic equipment and storage medium
CN115526173A (en) * 2022-10-12 2022-12-27 湖北大学 Feature word extraction method and system based on computer information technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination